Plagiarism detection – is technology the solution?

I went to a lunchtime seminar by John Barrie on Turnitin, the plagiarism detection software from iParadigms. I can see the practical merits of Turnitin and I can see that it is scalable in the near term, but I wonder whether we have actually identified the correct problem to solve, and whether the Turnitin approach is scalable or even sensible in the longer term. The idea of the entire internet being fingerprinted is reminiscent of the scenario of enough monkeys and keyboards to produce Mozart … at what level is anything truly original?

The title of the presentation was “Vetting academic work for originality: Saving the world from unoriginality” – very catchy for sure, but perhaps not particularly interesting or realistic. At an undergraduate level, the content of most submitted work is not primarily focused on originality, but on accuracy. When writing a first-year psych lab report on “The Stroop Effect”, perhaps there is a real limit to the number of ways of expressing the content before the information actually becomes incorrect in pursuit of originality. Maybe instead of detecting plagiarism, we should be trying to generate assessment tasks which are not affected by plagiarism – rather than have one academic grade 1000 papers, perhaps we would do better to have one academic produce 10 papers of different quality on the one topic and have 1000 students grade those 10 papers.

Alternatively, if university staff / student ratios were appropriate so that proper assessment of individual undergraduate students could take place (e.g. presenting a paper to a tutorial group and then submitting a written version for marking, and having shared marking across tutorials), there would be a disincentive to cheat. The thing that would alert teaching staff to plagiarism would be a mismatch between the ability to present the content orally and in written form. If end-of-semester assessment were by essay-style (hand-marked) exams requiring generative capabilities, there would be more opportunity to match student voice with their written output.

At the undergraduate level, there is a serious question to be asked about whether it is more important to be able to generate an original piece of work than to recognise which piece of work most accurately reflects “the right answer” (assuming there is such a thing). If we can string together appropriate pieces of information wherever they come from to produce a coherent article (be it a “term paper”, an essay, a lab report, a computer program), we are at least showing that we understand the content area appropriately. I believe this is a necessary but not sufficient precursor to being able to produce something original. I actually have grave doubts as to whether true originality at the undergraduate level would be recognised by the average tutor, let alone encouraged or rewarded. It requires substantial academic expertise to evaluate the quality of original work in a discipline area.

It seems that plagiarism is considered a serious issue because we like to claim that a prime objective in tertiary teaching is to instill in our students the concept of academic integrity and of scholarship. However, to my mind, academic integrity (coupled with academic freedom) is associated with a whole moral philosophy regarding knowledge, sharing of knowledge and how academic work contributes to the greater good of human endeavour. Academic values and moral philosophy are taught by example (intellectual and behavioural modelling) rather than by policing. If plagiarism is rampant in the younger generation, we should be looking to the values implicit in our education system rather than to policing strategies to effect cultural change.

If you look at the highly structured curriculum favoured by our secondary education sector and the templated way much of the “knowledge” is presented, it is not surprising that plagiarism is rampant – what is the difference between plagiarising and rote-learning? What is the difference between a “fact” and an “idea” and do facts as well as ideas have citable sources?

In terms of values and behavioural modelling, if you also look at business ethics (or attitudes to speed cameras) in the past 15 years, the emergent theme is that anything that is not expressly forbidden is implicitly allowed. No matter what the written rules say, if it isn’t policed, you’re allowed to do it. And if you’ve got away with doing it for a while, it violates your rights to suddenly start policing it. Steve Vizard and Rene Rivkin come to mind on the business front … what did they do that was wrong???

Intellectual property and copyright law seem not to be about integrity and moral philosophy at all, but much more about how to protect the ability of an individual or institution to make money from creative endeavours rather than to share that creative output with the rest of the community (which in the past funded academic institutions to pursue the creation of new knowledge for the greater good of humanity).

Two other factors which have affected academic integrity in a subtle but seriously insidious way are mentioned in passing below. Both of them affect the behaviour of academics, which is then modelled by those who are learning from them, creating a different academic culture and set of standards.

1) Measuring research output by number of publications rather than quality of publications (counting is easier than assessing quality) so that there is a strong career incentive to make as much publication mileage as possible out of each random academic idea no matter whether it leads to institutionally-endorsed rampant self-plagiarism, a proliferation of poor-quality journals, and/or a sense of dissatisfaction with the entire peer-review and publishing system.

2) A strong push to “reusable content”, without ever clarifying the difference between acceptable / appropriate reuse and plagiarism – acknowledgement of the source is an obvious difference to a trained academic, but the fine line between paraphrasing and substituting synonyms is a tougher call to make for a layperson. Maybe, in the end, the only difference is the wider vocabulary available to most academics – an academic’s lexicon already contains the synonyms that a layperson searches for in a thesaurus, but the paraphrasing process is still the same – when does restating an idea “in your own words” become stating something original? And do I have to cite that Tom asked this question of me in the corridor tonight or can you believe that I thought of it first? And if I did, have I now “beaten Tom to press” so that he will have to cite me in the future?

Back to reuse of content, consider particularly the concept of reuse and acknowledgement of source in the teaching context (which is often the only context in which students see academics at their work). Clearly the ideas being presented in the classroom are not original because we are teaching people about the current state of agreed-upon knowledge in a field.

Lectures and visual aids associated with lectures provide a context for assigned reading and other research activities. Often, lectures provide a specific context or elaboration on material sourced from “the textbook”. Now consider how the process of generating lecture resources for a “traditional lecture” has changed during my 20 years as an academic:

– (circa 1985) I gathered together a set of slides or overheads illustrating key points, and wrote key points on the blackboard

– (circa 1990) I prepared overhead transparencies with illustrations and key points

– (circa 1995) I prepared PowerPoint presentations which were distributed via an intranet

– (circa 2000) I prepared PowerPoint presentations which were placed on the web

By the early 2000s, in common parlance, the PowerPoint presentation became “The Lecture”, and because it resided in a public place free of the context in which it was presented and the words which were uttered explaining the origin and content of each idea and image, issues of copyright and intellectual property started to arise. The overhead of annotating each idea and image became a disincentive to preparing interesting additional resources for teaching, and concern about providing enrichment to one group of students but not to all students (where different staff taught different streams) undermined the sense of academic responsibility for teaching material as well as the atmosphere of collegiality.

It seems to me that institutions have only recently become deeply interested in the issue of plagiarism detection in the context of selling curriculum, selling degrees, selling research output and gaining competitive advantage from the intellectual property of their workforce of academics. The sense of academic integrity and moral philosophy associated with being part of an international community of scholars whose combined knowledge belongs to humanity has been seriously eroded by treating academic output as a saleable commodity and applying “business models” to academia using totally inadequate analogies.

I guess one aspect of writing in a blog that is simultaneously a real strength and a serious weakness is about to be demonstrated – I want to post this now because I know I won’t come back to it properly in the next few weeks to fill out the gaping holes in the line of argument. I think I know how to fill them, but I don’t have the time right now. Is it better to put the half-baked idea “out there” (even if I’m the only person who goes back to read it) or is it best to let it drown in a sea of other half-baked ideas? And furthermore, is this enough to ensure that I at least mark a line in the sand to say “I thought like this on this day, even if I don’t get to rethink it and publish it properly until a lot later on …”?

Blogging at work

I have had a few attempts at running a blog “for work” and each time I have hit a bit of a brick wall. There has been a lot written recently on blogging, what it is about, and whether it has an important role in a formal teaching-and-learning context. I have been stimulated to update this blog via a web-forum email asking about blogging at UniMelb …

My current thoughts re blogging as a genre of writing:

1. Blogging software provides an easy information architecture for “episodic writing” … especially of things that are loosely topic based, but become “topical” at a particular time for reasons that are not easily encapsulated, and are likely to be relevant again at a later date.

2. Blogging tools are only useful for people who write prolifically, have regular access to the internet, do most of their writing at a computer rather than in a notebook and are comfortable with public scrutiny of their writing.

3. Blogging is essentially personal even when it’s work-related. I write to a blog as a convenient place to store ideas that are forming so that I can edit them from anywhere and I can refer to them easily if the ideas come up in conversation. I write to a publicly accessible blog to challenge myself to write more coherently than I would in a notebook – I operate from the premise that articulating an idea clearly is part of the process of thinking clearly, and that if I can’t express what I mean then I don’t actually know what I’m talking about yet. Feedback is always good when clarifying ideas.

4. Blogging has inherent dangers in the workplace – point 3 identifies that I am blogging ideas that are not necessarily fully formed. So a blog entry is a bit like a draft of an idea, or a “Dear Diary” type letter. There is a reason for drafting things and often it is because partially formed ideas that escape before their possible endpoints have been fully thought through can be dangerous. So I censor much of what I write to a blog. And because I do this, my blog ends up with very few entries and those that are written are not particularly interesting.

5. Blogging as a writing genre relies on students having a desire to write. Use of blogging software has some merit in a range of situations irrespective of whether the genre of writing is “true blogging” (according to the blogging gurus in favour in any particular week …). My take on this is that most academics I have worked with are only just getting comfortable with discussion forums, and that blogging and RSS are beyond their comfort zone to use and support.

Re blogging software:

1. My first impediment to work-related blogging was the lack of infrastructure and the lack of a server on which to install blogging software.

2. I tried using Bloki (http://www.bloki.com) for a while and it’s a pretty nice combination of blogs, forums and wiki-like website. I specifically used it to store information about web resources I happened to come across with annotations about what they were and why they looked interesting. My idea was that people with similar research interests would be able to follow what I was looking at on my blog, and might be inspired to make a similar resource available of their own reading so that we could share our research lives more effectively … I ended up becoming a bit wary about committing too much work-related stuff to a random server in a random location over which I have no control. I have to say, the site is still there 2 years later and there has been nothing but good service from the site.

3. I then installed MoveableType, PhpWiki and Moodle on my own personal website (http://wisebytes.net/research/blog/) to try them out and because it was the only place where I had access to a shell account on a *nix server along with scripting and database services. I set up a research blog to take over from the Bloki site, but never managed to move my Bloki material to MoveableType. I used the PhpWiki quite a bit and liked it although I’ve never been game enough to leave it open to the world, and I never got around to publishing a read-only version of it either.

4. I finally got access to a server at UniMelb and installed blogging and wiki software. I used WordPress rather than MoveableType because it was just at the time when MoveableType introduced a licence fee which I didn’t want to pay. So I was again in a position of moving all my stuff from Bloki to MoveableType to WordPress. I also had great trouble with the authentication module in PhpWiki so that pages kept locking people out of editing them. I got Moodle working, which was good, and spent a bit of time playing with that too.

5. Having failed to inspire my academic colleagues to have any interest in starting a blog or using a wiki for drafting research papers or documentation, and having spent a lot of time trying to get the infrastructure sorted to support more widespread use of blogs, I actually ended up losing interest in writing blog content since most of it relates to a) things that none of my colleagues seem to find particularly interesting or b) things that are politically sensitive.

6. I used BlogLines (http://www.bloglines.com/) as an RSS aggregator until I got swamped by the amount of stuff out in the world. I have ended up taking the lazy option of subscribing to Stephen Downes’ OnLineDaily newsletter as my primary source of keeping up with the world of edublogs. RSS has huge potential in teaching and learning but I’m waiting for other people to sort out the tools etc.

The biggest disincentive to maintaining a work-blog is a subtle shift in academic culture such that I am no longer confident that the university supports freedom of expression over corporate image, or substance over process, or content over style.

The biggest disincentive to supporting blogs in teaching and learning is an apparent lack of in-built passion for writing. Maybe moblogs or vlogs or Flickr will take off instead!!!