Saturday, December 08, 2007

Ethics, plagiarism, eTBLAST and déjà vu

(It has been an eventful week, with moving apartments and lots of boxes to pack and unpack, but there are some interesting blog posts stored in my mind, and they’ll be up as soon as I get typing)

Abi and other Indian researchers have written extensively about a couple of cases of plagiarism in research from India over the past year or so. There have been two major recent examples of what can only be called blatant and unacceptable cases of plagiarism, which have rightfully been panned by Abi and many others.

But here’s the thing which perhaps hasn’t been discussed enough. Sure, there have been a few researchers who resort to plagiarism or make dubious ethical decisions to steal research or duplicate their own work and present it as different work. But at a larger scale, within the broader education system, there is no understanding or even recognition that this is a problem, nor is there any effort to educate all students about ethics, plagiarism and the importance of citing one’s sources.

The simple concept of citing references should start at a very early stage, in junior school or middle school. I recall numerous science, history and geography projects that we did starting late in junior school. The charts and reports were always something I looked forward to doing, and for projects in my 9th and 10th grade, I spent many wonderful hours in the library of the Indian Institute of World Culture, looking up their terrific collection of history books. Thankfully, we had been told to cite our references by our history teacher, and I meticulously made a list of all my references and added that to a bibliography. But I remember many more examples from school and even college where students would come up with reports without a single reference, or in other cases copy large sections of text from textbooks or other sources, verbatim. They were never once told that they were wrong, or that this was unacceptable. In one instance, for a social studies class project, a teacher proudly announced to the class that one such project was “outstanding” and “beautifully written” (a comment more pertinent to this student’s elegant artwork and layout, and saying nothing about the content). This project was copied out faithfully from a textbook, which wasn’t even cited, since there was no reference/bibliography section at all.

In stark contrast, when I moved to the States some years ago, I was pleasantly surprised to find that most school kids here were required to cite their sources in their class projects and essays. The lackadaisical approach towards ethics in research often continues through college in India, and the student learns of citing sources or ethical issues only when he/she is a graduate student, or in some cases never at all. I’ve hardly been surprised to find some international students here in the States (this malady isn’t restricted to India, but may be widespread across parts of Asia or Africa) who aren’t exactly sure what is acceptable and what is not.

There is an urgent need to educate educators in India, and make an ethics and plagiarism course a serious part of the curriculum at least in the freshman/sophomore years of college. (And when I say serious, I mean that if a person “flunks” this course, that person should not be allowed to progress to the next year in college.)

That said, here’s an interesting second part of this post. A month or so ago, Dr. Harold “Skip” Garner, a researcher here, talked about some of his latest research. He is, amongst other things, a bioinformatician, and his group came up with the eTBLAST resource. Now, most of us researchers are used to searching for other researchers’ work using databases such as PubMed and SciFinder. What these do is allow us to find research publications or resources, typically by topic or author. But eTBLAST is a more sophisticated tool, since it can take large sections of text as its query. Say you paste in the entire summary of a research paper and want to find more research on similar topics. eTBLAST then returns a list of researchers/authors who have worked on topics similar to your query, with details of their work/publications (if it isn’t all clear to you, give it a try here). This allows the user to do a number of things. It allows the user to identify leading researchers in a certain area, or identify the most appropriate journals for a particular type of research. It is now popular with journal editors and grant funding agencies as a way to identify appropriate reviewers for papers or grants. In short, if used well, this can be an extremely powerful tool for bibliographical data mining.

Skip and his team built eTBLAST primarily for this purpose, but found that eTBLAST had a potentially very useful “side effect”. It turns out that his tool was extremely good at finding duplicate citations.

Yup, this tool is extremely useful in finding published work that closely replicates already existing published work.

Skip and his team used a sample of about 60,000 citations that they drew from Medline, and used eTBLAST to analyze them. What they found were a couple of dozen cases of duplicate citations with no shared authors, i.e. cases which were very likely to have been completely plagiarized. Some of the examples Skip gave were hilarious, with one particular example that had me in splits. There was this researcher in England (and I couldn’t help thinking that his name seemed suspiciously of subcontinental or Middle Eastern origin) who had published a paper, and then decided that this paper was so good that he would publish the entire thing again, practically verbatim, in a different journal. What’s more, not satisfied with this, he published the paper yet again, an incredible third time, without changing much more than a few numbers, in a third journal! He must think his work is so good that it needs to be published the same way three times. In addition to these examples, there were many hundreds of cases where the same author had published “very similar work” in different journals, without having bothered to change the text, title or references too much.
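For readers who like to see an idea in code, here is a toy sketch of the general approach: compare pieces of text, flag pairs that are suspiciously similar, and then check whether the two share any authors. To be clear, this is not how eTBLAST itself works. The similarity measure (Jaccard overlap of three-word shingles), the 0.5 threshold, the function names and the four sample records are all my own assumptions, made up purely for illustration.

```python
import re
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word runs) in a text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_duplicates(citations, threshold=0.5):
    """Return (id1, id2, score, label) for every pair at or above threshold.

    The threshold is an arbitrary illustrative value, not a published one.
    """
    sets = {c["id"]: shingles(c["abstract"]) for c in citations}
    results = []
    for c1, c2 in combinations(citations, 2):
        score = jaccard(sets[c1["id"]], sets[c2["id"]])
        if score >= threshold:
            shared = set(c1["authors"]) & set(c2["authors"])
            label = "shared authors" if shared else "no shared authors"
            results.append((c1["id"], c2["id"], score, label))
    return results


# Invented sample records (hypothetical, not real Medline citations).
citations = [
    {"id": "A", "authors": ["Smith J"],
     "abstract": "We measured the effect of drug X on blood pressure "
                 "in 40 patients over six weeks."},
    {"id": "B", "authors": ["Smith J"],
     "abstract": "We measured the effect of drug X on blood pressure "
                 "in 45 patients over six weeks."},
    {"id": "C", "authors": ["Doe A"],
     "abstract": "We measured the effect of drug X on blood pressure "
                 "in 40 patients over six weeks."},
    {"id": "D", "authors": ["Lee K"],
     "abstract": "A survey of bird migration patterns across northern "
                 "Europe during winter."},
]

pairs = find_duplicates(citations)
for id1, id2, score, label in pairs:
    print(f"{id1} vs {id2}: similarity {score:.2f} ({label})")
```

In this made-up sample, the pair with the same author corresponds to the "published it again" case, while the pairs with no shared authors are the ones that would deserve a much closer look. Comparing every pair is fine for four records but would be far too slow for a whole database, which is one reason a real tool needs something smarter than this.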

The utility of eTBLAST was incredibly apparent in this live demonstration that Skip put together. I’m just hoping that along with an increasing awareness of ethics in research, there will be more such tools that not only help research and bibliographical mining, but also can be used effectively to find and expose these “researchers”.

The details of using eTBLAST to find duplicate citations have now been described in an excellent publication in Bioinformatics. Here’s the link to the research paper from Skip’s group in Bioinformatics, titled Déjà vu – A Study of Duplicate Citations in Medline. This second link is to the aptly titled Déjà vu database, a “repository of duplicate citations” from numerous databases.


Fëanor said...

The issue of a lacuna in the teaching of ethics in Indian schools has rather sordid ramifications. Even in top-notch institutions such as the Indian Institute of Science, there is cheating - both at coursework and at examinations. How do I know? I cheated shamelessly, I am afraid, in any course that bored me to tears. Luckily I managed to be done with the boring subjects in the first year, so there was a modicum of honesty reigning in the remainder of my programme there. Some people copied assignments because they couldn't be bothered to do them, while yet others brought crib sheets into the exam halls. It is not that the perpetrators were acting in blissful ignorance: they just didn't care about the rights and wrongs. In this, the behaviour - in our early twenties - was essentially unchanged from middle and high school, where also there was rampant cheating. Some visiting professors from abroad were shocked to see the cavalier attitude the students had. One even said - you guys are among the smartest in the country: why do you need to behave so dishonestly? The brief loss of face that ensued was quickly forgotten, and matters continued pretty much as usual.

Wavefunction said...

eTBLAST seems like a really useful tool. Recently there was a case not of plagiarism or fraud, but one where a researcher thought he had discovered something new, only for it to turn out that he had rediscovered a 100-year-old reaction and produced nothing new. eTBLAST could have helped tell him who else had used that reaction.

Contrary to what people might think, ethics classes can be made very interesting. I had to take a mandatory ethics course in grad school. It was one of the most interesting courses I have ever taken, with a lot of case studies thrown in. They even showed a great movie on the beginnings of the HIV epidemic which illustrates many ethical and social dilemmas (I think you will like it. It's called "And the Band Played On", based on true stories and characters).

In the end, it's about an ethical culture, as you noted. You may know about this recent Turkish physics scandal where more than 25 papers were discovered to be copied or plagiarised by almost 20 Turkish researchers. But then an anonymous Turkish student told me that this is not very uncommon in some places there, where it's considered ok to "borrow" from some other research to advance your career. In fact sometimes they would invite another researcher to their university and strike a "deal" with him to put their names on his papers in return for generosity during his stay. Clearly things need to be drilled into the culture itself.

Sunil said...

feanor, thanks for that comment. Yes, the problem of ethics and plagiarism goes deeper than just research. But I think the problem of basic ethics, starting with this kind of behavior in school and college, is almost "unfixable" in the present Indian context. But at least within the research community (which in India is pretty tiny), it can be fixed easily and quickly. There are just a few hundred research groups, and all students can be made to go through an ethics course where the basic concepts are at least taught. Secondly, the "review committees" that investigate fraud must be given more teeth, and they should be able to penalize perpetrators. Unfortunately, they are lax in their duties, or sometimes even some "big names" get caught up in plagiarism or fraud. Sad, isn't it.

Ashutosh, eTBLAST is terrific, and I've made it part of my research tools. And you're absolutely right, ethics classes can be interesting. I've taken a few both in Seattle and here in Dallas, and they've all actually been a lot of fun. In fact, I was even required to do a short ethics presentation for my own research group meeting, and we all had a lot of fun discussing some case studies.

And as far as the culture goes, you're absolutely right again. A "chalta hain" attitude isn't going to get India very far. And some people even come up with terrible excuses like "western culture is different from eastern culture". People who come up with those remarks should be flogged :-)
