(It has been an eventful week, with moving apartments and lots of boxes to pack and unpack, but there are some interesting blog posts stored in my mind, and they’ll be up as soon as I get typing.)
Abi and other Indian researchers have written extensively about a couple of cases of plagiarism in research from India over the past year or so. There have been two major recent examples of what can only be called blatant and unacceptable cases of plagiarism, which have rightfully been panned by Abi and many others.
But here’s the thing which perhaps hasn’t been discussed enough. Sure, there have been a few researchers who resort to plagiarism or make dubious ethical decisions to steal research or duplicate their own work and present it as different work. But at a larger scale, within the broader education system, there is no understanding or even recognition that this is a problem, nor is there any effort to educate all students about ethics, plagiarism and the importance of citing one’s sources.
The simple concept of citing references should start at a very early stage, in junior school or middle school. I recall numerous science, history and geography projects that we did starting late in junior school. The charts and reports were always something I looked forward to doing, and for projects in my 9th and 10th grade, I spent many wonderful hours in the library of the Indian Institute of World Culture, looking up their terrific collection of history books. Thankfully, we had been told to cite our references by our history teacher, and I meticulously made a list of all my references and added that to a bibliography. But I remember many more examples from school and even college where students would come up with reports without a single reference, or in other cases copy large sections of text verbatim from textbooks or other sources. They were never once told that this was wrong and unacceptable. In one instance, for a social studies class project, a teacher proudly announced to the class that one such project was “outstanding” and “beautifully written” (a comment that applied more to this student’s elegant artwork and layout than to the content). This project was copied out faithfully from a textbook, which wasn’t even cited in the non-existent reference/bibliography section.
In stark contrast, when I moved to the States some years ago, I was pleasantly surprised to find that most school kids here were required to cite their sources in their class projects and essays. The lackadaisical approach towards ethics in research often continues through college in India, and the student learns about citing sources or ethical issues only as a graduate student, or in some cases never at all. I’ve hardly been surprised to find some international students here in the States (this malady isn’t restricted to India, but may be widespread across parts of Asia or Africa) who aren’t exactly sure what is acceptable and what is not.
There is an urgent need to educate educators in India, and make an ethics and plagiarism course a serious part of the curriculum at least in the freshman/sophomore years of college. (And when I say serious, I mean that if a person “flunks” this course, that person should not be allowed to progress to the next year in college).
That said, here’s an interesting second part of this post. A month or so ago, Dr. Harold “Skip” Garner, a researcher here, talked about some of his latest research. Among other things, he is a bioinformatician, and his group came up with the eTBLAST resource. Now, most of us researchers are used to searching for other researchers’ work using databases such as PubMed and SciFinder. These allow us to find research publications or resources, typically by topic or author. But eTBLAST is a more sophisticated tool, since it can take large sections of text as its query. Let us say you put in the entire summary of a research paper, and you want to find more research similar to it. In its results section, eTBLAST spits out a list of researchers/authors who have worked on topics similar to what you have queried, with the details of their work/publications (if it isn’t all clear to you, give it a try here). This allows the user to do a number of things: identify leading researchers in a certain area, for instance, or identify the most appropriate journals for a particular type of research. It is now popular with journal editors and grant funding agencies as a way to identify appropriate reviewers for papers or grants. In short, if used well, this can be an extremely powerful tool for bibliographical data mining.
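For the programmers reading this, here is a rough idea of what "searching by a whole block of text" can look like. To be clear, this is not eTBLAST's actual algorithm, which I haven't seen described in this detail. It is just a minimal sketch that assumes a plain word-count comparison (cosine similarity), and the function names and the `abstracts` dictionary of title-to-text are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, abstracts):
    """Return (score, title) pairs, most similar first.

    `abstracts` is a hypothetical dict mapping a title to its abstract text.
    """
    query_counts = Counter(tokenize(query))
    scored = [
        (cosine_similarity(query_counts, Counter(tokenize(text))), title)
        for title, text in abstracts.items()
    ]
    return sorted(scored, reverse=True)
```

You would call `rank_by_similarity(my_summary, abstracts)` and read off the top few titles. A real system would need to weight rare words more heavily and cope with millions of records, but the basic idea of matching on the whole text rather than on a keyword or author is the same.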
Skip and his team built eTBLAST primarily for this purpose, but found that eTBLAST had a potentially very useful “side effect”. It turns out that his tool was extremely good at finding duplicate citations.
Yup, this tool is extremely useful in finding published work that closely replicates already existing published work.
Skip and his team used a sample of about 60,000 citations that they drew from Medline, and used eTBLAST to analyze them. What they found were a couple of dozen cases of duplicate citations with no shared authors, i.e. cases which were very likely to have been completely plagiarized. Some of the examples Skip gave were hilarious, with one particular example that had me in splits. There was this researcher in England (and I couldn’t help thinking that his name seemed suspiciously of subcontinental or middle-eastern origin) who had published a paper, and then decided that this paper was so good that he would publish the entire thing again, practically verbatim, in a different journal. What’s more, not satisfied by this, he published this paper yet again, an incredible third time, without changing much more than a few numbers, in a third journal! He must think his work is so good that it needs to be published the same way three times. In addition to these examples, there were many hundreds of cases where the same author had published “very similar work” in different journals, without having bothered to change the text, title or references too much.
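The two kinds of cases above (near-identical text with no shared authors, versus near-identical text from the same author) are easy to picture in code. Again, this is only a sketch under my own assumptions and not the method from Skip's group: I am assuming each record is a dictionary with hypothetical `id`, `authors` and `abstract` fields, comparing abstracts by their overlapping three-word sequences (Jaccard similarity), and picking an arbitrary cut-off of 0.5.

```python
import re
from itertools import combinations


def shingles(text, k=3):
    """Return the set of overlapping k-word sequences in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of overlap over size of union."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_duplicates(records, threshold=0.5):
    """Flag highly similar pairs and label them by author overlap.

    `records` is a list of dicts with hypothetical keys "id", "authors"
    and "abstract". The threshold is an arbitrary illustrative value.
    """
    sets = {r["id"]: shingles(r["abstract"]) for r in records}
    flagged = []
    for r1, r2 in combinations(records, 2):
        if jaccard(sets[r1["id"]], sets[r2["id"]]) >= threshold:
            shared = set(r1["authors"]) & set(r2["authors"])
            label = "same-author duplicate" if shared else "possible plagiarism"
            flagged.append((r1["id"], r2["id"], label))
    return flagged
```

Comparing every pair like this is fine for a toy list but would be far too slow for all of Medline, and a flagged pair is only a candidate that a human still has to read before accusing anyone of anything.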
The utility of eTBLAST was incredibly apparent in this live demonstration that Skip put together. I’m just hoping that along with an increasing awareness of ethics in research, there will be more such tools that not only help research and bibliographical mining, but can also be used effectively to find and expose these “researchers”.
The details of using eTBLAST to find duplicate citations have now been described in an excellent publication in Bioinformatics. Here’s the link to the research paper from Skip’s group in Bioinformatics, titled Déjà vu – A Study of Duplicate Citations in Medline. This second link is to the aptly titled Déjà vu database, a “repository of duplicate citations” from numerous databases.