The paper you cite is only cited once, so why should I trust it?
@Louis
Right. So you can’t trust citation statistics alone to determine the validity of a paper. It is self-evident, but worth repeating.
@yarbel
I don’t recommend that anyone start reading more junk papers.
Your proposal makes sense, but it touches a nerve: what does “citing a paper” mean?
Myth or fact? Best paper awards really go to the best papers. Sometimes I read these papers and see a whole new problem space. Other times, authors use them to gain credibility and popularity.
more and more journals are using systems that automatically spot self-plagiarism
Consider this: it is actually advantageous to republish your own papers, even though your work is already widely available electronically in the first place. This is totally insane if you think about it!
The simple fact that we need tools to detect self-plagiarism proves that there is a problem. If the system were self-regulating, we wouldn’t need this.
Moreover, organizations like ACM and IEEE are enforcing blacklisting.
I’m very interested in these blacklists. Is this documented somewhere?
@Tommy
Myth. It is impossible to predict accurately the importance of a given piece of work.
@Gustav
Something is self-regulating if it does the “right thing” on its own, without any supervision. That is, it cannot be gamed for profit.
An excellent post, thanks. I completely agree with the argument and think the problem is actually much worse, considering how a past record of publications increases your chances of being published (most peer reviews, I believe, are far from fully blind).
However, I completely disagree with the conclusion. It is simply a waste of time to read all the junk that is being published just to get a sense of what’s good and what’s not. There are quite a few substitutes for reading the full paper. We should think along the lines of better mechanisms for regulation instead of abandoning it altogether.
My first suggestion would be to limit the number of allowable citations per paper to 60% of the current average. This would force authors to cite only what’s crucial to their issue and would raise the credibility of the citation index.
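(To make that cap concrete with an invented figure: if papers currently carry 30 references on average, the limit would be 0.6 × 30 = 18 citations per paper.)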
I do not think that the situation is so bad: more and more journals are using systems that automatically spot self-plagiarism, and paper submission systems like EDAS have this feature as well.
Moreover, organizations like ACM and IEEE are enforcing blacklisting. I know someone who was judged to have committed a form of self-plagiarism by submitting a paper to a conference and a variation of it to a journal. A reviewer spotted it, both papers were rejected, and he risked being blacklisted for some time.
It is not quite clear to me what you mean by the phrase “science is self-regulatory”. And once you have spelled that out, is it something you really want? I want a science that explodes when measured by the amount of insight and benefit it generates.
The simple fact that we need tools to detect self-plagiarism proves that there is a problem. If the system were self-regulating, we wouldn’t need this.
Agree on this.
Moreover, organizations like ACM and IEEE are enforcing blacklisting.
I’m very interested in these blacklists. Is this documented somewhere?
This is not well documented but here are a couple of links:
http://www.ieee.org/publications_standards/publications/rights/plagiarism/index.html
IEEE has what they call a PAL.
http://www.acm.org/publications/policies/plagiarism_policy
ACM is less clear, but the case I mentioned happened at an ACM conference, so I think they have their own list.
@Daniele
You would need to ensure that the same paper, only slightly modified, keeps the same hash value with high probability. I bet that it would not be very long before someone could reverse engineer the hash function and break the system.
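To see why an ordinary cryptographic hash cannot serve as such a fingerprint, here is a minimal illustration (my own sketch, not part of any proposal in this thread): a one-word edit yields a completely different digest, so an edit-tolerant fingerprint has to be built differently.

    import hashlib

    a = "We present a new algorithm for sorting integers in linear time."
    b = "We present a novel algorithm for sorting integers in linear time."

    # A standard cryptographic hash: the two digests share no structure,
    # even though the sentences differ by a single word.
    print(hashlib.sha256(a.encode()).hexdigest())
    print(hashlib.sha256(b.encode()).hexdigest())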
One can imagine a centralized system to which different conferences and journals can submit a ‘fingerprint’ of each paper. This fingerprint could be some form of perceptual hash of the textual content, in our case the stripped-down text of the PDF or PS.
The fingerprint should guarantee the confidentiality of each submission by ensuring that it is computationally infeasible to recover the actual text from the fingerprint.
The centralized system could then ‘fuzzily’ match new fingerprints against the database of stored fingerprints and flag papers that are almost identical.
Also, the system could be informed of the dates when the reviewing process for a certain paper begins and ends. This way the system could also spot concurrent submissions.
I can see some problems with this approach, namely that the centralized authority would know, after a paper becomes public, how many times it has been resubmitted, precisely because the perceptual hashing is able to spot similarities. But I am not sure whether this is a bug or a feature. Thoughts?
I agree that security by obscurity would be a bad idea, and the perceptual hashing function would be public. Current state-of-the-art techniques in perceptual hashing (e.g., for images) are robust to a certain degree of noise. I need to do some research on similar systems for text, but I would bet that they are very good at detecting similarity even in the presence of slight or even large modifications of the text. The problem would be closely related to plagiarism detection, with the added constraint of only keeping a limited-size perceptual hash rather than the full text.
Sorry for the typos, I typed the comment from my phone with autocorrection.
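For concreteness, here is a minimal sketch of the kind of edit-tolerant, fixed-size fingerprint being discussed, using word shingles and MinHash. It illustrates the general idea only; the function names, parameters, and sample texts are invented for this example.

    import hashlib
    import re

    NUM_HASHES = 128  # fingerprint size: 128 integers per paper

    def shingles(text, k=5):
        """Set of overlapping k-word shingles of a text."""
        words = re.findall(r"[a-z0-9]+", text.lower())
        return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

    def fingerprint(text, k=5):
        """MinHash signature: for each of NUM_HASHES seeded hash functions,
        keep the minimum value over all shingles. Texts that share most of
        their shingles also share most of these minima."""
        shingle_set = shingles(text, k)
        sig = []
        for seed in range(NUM_HASHES):
            sig.append(min(
                int.from_bytes(hashlib.sha1(f"{seed}:{s}".encode()).digest()[:8], "big")
                for s in shingle_set
            ))
        return sig

    def similarity(sig_a, sig_b):
        """Fraction of matching positions: an estimate of the Jaccard
        similarity between the two shingle sets."""
        return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

    # Invented sample texts.
    paper = ("In this paper we study the incentives created by the peer review "
             "process and argue that the current system rewards quantity over "
             "quality. We survey existing detection tools and sketch a shared "
             "registry of paper fingerprints that editors could query.")
    resubmission = paper.replace("shared", "common")  # lightly edited copy
    unrelated = ("This submission describes a cache-oblivious data structure "
                 "for range queries and reports experiments on synthetic data.")

    fp = fingerprint(paper)
    print(similarity(fp, fingerprint(resubmission)))  # high: flagged as a near-duplicate
    print(similarity(fp, fingerprint(unrelated)))     # about 0.0: no shared shingles

Whether a signature like this also meets the confidentiality requirement raised above is a separate question; a real system would need a more careful construction.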
I really, really hope that the solution is not a “+1 – Like” button for scientific papers…
Good post!
I completely agree.
Related to this, a report of the “House of Commons” on peer review was recently published (July 2011).
This is the report (Volume I):
http://www.publications.parliament.uk/pa/cm201012/cmselect/cmsctech/856/856.pdf
And they have also published the additional written evidence (Volume II) by researchers, editors… really interesting!
http://www.publications.parliament.uk/pa/cm201012/cmselect/cmsctech/856/856vw.pdf
In Volume II there are a few proposals about how to improve peer review, and alternatives to it.
In my opinion, it is necessary to find some alternative mechanism: something with a first phase of peer review and then a “continuous social review” that contrasts the paper with the use society makes of it. But this is difficult.
Beatriz Barros (coordinator of the SISOB project)
Now that content is used in different ways to gain an online presence, it would really be important to follow a strategy that allows you to distinguish original works from those that are mere remakes of another.