To be clear, as the article says, these reviewers were offered a choice and agreed to operate under the "no LLMs allowed" policy.
And detection was not done with some snake-oil "AI detector" but by an invisible prompt injection in the paper PDF, instructing LLMs to put TWO long phrases into the review. LLM use was then detected by checking whether both phrases appeared in the review.
This did not catch grammar checks and touch-ups of an independently written review. The phrases would only get included if the reviewer fed the PDF to the LLM, in clear violation of their chosen policy.
> After a selection process, in which reviewers got to choose which policy they would like to operate under, they were assigned to either Policy A or Policy B. In the end, based on author demands and reviewer signups, the only reviewers who were assigned to Policy A (no LLMs) were those who explicitly selected “Policy A” or “I am okay with either [Policy] A or B.” To be clear, no reviewer who strongly preferred Policy B was assigned to Policy A.
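For anyone curious, the check itself is trivial once the canary phrases are planted. Here's a minimal sketch in Python of how such a check could look; the phrases and names below are made up, since the organizers didn't publish the actual ones:

    # Minimal sketch of the canary check, not the organizers' actual code.
    # The real injected phrases were not published; these are placeholders.
    CANARY_PHRASES = [
        "a fascinating avenue for future exploration of latent symbiosis",
        "the methodology is commendably rigorous and refreshingly articulate",
    ]

    def review_used_llm(review_text: str) -> bool:
        """Flag a review only if BOTH injected phrases appear verbatim.

        Requiring two long, distinctive phrases makes a coincidental
        false positive astronomically unlikely.
        """
        text = review_text.lower()
        return all(phrase in text for phrase in CANARY_PHRASES)

The two-phrase requirement is the key design choice: a single phrase might conceivably slip into an honest review by coincidence, but two long verbatim phrases essentially never will.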
I'm not sure what experience anyone in this thread has with grad-level research as a student/author, but I can assure you that heads roll over this kind of thing.
A professor's career is built on reputation, and that reputation is only as strong as their students' (who do much of the "work", such as it is). Responsibility ultimately lands on the professor, but this can be a career-ending moment for those students, and I'm quite confident there were some very uncomfortable discussions as a result of this.
Depends on the field. One of the most influential papers in economics was found to be incorrectly constructed, with signs pointing to just straight-up fraud. Basically, it omitted data that it claimed to include, and when that data is included, the conclusion reverses. When the authors were called out, they doubled down, offering the explanation that the conclusion reverses again if you add a third set of cherry-picked data, and then dragged the person calling them out through the mud in a NY Times opinion piece.
I consider LLMs to be a very useful tool and use them every day. But if I sign a slip of paper saying I won't use them for some project, and then use them anyway (not merely using them, but copying their output without even the pretense of putting it into my own words), then that's fraud. LLMs being a tool is completely orthogonal to this fraud.
This comment doesn't seem to fit the discussion at all?
The discussion is not about humans using LLMs to write papers. It is about humans who agreed not to use LLMs in reviewing papers, then did exactly that.
There's a lot of irony in a defensive comment written on the basis of a misreading (or inattentive reading) of a post about reviewing papers, a task that requires attentive reading.
It might be that paper authors required others not to use LLMs for reviewing their work. Then, by the rule of reciprocity, they shouldn't use LLMs when reviewing others' work. The article is unclear on whether this implied reciprocity rule was explicitly stated or not.
In addition to being reviewers, they also submitted their own research to this venue. That raises the question: if they were willing to cheat on the reviewing side, where the incentive is weak, why wouldn't they cheat on the side that provides much stronger incentives?
(Meaning, your career doesn’t get boosted much for reviewing papers, but much more so for publishing papers)
A hammer can be used to build a house, or to kill a person. We have a lot of history, law, and culture (and likely more) around using tools like hammers, so we know what counts as good use versus bad. The same applies to many other tools.
LLMs can be very useful tools. However, we also know there are a lot of bad uses, and we are still trying to figure out where the problems are and where they aren't.
Reading the article is exhausting. If I can leave a comment just as well without reading the article, then what's the problem? If I get something wrong, other people will point it out. That's a more efficient use of my time.
I was thinking this too, but I don't believe that's the case, and I don't think it would be a good idea either.
Most of these people are likely students; this should be a learning moment, but I don't think it is yet grounds for their entire academic career to be crippled by being unable to publish in a top-tier ML venue.
If this is tolerated, it sends exactly the wrong kind of message. The students, if that's what they are, should be banned for life. Let them serve as an example to myriad future students; that will be a better outcome in the long run.
This didn't trip for people who were merely bouncing ideas off an LLM; it caught people who copied and pasted straight from their LLM.
It's not quite a consensus view, but a majority of sociologists agree that high-severity deterrence has limited effectiveness against crime. Instead, certainty of enforcement is the most salient factor.
Correct. We also have evidence, from cheating in both sports and academia, that stiff punishments do not work. Many people hold the false belief that if cheating is easy, then punishments must be extremely severe to scare would-be cheaters. It just does not work. Preventing cheating is far easier said than done.
> We also have evidence both from cheating in sports and in academia that stiff punishments do not work.
Maybe so, but there is evidence that lack of punishment doesn't work either.
Neither extreme "works". Just because terminal punishments do not prevent the worst cheating does not in any way imply that slaps on the wrist reduce the incidence of cheating.
Are you claiming that one of the extremes "works"? That the "light punishment" route reduces cheating? Or maybe has no effect on cheating?
There are two extremes; I am not arguing that one extreme (terminal punishment) reduces cheating, I am saying that the other extreme (light punishment) does not reduce cheating either!
You say that stiff punishments have no effect on the cheating rate, right? Compared to what, exactly? Compared to no punishments? Compared to light punishments? Compared to medium punishments? Compared to heavy but non-terminal punishments?
Now that I've reread your comment, I'm extremely skeptical that terminal punishments have no effect on the cheating rate compared to light punishments or compared to medium punishments.
It's an extraordinary claim, so I want to see this "lots of evidence"; the evidence would need to show essentially no correlation between cheating rates and punishment severity.
That's not true. People still pick up USB sticks from the street, people still fall for scam phone calls, and people still click on links in email.
Just because a method was successful once does not mean it was 'burned'. None of these people will be checking each and every future PDF, or passing it through a cleaner, before doing the same thing all over again. And others will come to it fresh and won't even be warned, because this is not going to be widely publicized, in spite of us discussing it here.
If anything, you can take this as proof that this method is more or less guaranteed to work.
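For what it's worth, a "cleaner" wouldn't need to be sophisticated. A rough sketch, assuming pdfminer.six is installed; the keyword list is an invented heuristic for illustration, not anything the conference used:

    # Rough sketch of a pre-flight scan for injected instructions.
    # Assumes pdfminer.six (pip install pdfminer.six); the keyword
    # list is a made-up heuristic, not a vetted injection detector.
    from pdfminer.high_level import extract_text

    SUSPICIOUS = [
        "ignore previous instructions",
        "include the following phrase",
        "in your review",
    ]

    def scan_for_injection(pdf_path: str) -> list[str]:
        # extract_text pulls every text object in the PDF, including
        # white-on-white or tiny text a human reader would never see.
        text = extract_text(pdf_path).lower()
        return [kw for kw in SUSPICIOUS if kw in text]

    hits = scan_for_injection("submission.pdf")
    if hits:
        print("Possible prompt injection, matched:", hits)

The point stands, though: almost nobody will bother running even this much before pasting a PDF into a chat window.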
Yup, precisely this. Doing something bad is rarely a rational weighing of costs and benefits. Likelihood and celerity (swiftness) of getting caught seem to be the driving factors.
It makes honest people feel rewarded, valued, and acknowledged. It teaches people who wish to follow the rules and conform to social norms what those norms are and where we actually draw the line in practice.
Looked at slightly differently: given a split between high-trust and low-trust people, preventing conversions from high to low is just as important as inducing conversions from low to high.
Well, maybe they found themselves in the last hours before the deadline without the reviews done... in some cases due to procrastination, but in a few cases perhaps because life is hard and they just couldn't do it. So they used the LLM as a last resort to avoid missing the deadline (which I assume was penalized as well?).
To err is human. It makes sense that they are punished (and the harshest part of the punishment is not having a paper rejected; it's the loss of face with coauthors and others, BTW. Face is important in academia), but "for life" is way too much IMO.
This year, having their own submissions desk-rejected is a strong enough signal that the policy has some teeth behind it. Let's ban 'em for life next year.
I strongly feel that deterrence, not retribution, should be the goal here.
It has been shown time and again that, for most people, teaching them to be better and giving second chances is more effective than using forever-punishment as a warning for others.
This line of reasoning interests me because it seems to arise in other contexts as well.
Do very harsh punishments significantly reduce future occurrences of the offense in question?
I've heard opponents of the death penalty argue that it's generally not the case, e.g. because the criminals often aren't reasoning in terms that factor in the death penalty.
On the other hand (and perhaps I'm misinformed), I've heard that some countries with the death penalty for drug dealers have genuinely fewer problems with drug addiction. Lower, I assume, than the numbers you'd get from simply executing every user.
I'm not sure it was meant that way, but nice metaphor. For some students "academic death" might really be better than a life of being trapped in a system that they can only navigate by cheating.
My understanding is that something along those lines happened:
> All Policy A (no LLMs) reviews that were detected to be LLM generated were removed from the system. If more than half of the reviews submitted by a Policy A reviewer were detected to be LLM generated, then all of their reviews were deleted, and the reviewer themselves was removed from the reviewer pool.
Half is a bit lenient in my view, but I suppose they wanted to avoid even a single false positive.
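The rule is mechanical enough to write down. A minimal sketch of that policy as I read it; the data shapes here are made up for illustration:

    # Sketch of the stated rule: drop flagged reviews; if a reviewer
    # has more than half of their reviews flagged, drop ALL of their
    # reviews and remove them from the pool. Data shapes are invented.
    def apply_policy_a(reviews_by_reviewer: dict[str, list[dict]]):
        kept, removed_reviewers = {}, []
        for reviewer, reviews in reviews_by_reviewer.items():
            flagged = sum(r["llm_detected"] for r in reviews)
            if 2 * flagged > len(reviews):          # more than half flagged
                removed_reviewers.append(reviewer)  # everything goes
            else:
                # drop only the flagged reviews, keep the rest
                kept[reviewer] = [r for r in reviews if not r["llm_detected"]]
        return kept, removed_reviewers

Note the asymmetry: individual flagged reviews are always removed; the half threshold only governs whether the reviewer themselves gets pulled from the pool.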
Thank goodness we have you passing judgment on the internet; otherwise, who would? I'm glad you're willing to destroy someone for a mistake rather than letting them learn and change. We all know that arbitrary and harsh punishments solve everything.
"Oops, you told me not to do this, and I volunteered to agree to these stricter standards yet I flagrantly disregarded them, please forgive me" doesn't seem like something you just accidentally do, it's a conscious choice.
I've been an AC (the person who manages the reviewing process and translates reviews into accept/reject decisions) at ICML and similar conferences a few times. In my experience, grad students tend to be pretty good reviewers. They have more time, they are less jaded, and they are keener to do a good job. Senior people are more likely to have the deep and broad field knowledge to accurately place a paper's value, but they are also more likely to write a short shallow review and move on. I think the worst reviews I've seen have been from senior people.
It's usually not "noob" students. Big conferences require reviewers to have at least one (usually more) published paper in major venues. For students, this usually means they went through the process of being the first author on a few papers.
Ok but you need peer reviewed publications to graduate with a PhD.
And if you retort that the whole academic system is obsolete, well, it still carries a lot of prestige and legitimacy that makes politicians interested in maintaining it, so it's not going anywhere soon.
In many cases authors and reviewers are not the same people. For your first two publications at such venues, you are not yet allowed to review yourself and need someone else to do it.
I think the consequences are well deserved, but hopefully not at the authors' expense (if they are innocent).
It's an unethical, false choice. The reviewers are not perfectly rational agents doing free work; they have real needs and desires. Shame on ICML for exploiting their desperation.
A lifetime ban is a stretch, but the actual response is completely fine. They can just resubmit to the next conference.
Words mean something: if you promise to uphold a contract and break it, there are consequences. The reviewers were free to select the policy which allows LLM use.
Is it? The reviewers could simply have chosen a different option in a form field. While I understand that they were "forced" to review under reciprocal review, they still had other choices, none of which look coercive to me, and any of those would have avoided this outcome for them.