GitHub Classroom: plagiarism & cheating detection


(Dan Wallach) #1

We’re currently using our campus Subversion server (yeah, ugh) and are looking to switch to GitHub Classroom. So far as I can tell, I’ll be able to do everything I need with the various scripts that everybody’s put together, but I have a few questions about dealing with the inevitable students who think they can get away with cheating.

Git, of course, makes it really easy to rewrite history (git rebase -i, etc.), such that what they have in the repo and what they actually did might not have very much to do with one another.

Of course, I can run MOSS or other plagiarism tools on the output, but when I submit cases to our university’s honor council, I’ve found it helpful to dig into the version history of each accused student and say things about who committed first and had a variety of commits, versus who had just a big dump of code at the deadline. This leads to my questions:

  • Does anybody play games like this, perhaps allowing only “fast forwards” in their local copies (e.g., git pull --ff-only), such that regular git pull commands will scream if a student tries to edit their history?

  • Similarly, I could imagine cobbling together a script that flags any commits whose timestamps differ from the time they were actually pushed by more than some interval. Basically, if you try to invent something from yesterday and push it today, I’d want that to be detected, or at least detectable.

  • Do any of you use the GitHub webhooks framework (https://developer.github.com/webhooks/) to trigger events when students push their code? I could imagine doing a local pull each and every time a student does a push. Maybe I’d just clone into a fresh local repo every time.

  • How well do students with zero prior Git experience manage with these tools? I already introduce IntelliJ and have them use the VCS features within it, so in theory the difference between Git and Subversion isn’t all that much, and the GitHub interface gives them a nice web page to confirm that their commits have been pushed.
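
A minimal sketch of the timestamp check from the second bullet, assuming each commit has already been paired with the time it was first observed on the instructor’s side (for example, from a periodic fetch or a webhook). The CommitRecord type, the first_seen_at field, and the 12-hour default are hypothetical choices for illustration. Students who commit offline and push later will trip this check legitimately, so a flag is only a prompt to look closer.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class CommitRecord:
    sha: str
    committed_at: datetime   # timestamp stored in the commit (student-controlled)
    first_seen_at: datetime  # when the instructor's side first saw it (hypothetical)


def flag_suspicious(commits, max_gap=timedelta(hours=12)):
    """Return commits that claim to be more than max_gap older than their
    first sighting, or that claim a time later than their first sighting."""
    flagged = []
    for c in commits:
        gap = c.first_seen_at - c.committed_at
        # Large positive gap: possibly back-dated. Negative gap: claims the future.
        if gap > max_gap or gap < timedelta(0):
            flagged.append(c)
    return flagged
```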

Thanks!


(Vanessa) #2

Hi @danwallach, let me think about your suggestions and I’ll circle back with recommendations.

In the meantime, you may find David Malan’s honesty-priming tool from CS50 interesting (slides 36-37):


(Dan Wallach) #3

Replying to myself: it appears that the GitHub “events API” (https://developer.github.com/v3/activity/events/) provides a series of timestamps along with the Git commit ids, so I can get timestamps for when the commits actually hit the GitHub server. That’s something I could use to detect when each student pushes their work to the server, which, I guess, is enough to detect both late work and funny business with back-dated commits.
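
A rough sketch of how the events API idea might be used, assuming the repository events endpoint returns a JSON list in which push events carry a "type" of "PushEvent", a "created_at" timestamp, and the pushed head commit under payload["head"]. The function names, the token handling, and the deadline comparison are illustrative assumptions, not anything prescribed by GitHub. The API only exposes a limited window of recent events, so this would need to run regularly during the semester.

```python
import json
import urllib.request
from datetime import datetime, timezone


def fetch_events(owner, repo, token):
    """Fetch one page of recent events for a repository (network call)."""
    req = urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/events",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def push_times(events):
    """Map the head commit SHA of each push to the time the event was recorded."""
    times = {}
    for event in events:
        if event.get("type") != "PushEvent":
            continue
        head = event.get("payload", {}).get("head")
        if head is None:
            continue
        pushed = datetime.strptime(event["created_at"], "%Y-%m-%dT%H:%M:%SZ")
        times[head] = pushed.replace(tzinfo=timezone.utc)
    return times


def late_pushes(events, deadline):
    """Return the head SHAs of pushes recorded after the deadline, sorted."""
    return sorted(sha for sha, t in push_times(events).items() if t > deadline)
```

Comparing these server-side times against the timestamps stored in the commits themselves is what would expose a back-dated commit.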


(Vanessa) #4

@danwallach if memory serves correctly, @Omarasifshaikh does this for his courses at SFSU. I’ve also heard of teachers shutting off the server after a certain time so late assignments aren’t accepted.

Also, @rebelsky onboards his students to Git in an early lab:


(Jakub Narębski) #5

The answer depends on the workflow you use. If you periodically pull/fetch from students, you can configure Git to keep reflogs for remote-tracking branches as well, and to preserve them for a longer time. Even if you didn’t disable non-fast-forward fetches, you will have (meta)history.

For the workflow where students push to either a central repository or a per-student public repository (e.g. on GitHub), you would need to rely on hook notifications or, as you wrote, server logs; for GitHub, that means its events API.
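
The reflog setup described above might look like the following sketch. The keys core.logAllRefUpdates, gc.reflogExpire, and gc.reflogExpireUnreachable are real Git settings; the helper names and the choice of "never" as the expiry are assumptions for illustration.

```python
import subprocess

# Keep reflogs for all ref updates, and never expire their entries.
REFLOG_SETTINGS = {
    "core.logAllRefUpdates": "true",
    "gc.reflogExpire": "never",
    "gc.reflogExpireUnreachable": "never",
}


def reflog_config_commands(repo_dir):
    """Build the `git config` commands for one local clone, without running them."""
    return [
        ["git", "-C", repo_dir, "config", key, value]
        for key, value in REFLOG_SETTINGS.items()
    ]


def apply_reflog_config(repo_dir):
    """Run the commands; raises CalledProcessError if git reports a failure."""
    for cmd in reflog_config_commands(repo_dir):
        subprocess.run(cmd, check=True)
```

After each fetch, running git reflog show on the remote-tracking branch lists the positions that branch has held, so a rewritten history still leaves a trace in the instructor’s clone.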

Note: Git has signed pushes: those may help, or may be total overkill for this situation.


(Dan Wallach) #6

I’ve possibly come up with yet another solution to the problem: Travis-CI. I followed @Omarasifshaikh’s instructions for setting up Travis-CI and then added a rule to my build.gradle file to print the date. That’s it. Now, when a grader is looking to Travis-CI for verification that our various unit tests and such have passed, the date will be in the log.

So far as I can tell, the only downside to this is that Travis-CI has (so far) only offered me one simultaneous instance, which could back things up for hours around deadline submission time. I’ve asked the Travis people to give me more instances, and then I just need to cut my students a few minutes of slack.


(Vanessa) #7

@danwallach let us know how it goes with the folks at Travis (and how your solution shakes out this semester!)


(Giorgos Sfikas) #8

Concerning the ‘plagiarism’ part of the topic title: given a specific GitHub Classroom assignment, is there any automated way to check whether submitted solutions are too much alike, or whether a student has copied another student’s solution?

If not, does anyone know if there are any thoughts about including something like that as a GitHub Classroom feature?


(Dan Wallach) #9

Best solution right now is to clone every student repo to your own machine (see my github-clone-all project or many others just like it), then submit everything to MOSS.
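
As a rough illustration of the cloning step (this is not the github-clone-all script itself), the sketch below builds one git clone command per student repository. It assumes the list of repository names has already been obtained somehow, and that repositories are named as the assignment prefix, a hyphen, and the student’s username; the function name, the organization name, and the submissions directory are hypothetical.

```python
def clone_commands(org, prefix, repo_names, dest_root="submissions"):
    """Build `git clone` commands for every repo belonging to one assignment."""
    cmds = []
    for name in sorted(repo_names):
        if not name.startswith(prefix + "-"):
            continue  # a different assignment or an unrelated repository
        student = name[len(prefix) + 1:]
        url = f"https://github.com/{org}/{name}.git"
        cmds.append(["git", "clone", "--quiet", url, f"{dest_root}/{student}"])
    return cmds
```

Each command can then be passed to subprocess.run, leaving one directory per student to hand to MOSS.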


(Matthias) #10

It would be great if there was an easy way of searching all my students’ private repositories through the web interface. Adding this functionality to GitHub Classroom would make things easier for many (hint hint, @mozzadrella).
Maybe there is a way, but I couldn’t find one.

While giving feedback to my students by raising issues where things need improving, I saw a few lines of code that seemed familiar.

As this is not the students’ final submission, I am checking the work through the web interface. It would be great to be able to search all the code of students from my class so that I can find plagiarism / collusion easily and tell them off early before it affects their grade.


(Dan Wallach) #11

You should investigate MOSS. Among other cool benefits, you can then use nifty visualization tools like mossum. You’d first clone everything to your personal computer, perhaps using a script like my github-clone-all.

Doing plagiarism detection by hand is a tricky thing. Some students are better than others at hiding their work, but MOSS can see right through it. It’s not perfect; you still have to analyze its top pairings, and some of them will be false positives. The real giveaways are when two students share a weird design, like an extra helper function that they didn’t need to write at all.
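
To make the idea of “top pairings” concrete, here is a toy sketch that ranks pairs of submissions by token overlap. This is not MOSS and not its algorithm; it is a crude Jaccard similarity over runs of k tokens, with every identifier and keyword collapsed to ID so that simple renaming does not hide a match. The function names and the value of k are arbitrary assumptions.

```python
import re
from itertools import combinations

TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def shingles(source, k=5):
    """Return the set of k-token runs, with all word-like tokens replaced by ID."""
    tokens = ["ID" if re.match(r"[A-Za-z_]", t) else t for t in TOKEN.findall(source)]
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a, b):
    """Size of the intersection over size of the union; 0.0 if both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_pairs(submissions, k=5):
    """Score every pair of submissions and return them, most similar first."""
    sets = {name: shingles(src, k) for name, src in submissions.items()}
    scored = [
        (jaccard(sets[x], sets[y]), x, y)
        for x, y in combinations(sorted(sets), 2)
    ]
    return sorted(scored, reverse=True)
```

As with the real tool, the highest-scoring pairs are only candidates; short or boilerplate-heavy assignments will produce false positives that have to be read by hand.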


(Matthias) #12

Thank you.
I looked into MOSS when I read about it here, but I’m using PHP, which doesn’t seem to be supported.
I’ll see whether I can find something similar for PHP, but a GitHub Classroom search function that works across the class would be great in any case.


(Dan Wallach) #13

Yeah, MOSS has a long but not infinite list of languages that it supports. You might try running an automated translator (e.g., php2py) and then running MOSS on the results. This amounts to compiling then decompiling the code, so it will obliterate comments, indentation, and other such things, which are important hints. Still, if a process like this finds something, then you investigate it and maybe you’ve found something actionable.

As to what GitHub Classroom might do, I don’t speak for them and I have no idea what their development priorities are. It seems pretty clear that what they offer is a thin layer on top of the regular GitHub service. If there’s something that’s student-facing and Classroom-specific, like setting up and cloning repositories, that seems to be at the top of their list. Instructor-facing features seem to be lower priority.