Preventing students from force pushing

Greetings, everyone.

I’ve had a pretty good experience using GitHub Classroom for a large Computer Systems class (123 students) this year. We’ve used webhooks to notify our own server when students push code, so that our grading server can pull the updated code, run tests, and push a grade report back to the student’s repo on GitHub. Students like the real-time feedback a lot.

We did have a few cases where students tried to “back-date” submissions by modifying a past commit’s log message and force-pushing the rewritten history to GitHub. (For submissions after the deadline, we have students include a reserved string in the commit message to indicate “grade this submission even though it’s after the deadline.” We do this so that a student who merely pushes after the deadline isn’t charged a lateness penalty unless they truly intend that push to be a submission.)

Protected branches would help: they would let us disallow students from force-pushing. But it appears that one must set each individual student’s repo to have the master branch be protected…and if a student creates a different branch, that branch wouldn’t be protected by default, which might let the student force-push on that different branch, then move the doctored commit history onto master.

I’d like to request that GitHub consider implementing “all branches in all repos protected by default” as an option for Organizations. This would be a big help for GitHub Classroom users who want to prevent students from doctoring commit histories–which is essential if the commit history on GitHub is to be used in grading.

Thanks,
-Brad


I might be missing the point here, but I wonder why the standard mechanism introduced in GitHub Classroom last summer for dealing with assignment deadlines doesn’t help you in this case. Any attempt to doctor the commit history should be caught.

That said, I’m afraid that your request to make “all branches in all repos protected by default” is too closely tied to your particular use case to be adopted widely in the system.

Since you are already employing a server for handling the automatic grading, one option would be to add a further webhook to trigger a response when a new branch gets created so that the server will make it protected at once. Though, I don’t know if the GitHub API would allow for that; presumably yes, given the richness of the REST interface.
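
As a rough illustration of that idea, here is a minimal Python sketch of what the grading server might build when it receives a `create` webhook event for a new branch. It only constructs the request and does not send it. The endpoint path and body fields follow the REST branch-protection API as I recall it, so please check them against the current documentation; `protection_request` is a hypothetical helper name, and actually sending the request would need a token with admin rights on the repo.

```python
from urllib.parse import quote

API_ROOT = "https://api.github.com"


def protection_request(event_name, payload):
    """Return (method, url, body) to protect a newly created branch, or None.

    event_name is the value of the X-GitHub-Event header; payload is the
    decoded JSON body of the webhook delivery.
    """
    # Only react to branch creation, not to tags or unrelated events.
    if event_name != "create" or payload.get("ref_type") != "branch":
        return None

    repo = payload["repository"]["full_name"]  # e.g. "org/hw1-alice"
    # Percent-encode the branch name in case it contains characters like "/".
    branch = quote(payload["ref"], safe="")
    url = f"{API_ROOT}/repos/{repo}/branches/{branch}/protection"

    # Assumed request body: the four rule groups are left disabled (None),
    # and force pushes and branch deletion are turned off.
    body = {
        "required_status_checks": None,
        "enforce_admins": True,
        "required_pull_request_reviews": None,
        "restrictions": None,
        "allow_force_pushes": False,
        "allow_deletions": False,
    }
    return "PUT", url, body
```

There is still a short window between branch creation and the protection call in which a force push could land, so this narrows the problem rather than closing it.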

Let me try to explain in more detail.

The standard mechanism you cite does two things: 1) it identifies the hash of the most recent commit before the deadline (as measured by the clock on GitHub’s servers) as the “submission”; 2) it allows further commits after the deadline, but it doesn’t record their hashes at any wall-clock time the way (1) does at the submission deadline.

We need to consider two cases separately: that where a student submits before the deadline, and that where a student submits after the deadline. If a student submits before the deadline, the “submission” mechanism will at least reveal when a student has modified commit history after the deadline, as the commit hash recorded as the submission won’t match that of the modified history. So (1) is adequate to detect this sort of cheating for submissions before the deadline (but I’d argue that prevention rather than detection is preferable…and protected branches would prevent force pushes).

Now the post-deadline case: in my class, when a student submits after the deadline, I reduce the grade by 10% for each 24-hour period or fraction thereof that elapses between the deadline and when the student submits. To do so, I need a reliable indication of when the student has decided to submit after the deadline. I can’t merely grade the “last” commit after the deadline, because it’s impossible to know whether a student intends to make further commits. Worse, a student who submitted before the deadline might later push a commit after the deadline without intending it to be graded (this is possible; students do all sorts of things unintentionally). I would then risk grading the post-deadline commit and applying a lateness penalty that the student never meant to incur.
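
To make the lateness rule concrete, here is a small Python sketch of the arithmetic. It assumes the 10% is subtracted from the full grade for each started 24-hour period and that the result is floored at zero; both are one reading of the rule, and `late_multiplier` is a hypothetical name.

```python
import math
from datetime import datetime, timedelta, timezone


def late_multiplier(deadline, submitted, per_period=0.10):
    """Fraction of the grade kept for a submission made at `submitted`.

    Each 24-hour period, or fraction of one, after the deadline costs
    `per_period` of the full grade (assumed linear, floored at zero).
    """
    late_by = submitted - deadline
    if late_by <= timedelta(0):
        return 1.0  # on time: no penalty
    # Round up so that even one second into a period counts as a whole period.
    periods = math.ceil(late_by / timedelta(days=1))
    return max(0.0, 1.0 - per_period * periods)


# Hypothetical deadline, for illustration only.
deadline = datetime(2024, 1, 10, 17, 0, tzinfo=timezone.utc)
one_second_late = late_multiplier(deadline, deadline + timedelta(seconds=1))
just_over_a_day = late_multiplier(deadline, deadline + timedelta(hours=24, seconds=1))
```

Under these assumptions a submission one second late keeps about 90% of its grade, and one that is 24 hours and one second late keeps about 80%.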

So that there is a clear indication from a student that they intend for a post-deadline commit to be graded, I require students to include a reserved string (“LATESUBMIT”) in their commit log message for a commit after the deadline that they want to be graded. And they’re told only to make one commit with this label–and that if there’s more than one, only the earliest of them will be graded.
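
The selection rule could be sketched in Python roughly as follows. The `SeenCommit` record and its `received_at` field are hypothetical: they stand for whatever the grading server stores when it first sees a commit, and the sketch orders commits by that server-side time rather than by anything the student’s client reports.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional

LATE_TAG = "LATESUBMIT"  # the reserved string from the course rules


@dataclass(frozen=True)
class SeenCommit:
    sha: str
    message: str
    received_at: datetime  # when the grading server first saw this commit


def late_submission(commits: Iterable[SeenCommit],
                    deadline: datetime) -> Optional[SeenCommit]:
    """Return the earliest post-deadline commit tagged LATESUBMIT, if any."""
    tagged = [c for c in commits
              if c.received_at > deadline and LATE_TAG in c.message]
    # If a student tags more than one commit, only the earliest one counts.
    return min(tagged, key=lambda c: c.received_at, default=None)
```

Untagged post-deadline commits are ignored entirely, which is what keeps an accidental push from turning into a penalized submission.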

If students can force push, they can modify log messages on past commits, and add “LATESUBMIT” to an old commit’s log message in an attempt to create the appearance of a submit date in the past (when the rule in my class is that the time of submission is the time when the student declares that they wish to submit).

I believe students can also force push to change the timestamps on past commits (since these timestamps come from the client that is pushing). That’s another avenue for forging an earlier submission time for submissions made after the deadline.

If it were possible to specify that an organization’s new repos should have all branches protected by default, then force pushes wouldn’t be permitted, and commits to HEAD would be immutable. That would prevent adding “LATESUBMIT” to a commit’s log message after the fact, and would prevent students from changing the commit timestamps of past commits. (It would not, however, prevent students from forging timestamps on new commits to HEAD, so long as the new commits’ timestamps are after the timestamp of HEAD. My grading server could just use the event timestamps on pushes as the time of a submission rather than the commit timestamps, though.)
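
Until something like that exists, the grading server can at least detect force pushes and keep its own clock. Here is a minimal Python sketch, assuming the push webhook payload carries `ref`, `before`, `after` and a boolean `forced` field (as I recall from the webhook documentation; worth verifying). `record_push` and `forced_pushes` are hypothetical names, and the log is just an in-memory list.

```python
from datetime import datetime, timezone


def record_push(payload, received_at, log):
    """Append one push event to `log`, stamped with the server's own clock.

    `received_at` is when the grading server received the webhook, so it
    does not depend on the commit timestamps set by the student's client.
    """
    entry = {
        "ref": payload["ref"],        # e.g. "refs/heads/master"
        "before": payload["before"],  # head SHA before the push
        "after": payload["after"],    # head SHA after the push
        "forced": bool(payload.get("forced", False)),
        "received_at": received_at,
    }
    log.append(entry)
    return entry


def forced_pushes(log):
    """Return the logged pushes that rewrote history on their ref."""
    return [entry for entry in log if entry["forced"]]
```

A caller would pass something like `datetime.now(timezone.utc)` as `received_at` when the webhook arrives. This is detection rather than prevention, but it gives the instructor a record of every rewrite along with a submission time the student cannot forge.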

To be clear, I’m not proposing an intrusive global change to default repo policies on GitHub. All I’m suggesting is offering Organization admins the chance to configure an optional policy of disabling force pushes on all branches of all newly created repos owned by that Organization by default. It may be that other GitHub users, not just Classroom users, find the option of setting such a non-default policy useful–there are many Organizations that find force pushes problematic and want to disable them. I see no reason why this feature should be onerous to implement…and would welcome the reaction of a GitHub developer.

-Brad

Rebasing is at the core of git, so the ability to rewrite commit history is fundamental to keeping the interaction between fix/feat branches and production branches as clean as possible. On the other hand, it may well be desirable to prevent push --force operations on master, for example.

It is not that the new policy you suggest would be hard to implement; rather, it goes against one of the central paradigms of git by disabling history rewriting too drastically, that is, organization-wide. My point is thus philosophical rather than practical, and of course only a personal opinion.

Just to be clear, I’m not proposing categorically disabling force pushes for all repos owned by an organization (i.e., disabling them at a whole-organization granularity). The current “polarity” of the force pushes policy upon creation of a new repo is “allow by default on all branches,” where the organization that owns the repo is free after a repo’s creation to manually configure the denial of force pushes on any branch in that repo.

What I’m suggesting is offering the opposite “polarity”: “deny by default on all branches” upon creation of a new repo, where the organization that owns the repo is free after a repo’s creation to manually configure the allowing of force pushes on any branch in that repo.

This change is about increasing flexibility: letting force pushes be either “default allow” or “default deny.” It doesn’t categorically prevent rebasing for an entire organization. In fact, this “default deny” policy could potentially even be specified per Classroom assignment, rather than per organization (e.g., so that non-assignment repos owned by the organization could use a “default allow” policy). If an instructor chooses not to allow rebasing in repos for an assignment in his class, must that configuration decision be seen as irreconcilable with git’s architecture? I’d like to think (in my personal opinion) that offering instructors this flexibility only makes GitHub Classroom more useful (without changing git’s support for rebasing for users who don’t choose to configure assignments this way).

A different option would be for Classroom simply to expand the “record hash at deadline” feature, so that a submission hash is recorded not just at the initial deadline, but exactly every 24 hours after the deadline for a number of days configurable in the assignment. That would support late submissions at a 24-hour granularity of lateness, and detect tampering with past commits through force pushes…
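
A rough Python sketch of how such periodic snapshots could be used follows. All names are hypothetical, and the set of reachable hashes is assumed to come from something like `git rev-list <branch>` run on a fresh clone at grading time.

```python
from datetime import datetime, timedelta, timezone


def checkpoint_times(deadline, late_days):
    """The deadline itself plus one checkpoint every 24 hours after it."""
    return [deadline + timedelta(days=i) for i in range(late_days + 1)]


def rewritten_checkpoints(recorded, reachable):
    """Return checkpoint times whose recorded head is gone from history.

    recorded  -- dict mapping checkpoint time -> head SHA recorded then
    reachable -- set of SHAs reachable from the branch head at grading time
    A recorded SHA that is no longer reachable means history was rewritten
    after that checkpoint.
    """
    return [t for t, sha in sorted(recorded.items()) if sha not in reachable]
```

An empty result from `rewritten_checkpoints` means every recorded head is still part of the branch, so the lateness of the final submission can be bounded by the first checkpoint at which its hash appears.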

-Brad
