New collection of tools to manage GitHub classrooms

When I spoke at SIGCSE '19 earlier this year (slides, YouTube video), I mentioned several different tools that I’ve built over the years to help manage student repos in GitHub.

I recently reengineered my tools, notably adding a caching layer so you spend less time waiting for GitHub to respond with a list of all the repos. I also merged the separate repos that I had for each tool into one, and the result is here:

Please play with it and let me know what you think.

Tools

  • github_clone_all: make a local clone of all repositories matching a given prefix
  • github_rate_limit: print your GitHub API rate limits
  • github_private_all: make every repo matching a given prefix private (i.e., fix it if you accidentally made them public in GitHub Classroom)
  • github_graders: assign student repos randomly to graders
  • github_event_times: print the push timestamps for each commit in a student repository
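
To give a flavor of what a tool like github_graders has to do, here is a minimal sketch of one way to assign repos randomly but evenly: shuffle, then deal them out round-robin. This is not the code from the actual tool. The function name, the grader names, and the repo names are all made up for illustration.

```python
import random
from typing import Dict, List, Optional


def assign_to_graders(
    repos: List[str], graders: List[str], seed: Optional[int] = None
) -> Dict[str, List[str]]:
    """Shuffle the repos, then deal them out to graders round-robin."""
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    shuffled = list(repos)     # copy, so the caller's list is left alone
    rng.shuffle(shuffled)

    assignment: Dict[str, List[str]] = {g: [] for g in graders}
    for i, repo in enumerate(shuffled):
        assignment[graders[i % len(graders)]].append(repo)
    return assignment


# Hypothetical repo names sharing a common prefix.
repos = [f"comp215-week01-student{i:02d}" for i in range(7)]
assignment = assign_to_graders(repos, ["alice", "bob", "carol"], seed=42)
for grader, assigned in assignment.items():
    print(grader, len(assigned))
```

Dealing round-robin after the shuffle keeps the workloads within one repo of each other, which plain random choice per repo would not guarantee.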

Performance details, because it’s fun: I was commonly banging my head against the GitHub API rate limits, which say you can only consume 5000-ish units of work in an hour or thereabouts, where a given query seems to burn more than one of these unitless work units. Also, you can sit there for a minute or more while the tool enumerates all of the student repositories, because GitHub’s APIs will only give you 100 answers at a time, requiring you to “page” through those answers.
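
The paging described above looks roughly like the loop below. It is a sketch, not the code from my tools: `fetch_page` is a hypothetical stand-in for whatever function makes the real network request for one page, and the fake repo names exist only so the example runs without touching GitHub.

```python
from typing import Callable, Dict, List


def fetch_all_pages(
    fetch_page: Callable[[int, int], List[Dict]], per_page: int = 100
) -> List[Dict]:
    """Keep requesting pages until one comes back short, then stop."""
    results: List[Dict] = []
    page = 1
    while True:
        batch = fetch_page(page, per_page)
        results.extend(batch)
        if len(batch) < per_page:  # a short (or empty) page is the last one
            break
        page += 1
    return results


# Stand-in for the network: 250 fake repos served 100 at a time.
FAKE_REPOS = [{"name": f"comp215-week01-student{i}"} for i in range(250)]
calls: List[int] = []


def fake_fetch_page(page: int, per_page: int) -> List[Dict]:
    calls.append(page)  # record each simulated request
    start = (page - 1) * per_page
    return FAKE_REPOS[start:start + per_page]


repos = fetch_all_pages(fake_fetch_page)
print(len(repos), "repos in", len(calls), "requests")
```

Every one of those requests is a separate round trip, which is where the waiting comes from once a class has thousands of repos.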

There are several “search” APIs that promise to do this significantly faster, but I stumbled into an ugly bug, where they seem to return a subset of the correct answers. GitHub hasn’t fixed this yet, but a helpful engineer there told me about using HEAD requests and checking the ETag. The idea is that you can cache the results and then use this ETag thing as a way of indicating whether your cache is still valid. I hacked that together yesterday and it seems to be working.
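
Here is a rough sketch of the cache-validation idea, under some assumptions. The `head` and `fetch` callables are hypothetical stand-ins for a real HEAD request (returning the ETag header) and the slow full listing; the post does not say how one ETag covers a multi-page listing, so the sketch just treats it as a single opaque string. The fake server is only there to make the example runnable offline.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Cache = Dict[str, Any]


def cached_repo_list(
    cache: Optional[Cache],
    current_etag: Callable[[], str],
    full_fetch: Callable[[], Tuple[str, List[str]]],
) -> Cache:
    """Reuse the cache if its ETag still matches; otherwise refetch everything."""
    if cache is not None and cache["etag"] == current_etag():
        return cache  # one cheap request, and the cache is still valid
    etag, repos = full_fetch()  # the slow path: page through the whole listing
    return {"etag": etag, "repos": repos}


class FakeServer:
    """Pretend server that counts cheap HEAD checks and expensive fetches."""

    def __init__(self) -> None:
        self.etag = 'W/"abc123"'
        self.repos = ["student-repo-1", "student-repo-2"]
        self.head_requests = 0
        self.full_fetches = 0

    def head(self) -> str:
        self.head_requests += 1
        return self.etag

    def fetch(self) -> Tuple[str, List[str]]:
        self.full_fetches += 1
        return self.etag, list(self.repos)


server = FakeServer()
cache = cached_repo_list(None, server.head, server.fetch)   # cold: full fetch
cache = cached_repo_list(cache, server.head, server.fetch)  # warm: HEAD only

server.repos.append("student-repo-3")  # something changes on the server
server.etag = 'W/"def456"'
cache = cached_repo_list(cache, server.head, server.fetch)  # stale: refetch
print(server.head_requests, server.full_fetches, len(cache["repos"]))
```

In a real tool the cache dictionary would be written out as JSON between runs, so a later run only pays for the one HEAD request unless something has changed.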

For my class of 50 students that just wrapped up, the cache file is 2.9MB of JSON data. For my class last fall, with 180 students, with a separate repo per student per week, the cache file is a more impressive 20MB of JSON data and takes three minutes to create. Thereafter, it’s just one of those HEAD requests to validate it and then everything runs fast.

Wow! Amazing stuff. Thank you for sharing. It will take me some time to wrap my head around all the bits, and I might come back to you with some questions.

Thank you for a great resource.

Thank you for sharing!