Classroom vs. custom solution for Jupyter notebook assignments


(David Shean) #1

Hello,
This is my first time digging into GitHub Classroom resources. Apologies if this is a duplicate topic; my limited search didn’t turn up quick answers.

I’m building a new course for ~15 PhD students, loosely based on the Geohackweek model: https://geohackweek.github.io/. One of the learning objectives is for the students to “Demonstrate best practices for collaborative scientific computing and software development”. To me, this means working in groups to collaboratively solve problems, and using git/github functionality for forking, branching, and pull requests. Essentially, gaining real experience with modern workflows.

I looked at the Classroom model. My initial sense is that it has some great functionality for less experienced students, who might not be ready for intermediate/advanced git. If I understand correctly, all of the forking and pull requests needed for assignment distribution and submission happen behind the scenes, so the students can focus on learning basic git workflow and the programming concepts. This is great, but I think it may be too limited for my application. I also want to be able to update assignments after distribution to fix issues, if necessary. And I’d rather have my entire course live in a single repo than split content into 10 separate assignment repos each year. Maybe there are workarounds for these issues, or I’m missing something.

Since the quarter has begun, I’ve attempted to design a custom classroom setup.

I have an org set up with two teams: 1) admin and 2) students. I have a private repo where I will prepare modules each week, each in a new subdirectory containing markdown, Jupyter notebooks, data, and maybe some shell/Python code. The “assignment” will likely be a notebook with some tutorial/sample cells, followed by problems and empty cells for the students to complete. I want them to be able to turn in their completed notebooks so I can review them manually and provide feedback (there isn’t necessarily a “right” answer for each cell, and I won’t always have a rubric, so nbgrader is overkill at this point).

I added the student team to the private repo containing the course material. The students all have access, they can all create their own fork, and can pull any new changes I commit. They can then create a new branch (or add a new notebook), complete the assignment, then “submit” the modified code as a PR in their own fork (or the upstream repo).
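A minimal sketch of the student side of this fork-and-branch workflow, assuming the student’s fork already exists on GitHub and the course material lives on the upstream `master` branch. The helper names `start_assignment` and `submit_assignment` are hypothetical; the pull request itself is still opened in the GitHub web interface once the branch has been pushed.

```shell
#!/usr/bin/env bash
# Student-side sketch of the fork-and-branch workflow (hypothetical helpers).
# Assumes the student's fork already exists and git can push to it.
set -euo pipefail

# start_assignment FORK_URL UPSTREAM_URL UPSTREAM_BRANCH NEW_BRANCH DIR
# Clone your fork, register the instructor's repo as "upstream", and start a
# work branch from the latest upstream course material.
start_assignment() {
  local fork_url=$1 upstream_url=$2 upstream_branch=$3 new_branch=$4 dir=$5
  git clone --quiet "$fork_url" "$dir"
  git -C "$dir" remote add upstream "$upstream_url"
  git -C "$dir" fetch --quiet upstream
  git -C "$dir" checkout --quiet -b "$new_branch" "upstream/$upstream_branch"
}

# submit_assignment DIR MESSAGE
# Commit every change in the working tree and push the current branch to the
# fork; the pull request is then opened from that branch on GitHub.
submit_assignment() {
  local dir=$1 msg=$2
  git -C "$dir" add -A
  git -C "$dir" commit --quiet -m "$msg"
  git -C "$dir" push --quiet -u origin HEAD
}
```

For example, run `start_assignment <fork URL> <course repo URL> master week1-answers course`, complete the notebook, then run `submit_assignment course "Complete week 1"` and open the PR from the `week1-answers` branch.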

Unfortunately, the PR on each student’s private fork is visible to every other student on the team! While these are mature students, and there is some value in seeing others’ work, I know they will be tempted to copy/paste, or at least consult completed assignments instead of thinking about the problem themselves. I’ve gone through all of the settings for the organization, team, and repo, and haven’t found a way around this. Any ideas? I suppose I could make a team for each student, but that’s cumbersome and seems to defeat the purpose.

The other issue is diffs on notebooks. I would like to comment on rendered notebook cells, not the raw JSON, which is not possible with current GitHub functionality. I am hoping to use ReviewNB for this, as support for this task is forthcoming. As far as I can tell, the Classroom assignment review does not (yet) support rendered notebooks?

Thanks for any thoughts you can provide.
-David


(Dan Wallach) #2

For what it’s worth, I split my assignments into one repo per week, since my assignments are one per week. Then, each week, the students get a fresh repo with my “reference” code for the prior week’s problem, upon which they then build the current week’s assignment. This sort of workflow, which I was originally doing with Subversion (horrors…), fit nicely into GitHub Classroom.

It’s relatively straightforward to configure GitHub Classroom to make sure every student has a private repo of their own. If I read what you’re saying correctly, you’re trying to arrange for all the students to live on a shared repo, but on different branches. That’s fundamentally going to allow everybody to see everything. The only way you’ll get separation across students is to keep them in their own repos.

How do you push changes to students? That’s pretty straightforward. You clone everybody’s repos (multiple people have written scripts to do this, including my own script), you commit a change to each one, and then you push them all. You need to then tell all the students to pull your changes. The way I’ve done this has mostly worked. I’ll have a Thursday night deadline (“write your own unit tests”) then on Friday morning I push my nasty unit tests, and their implementation is due on Sunday night. Reading through the Git logs, I see a bunch of merge operations where they didn’t pull first, but since my changes are adding files, rather than changing them, there are never merge conflicts.
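Dan’s own script isn’t shown here, so the following is only a minimal sketch of the clone, commit, and push loop he describes. It assumes student repos follow a hypothetical `PREFIX-username` naming pattern under one org URL, that a roster file lists one GitHub username per line, and that your git credentials can push to every repo. The function name `push_file_to_all` is also hypothetical.

```shell
#!/usr/bin/env bash
# Instructor-side sketch: clone each student repo, add one file, commit, push.
# The "PREFIX-username" repo naming and the roster file are hypothetical.
set -euo pipefail

# push_file_to_all BASE_URL PREFIX ROSTER_FILE SOURCE_FILE MESSAGE
#   BASE_URL     e.g. https://github.com/my_class_org
#   PREFIX       repo name prefix; student repos are named PREFIX-username
#   ROSTER_FILE  one GitHub username per line (blank lines are skipped)
#   SOURCE_FILE  file copied into the root of every student repo
#   MESSAGE      commit message
push_file_to_all() {
  local base_url=$1 prefix=$2 roster=$3 src=$4 msg=$5
  local workdir user repo
  workdir=$(mktemp -d)
  # Read the roster on fd 3 so git commands cannot consume the loop's stdin;
  # the "|| [ -n ... ]" also handles a last line with no trailing newline.
  while IFS= read -r user <&3 || [ -n "$user" ]; do
    [ -z "$user" ] && continue
    repo="${prefix}-${user}"
    git clone --quiet "${base_url}/${repo}" "${workdir}/${repo}"
    cp "$src" "${workdir}/${repo}/"
    git -C "${workdir}/${repo}" add "$(basename "$src")"
    git -C "${workdir}/${repo}" commit --quiet -m "$msg"
    git -C "${workdir}/${repo}" push --quiet origin HEAD
    echo "pushed to ${repo}"
  done 3< "$roster"
  rm -rf "$workdir"
}
```

For example, `push_file_to_all https://github.com/my_class_org week3 roster.txt test_week3.py "Add week 3 unit tests"` would add the file to every `week3-*` repo in the roster. Because it only adds a new file rather than editing existing ones, it fits the conflict-free pattern Dan describes.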

I’ve occasionally used git commits as a way of fixing my own bugs, and that generally works so long as the changes you’re pushing don’t impact the exact files that the students are working on. Otherwise, in a large class, merge conflicts are inevitable. You could do PRs. Or, just say in email or whatever “dear students, you’re welcome to change the function foo to have the following body …”.

Jupyter notebooks and such: GitHub is smart enough to render a Jupyter notebook, which is pretty cool, but it’s not smart enough to let you edit or annotate one. For that, you’d need to pull the files, edit them, commit changes, etc. That, again, is just itching for merge conflicts, because no matter how often you tell your students otherwise, there will always be somebody who doesn’t pull before they start working. If you really, really want multiuser editable Jupyter notebooks, you might prefer something like Google’s Colaboratory, which is like Google Docs with real-time editing, but for IPython notebooks.


(Chris Cannon) #3

Hi @dshean! This is an interesting use case; let me see if I can address most or all of your pain points.

Visible Pull Requests

All pull requests from forks are visible to anyone who has access to the base repository. This is a big part of why GitHub Classroom performs fork-like actions without actually using the fork feature. Basically, the only way to keep these pull requests private is through GitHub Classroom, so let me address some of your concerns with it and see whether it might be worth switching over to a Classroom for Week 2.

Updating Assignments After Posting

I ran into a problem very similar to yours with GitHub Classroom: I had updates to push but no way to automatically update all student projects. I developed some GitHub Classroom Utilities, just basic Python scripts, to help me manage this. I think CloneAssignment and AddFile are simple tools for quickly grabbing all student assignments and updating them.

Rendered Diffs

GitHub Classroom creates totally normal GitHub repositories that can be viewed from the Classroom dashboard. These repositories are fully functional in every way! Therefore, if you install ReviewNB for all repositories in your class organization, you can leave feedback while viewing the rich diffs just like you could with the forking method.

I think I addressed all of your concerns here, feel free to comment back if I missed something or if you’d like more info!


(Eric Ford) #4

I’m doing something similar. You can read about my current workflow at https://psuastro528.github.io/tools_used/creating_labs/. I’m eager to hear whether others have suggestions for improving on it, so I recently shared it and asked for feedback in the thread “Workflow for Combining the Benefits of Jupyter Notebooks with GitHub version control and feedback”. Hopefully, someone will chime in with further suggestions.


(Joel Ross) #5

For uploading “starter code” after students accept assignments, I teach them about remotes and how to add a remote reference to my starter repo; then they can just pull updates. This is great for slightly more advanced users.

```shell
git remote add upstream https://github.com/my_class_org/assignment_starter
git pull upstream master
```

Works quite well once they have the basics (and merging) down 🙂