Git Workflows: Should Rebasing Become a Habit for Developers?

Written by Alexander Junger, Software Engineer Backend

As a developer nowadays, chances are that you grew up with Git. Sure, you might have had to use SVN at an internship at some point, but Git is generally what we learn and use these days. You might have found out that Git was created by Linus Torvalds for himself and his kernel developer friends. Maybe you tried a rebase once (by accident?), destroyed a few days’ worth of somebody else’s work, and now you’re feeling a bit gun-shy. You may very well decide that rebasing is not for you and Git is a glorified save button anyway.

Learn to walk before you run

To me, the most important aspect of proper version control is that it allows you to understand the history of the software you’re working on. Especially in legacy code bases, I sometimes ask myself questions like “what were they thinking?” or “is that a bug, or was it once a feature?” Git can answer those questions. That is, if the code history has a linear plot, is structured into atomic commits with descriptive messages, and is grouped into branches indicating the larger feature those changes were part of. Working with branches is a topic of its own, owing to many branching models and countless variations. The two essentials, however, are descriptive messages and atomic commits. That’s something everybody working with Git should be well-versed in.

The intro paragraph obviously uses exaggeration to make a point, but I do believe that many of us are missing out on the more advanced features of Git. Are they needed or can we cover our bases without them?

Descriptive commit messages in reality: “Fixes the stuff”…

Everybody has “been there, done that” – committing just to get it committed, with the message being an afterthought at most. The established way to stay disciplined while working on unappealing tasks is to make it as easy for yourself as possible. Thus, align on a pattern for commit messages within your team and really stick to it with every single commit. In the backend system of Runtastic, we mostly use the imperative style as per Chris Beams.

What I like to do is add our Jira ticket numbers as a suffix, to add more context. Many Git interfaces automatically integrate your issue tracker when the ticket number is referenced in the message.

Tekin Süleyman makes a point in his talk “A branch in time” that your commits will probably be around for much longer than your company uses your current issue tracker. Thus, relying solely on details in a referenced Jira or GitHub issue is not a safe bet – it should be an addition to an already explanatory commit message.
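To make this concrete, here is a minimal sketch of such a commit in a throwaway repository. The file, the scenario and the ticket number BE-1234 are invented for illustration; the point is only the shape: an imperative subject line with the ticket as a suffix, plus a body that still explains the change if the issue tracker is long gone.

```shell
set -eu

# Throwaway repository so nothing real is touched
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo Dev"
git config user.email "demo@example.com"

echo "retry three times" > payment_client.txt
git add payment_client.txt

# First -m is the subject line, second -m becomes the body.
# BE-1234 is a hypothetical ticket number.
git commit -q \
  -m "Add retry to payment client (BE-1234)" \
  -m "Requests to the payment provider occasionally time out. Retry up to three times so that a single timeout no longer fails the whole checkout."

# Print the full message as it is stored in history
git log -1 --format=%B
```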

Atomic commits in reality: The “Plutonium-commit”

It has a half-life of 2 weeks (indicating the time after which even the author no longer has a clear picture of what changed) and contains a new feature, two bug-fixes and, while we’re at it, the refactoring of an unrelated module. Also two major library updates. And, you guessed it, a new bug…

This is the exact opposite of an atomic commit and can be prevented to a certain extent by simply structuring your work. If you’re testing properly – yes, to me that means TDD – you already have a workflow that makes this very easy. Let’s have a look at how it works in practice: I mostly commit units. That means, if the unit passes its specification, I decide: do I need to specify it further (edge cases etc.)? If not, I commit it. If yes, I might still commit it and amend that commit later, or just continue working on that unit. Your mileage may vary, but this usually gives me commits with a granularity that fits just fine.

Through our Runtastic training program, I was able to attend the Craft Conference in Budapest this year (an amazing conference!). Tim Ottinger gave a nice talk titled “Test Driven: The Four Step Dance”, in which he argued that “Integrate” should be the fourth step after “Red-Green-Refactor”. What does that mean specifically? To him it means “making changes part of the code base,” as in committing, pushing, and having them run through CI.

More collaboration makes it harder to maintain a concise history

So you’ve figured out clean, atomic commits. But then your colleague requests some changes on the pull request. The usual choice is likely a new commit. Everything looks good in the overall PR diff, but now we have two non-atomic commits:

In the long term, commit a7176f1 is probably not a relevant part of this software’s history and could cause confusion, or at least some wasted time. We should merge it with the first commit of the branch by performing an interactive rebase, applying the commit as a fixup to the first one. This means that it will be merged into its predecessor, forming a new commit that replaces both of them, while keeping the message of the predecessor (in this case 04b7fc5).

If you haven’t configured a default editor for rebasing, you have to prepend your choice in an environment variable. Like most other backend developers at Runtastic, I use vim for coding, so that’s what I also choose for any interactive rebase.

EDITOR=vim git rebase --interactive 04b7fc5^

The interactive window will display a list of commits to be rebased, allowing you to edit the action to use for each one. The default “pick,” which simply replays the commit on top of its (new) parent, can be replaced with actions including “drop”, “edit” or in this case “fixup”. Once we’re happy with the to-do list, a save and exit will start the rebasing process.

pick 04b7fc5
fixup a7176f1
pick 2ba3b03

Neat, we just rewrote history to make our commits atomic!
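If you want to try this without risking a real branch, the following sketch replays the same situation in a throwaway repository. The file names and commit messages are made up, and `GIT_SEQUENCE_EDITOR` with `sed` merely stands in for changing the second line of the to-do list by hand in your editor.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo Dev"
git config user.email "demo@example.com"

echo "base" > README
git add README
git commit -q -m "Add README"

# Three commits on top: a feature, a review fix for it, and docs
echo "v1" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

echo "v2" > feature.txt
git commit -q -am "Address review comments"

echo "docs" > docs.txt
git add docs.txt
git commit -q -m "Document feature"

# Rebase the last three commits and turn line 2 of the to-do list
# from "pick" into "fixup" (done here with sed instead of an editor)
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2s/^pick/fixup/'" git rebase -q -i HEAD~3

# The review fix is now folded into "Add feature"
git log --oneline
```

The rewritten "Add feature" commit keeps its original message but contains the content of both commits.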

In the example above, we rebase onto the parent of the first of our three commits. This means that whatever we do in the rebase, the rewritten commits will remain descendants of 7ddc117.
In many cases, however, you would run `git rebase -i master` or use any other branch reference. Provided that your copy of the referenced branch is up to date, this moves the branching point from where it originally was (say, commit 7ddc117) to the tip of the target branch. The effect is that your branch is now “aware” of what happened in master in the meantime and contains those changes.
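The moving branching point is easy to observe in a small experiment. This is only a sketch with invented branch and file names, and it uses a plain `git rebase master` instead of the interactive variant to keep it scriptable; `git merge-base` prints the commit where the two branches diverge.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Demo Dev"
git config user.email "demo@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Add base"

# Feature branch splits off here
git checkout -q -b my-feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# Meanwhile, master moves on
git checkout -q master
echo "hotfix" > hotfix.txt
git add hotfix.txt
git commit -q -m "Fix bug on master"

git checkout -q my-feature
echo "branching point before: $(git merge-base master my-feature)"

# Replay the feature commits on top of the current master
git rebase -q master

echo "branching point after:  $(git merge-base master my-feature)"
echo "tip of master:          $(git rev-parse master)"
```

After the rebase, the branching point equals the tip of master, and the feature branch contains `hotfix.txt`.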

This is where TDD step 4, Integrate, comes into play again. I like to continuously integrate the changes of others into my own work, by applying my own branch onto the master or feature branch early and often. This increases collaboration and speed while preventing what I call “merges from the abyss”. These are branches that split off from their parent two months ago and you need to “load more” a couple of times in your commit graph to finally reach the branching point. The problem with those: you have no idea whether the author is aware of the changes that happened to the codebase since then.

By rebasing – integrating often, we make our commit history more linear and easier to grasp. The rebase puts our changes “ahead” of everything else on the parent branch and it’s completely our own responsibility. Thus, a reviewer can safely assume that we considered all those interim changes and that our own changes make sense in that up-to-date context.

But rewriting history is bad!

“Not so fast!”, you say. “Pushing a rewritten history requires force and it can cause mayhem!”

Most devs know this and many teams have the rule to never force push. The argument is that a cleaner history is not worth the risk of losing work by a happy little accident. So incorporating rebasing into your git workflow would require you to also incorporate force pushing. Sounds dangerous, doesn’t it?

The risk depends largely on the type of branch, I’d say. Rebasing a branch that multiple devs actively work on – which in itself is already questionable – certainly carries some element of risk that can only be somewhat controlled by close coordination. However, let’s say the example given above is about a branch owned and worked on by one person. If this developer changes the history on that one branch, no other developer is impacted, except when reviewing it.
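One option worth knowing in this context, which is my addition and not something the workflow above depends on, is `git push --force-with-lease`. It only overwrites the remote branch if it still points to the commit your local remote-tracking ref last saw, so a push by somebody else in the meantime makes it fail instead of silently discarding their work. The sketch below uses a local bare repository as a stand-in for the real remote; all names are invented.

```shell
set -eu

work=$(mktemp -d)

# A local bare repository plays the role of the shared remote
git init -q --bare "$work/origin.git"
git clone -q "$work/origin.git" "$work/clone" 2>/dev/null
cd "$work/clone"
git config user.name "Demo Dev"
git config user.email "demo@example.com"

git checkout -q -b my-feature
echo "v1" > feature.txt
git add feature.txt
git commit -q -m "Add feature"
git push -q -u origin my-feature

# Rewrite history locally: the amended commit gets a new hash
echo "v2" > feature.txt
git commit -q -a --amend --no-edit

# A plain push would be rejected now. --force-with-lease overwrites
# the remote branch only if nobody else pushed to it in the meantime.
git push -q --force-with-lease origin my-feature
```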

So I think we shouldn’t be so rigid about force pushing and rather establish guidelines for when it makes sense and when it doesn’t. Here are some best practices I found regarding rebasing and force pushing in the context of the code review process:

Before the Code Review

Sometimes it makes more sense to change the order of commits, or even move some parts of a commit’s diff to another commit. Rebasing is nothing more than sequentially going through the to-do list you modify in the interactive window. This means you can simply halt at one commit via the “edit” command, reset it and then create two separate commits from the working changes – see the reference for details.
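As a rough sketch of that splitting technique, the script below creates one commit that does two things and then splits it during a rebase. File names and messages are invented, `sed` again replaces the manual edit of the to-do list, and `GIT_EDITOR=true` is only there so the script never waits for an editor.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo Dev"
git config user.email "demo@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Add base"

# One commit that really contains two unrelated changes
echo "parser" > parser.txt
echo "logger" > logger.txt
git add parser.txt logger.txt
git commit -q -m "Add parser and logger"

# Mark the commit with "edit" so the rebase halts right after applying it
GIT_SEQUENCE_EDITOR="sed -i.bak -e '1s/^pick/edit/'" git rebase -q -i HEAD~1

# Undo the commit but keep its changes in the working directory
git reset -q HEAD^

# Commit the two changes separately
git add parser.txt
git commit -q -m "Add parser"
git add logger.txt
git commit -q -m "Add logger"

# Let the rebase run through the rest of the to-do list
GIT_EDITOR=true git rebase --continue

git log --oneline
```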

It can also make sense to do multiple runs. Consider the three commits pictured above. Let’s assume that there are some changes which should be part of C1 but, for whatever reason, were only committed with C3. We can use the splitting technique as described above, leaving us with a temporary commit (512ceb0) and a cleaned-up atomic Commit 3.

We run another interactive rebase, moving the temporary commit from line 3 to 2, and, once again, we mark that commit with fixup. We end up with three clean commits and our branch is now ready for review.

For me, rebasing a task branch is now the default before I open a pull request. When I think of an additional test case that’s missing for a class, it goes into the commit that added this class. I decide on a situation-by-situation basis whether I want to shift focus from writing code to rebasing. In that case, the additional test case would be amended to the original commit right away. More often though, I just create a fixup commit referencing the original commit (with `git commit --fixup <SHA>`) because it’s less of a distraction from coding. Just before opening the pull request, I run rebase with the autosquash flag to automatically squash test cases into the commits they belong to. No matter which way you choose, you will help your reviewers with more structured commits.
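Here is a minimal sketch of that fixup-and-autosquash routine in a throwaway repository. The class, files and messages are invented; `GIT_SEQUENCE_EDITOR=true` simply accepts the to-do list that `--autosquash` has already reordered, which is what you would do by saving and closing the editor.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo Dev"
git config user.email "demo@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Add base"

echo "class User" > user.txt
git add user.txt
git commit -q -m "Add User class"
user_commit=$(git rev-parse HEAD)

echo "GET /users" > api.txt
git add api.txt
git commit -q -m "Add users endpoint"

# Later: a missing test case that belongs to the User class commit
echo "test User" > user_test.txt
git add user_test.txt
git commit -q --fixup "$user_commit"

# History now ends with "fixup! Add User class"
git log --oneline

# --autosquash moves the fixup commit directly below its target
# and marks it as "fixup" in the to-do list
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash HEAD~3

git log --oneline
```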

During Code Review

If you already opened a pull request and your code is being reviewed by others, avoid force pushing rebased commits! Imagine that the pull request suddenly shows a completely new diff, but no other commits were added. If your colleagues are thorough, they’d have to review your entire PR all over again. What this looks like depends slightly on the software you use for reviews; some tools are smarter than others in this regard.

Some review systems don’t even require force pushing at all, even though the commits of a PR are completely malleable during review. They are only “baked into” the codebase on approval. One such system is Phabricator, which makes a great point explaining their review concept. At Runtastic, we use Bitbucket. It works similarly to the GitHub workflow, which means that rebasing and force pushing during review don’t work very well. You can, however, commit the requested changes with a fixup commit. After a regular push, the reviewer will see the exact changes you made to the PR and the “history of the PR” is preserved.


Sidenote regarding code review that I find interesting: Linux kernel developers still use mailing lists for that – and Git actually has built-in tools to send diffs via email for exactly this purpose. Check the email section in the Git reference. Because of this review style, they actually rebase frequently in their review process. Whenever changes are requested, the patch author rebases the entire branch and sends out an email with the updated patch, until everybody agrees that it can be merged as is. This goes to show that every team needs to find the process that works best for their context.

After Code Review

Everybody approved your pull request and it’s ready to be merged. Now is the time to clean up those fixup commits. A final interactive rebase with the `--autosquash` flag tidies up those commits into a clean history. What I described above as “history of the PR” is now no longer necessary and all changes should be in atomic commits that build on each other. Small rebases on personal branches are usually nothing to worry about. Now, the first time rebasing a branch with thirty commits or so…that’s where you want to make sure you’ve had your cup(s) of coffee.

If you’re skeptical that this is going to go well, it can make sense to back up the branch before rebasing it:

$ my-feature-branch:~$ git checkout -b my-feature-branch_backup
$ my-feature-branch_backup:~$ git checkout my-feature-branch
$ my-feature-branch:~$ git rebase …

If, in the middle of the rebase, you decide that you’re too far down the rabbit hole, abort the rebase with `git rebase --abort`. And should you go through with the rebase, only to discover that you overlooked something, leaving the whole branch in a broken state – time to use your backup:

$ my-feature-branch:~$ git checkout my-feature-branch_backup
$ my-feature-branch_backup:~$ git branch -D my-feature-branch
$ my-feature-branch_backup:~$ git checkout -b my-feature-branch
$ my-feature-branch:~$ git branch -D my-feature-branch_backup

This works because a branch is nothing more than a reference to a certain commit (just as HEAD), so the commits on your backup branch are not affected by the new commits you’re implicitly creating with the rebase.
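You can check this yourself with `git rev-parse`, which prints the commit a reference points to. The sketch below uses invented files and commits, and it restores the branch with `git reset --hard` instead of deleting and recreating it. That is a shortcut I am swapping in here; it gives the same result but throws away uncommitted changes, so use it with care.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo Dev"
git config user.email "demo@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Add base"

git checkout -q -b my-feature-branch
echo "a" > a.txt
git add a.txt
git commit -q -m "Add A"
echo "b" > b.txt
git add b.txt
git commit -q -m "Add B"

# The backup is just a second name for the current commit
git branch my-feature-branch_backup
backup_before=$(git rev-parse my-feature-branch_backup)

# Rewrite history on the feature branch (squash B into A)
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2s/^pick/fixup/'" git rebase -q -i HEAD~2

echo "feature branch now: $(git rev-parse my-feature-branch)"
echo "backup still at:    $(git rev-parse my-feature-branch_backup)"

# Changed our mind: point the feature branch back at the backup commit
git reset -q --hard my-feature-branch_backup
```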

A curious mindset is important

You will run into conflicts and it can initially be tempting to simply abort, thinking “I’ll clean up my branch the next time”. But of course conflicts don’t happen randomly. They are predictable and follow some common patterns based on how Git works and the rebase you’re doing.

Git also offers many additional commands that are helpful when things don’t go smoothly:

  • bisect: Allows you to find the commit that introduced a certain behavior (e.g. regression).
    Let’s say after rebasing, you notice that your test suite no longer passes – you must have accidentally broken something, perhaps while resolving a conflict. With bisect, the last known healthy and the first known broken commits can be flagged. Git will then traverse this commit range in a binary search pattern. For every commit it stops at, you check if it already contains the regression and flag it as good or bad. Once the offending commit is identified, it can be fixed with an edit rebase. (It goes without saying that fixing a bug this way should only be done if you notice right away and that code is not merged upstream yet!)
  • reflog: Allows you to see the history of a reference (such as a branch or HEAD).
    That might seem like inception-level stuff at first, but a reference changes over time and, of course, Git keeps track of that. It’s not strictly necessary to use this command but it can come in handy.
  • stash: Perhaps the best known in this list, but I still want to mention it because of its versatility. Stash works like a stack for dirty working directories. The default “push” mode saves and then wipes it clean, while the “pop” mode recovers the last state you stored. This is especially helpful for in-between branch switching or during rebasing.

Don’t shy away from the Git reference. It often contains typical examples of how commands can be used. Keep a curious mindset about features that you don’t (and do!) use regularly, know that you can recover almost anything, practice, and you will soon start appreciating those more advanced features of Git.

Why bother?

We have seen a few ways every developer can get the most out of Git in day-to-day work. Now, integrating rebasing into a team’s workflow is definitely a hotly debated topic.

Is it for everybody? Probably not. I’ve shown some examples of completely different strategies and I think all of those have the potential to be the “best one,” provided they work for the people using them.

For me personally, combining the essentials – that is structuring and describing – with cautious rebasing of individual commits on non-collaborative branches already goes a long way. It will increase the quality of your code history, the way you structure your programming, all while keeping risks quite low. As I see it, this can be part of every developer’s reality.

It’s obviously an entirely different animal (in risk, needed concentration…) to move around large feature branches from here to there and I agree that such things should not be done carelessly.

Can it be justified? For sure.

Whatever you’re doing, just don’t degrade Git to a glorified save button.
