March 11, 2017

Git Commit History: Your Project's One and Only Documentation

How many times have you searched for project documentation only to find that it is either non-existent or out of date? Documenting a project is hard. Someone has to spend hour after hour keeping the documentation in sync with the codebase. When you are rushing to meet a deadline, the first thing you cut is the time to write good documentation – to document which design decisions you made and why you made them, to update the architectural diagrams, and so on.

Why is documentation so hard?

Writing proper documentation is hard. Keeping it up to date is even harder. It's not easy to enforce the good practice of documenting code when teams are constantly dashing from one deadline to the next. "Ah, we’ll write those docs right after the release" is an excuse you hear every other day. But right after this feature is released comes the next one, and the race against time starts all over again.

Even if we leave the time pressure aside, there are many other factors that cause projects to end up with zero or outdated documentation. It’s normal for people to forget things. You change the code but forget to go and change the Wiki page. Not a big deal – nobody ever reads the docs anyway.

That brings us to motivation. People are not motivated to maintain documentation because they don’t believe anyone will ever read it. It’s easier to just pull a colleague by the sleeve and ask. Or to go the hard way: dig into the code and try to build the big picture in your head from all the details.

A design document is outdated even before that design is implemented. Obsolescence – the process of becoming obsolete or outdated – is what happens to your documents right after they are “done”. A design document is never finished. It should change as the codebase changes. I admit that in 20 years, I cannot remember working on a single project with good, up-to-date internal documentation.

The communication-with-the-future problem

I find the ideas put forth by Kent Beck in his book "Extreme Programming Explained" sound and aligned with my way of thinking about how software should be built. One of the core principles described in that book is called "Mutual Benefit". It says that every activity should benefit all concerned.

Creating extensive internal documentation is a practice that violates this principle – it tries to solve a communication-with-the-future problem by costing one person while helping another. It slows development considerably now so that someone may have an easier life maintaining the codebase in the future.

XP solves the communication-with-the-future problem in mutually beneficial ways. Write descriptive automated tests that validate the behavior and list all scenarios. Refactor the code structure to make it understandable and maintainable. Use intent-revealing naming to explain the "what" in the code itself. Explain the "why" in the commit message.

Today I mostly rely on three project artifacts for documenting the whats and whys: the project README file, the commit history, and a descriptive test suite. All of these are kept safe in one place – Git. The further your docs are from your code, the harder it is to maintain them.

A well-crafted Git commit history could answer many questions that you and others may have in the future. You can tell the whole story behind a change in a commit message or a pull request description. Let’s see how to get there – to have good project documentation without explicitly writing project documentation.

Commit messages

Craft your commit messages just like you craft your code. Git commit messages are a means of communication within the team. But even if you work alone they are still beneficial – you communicate with your future self. Write them with great care.

Your main goal when writing code is to make it readable and understandable. You should pursue the same goal when writing a commit message. Same as code, a commit message is written once but read many times. Every time you want to know the "why" behind a line of code, you git-blame and read the commit, hoping it'll provide the answers you are looking for.

Always write a message body

Always! No excuses accepted. Even for small commits, you can elaborate on why you are doing whatever you are doing. The commit message should explain "why" you are making the change. It should reveal the intent. Do not try to explain what the code does – that should be clear from the code itself. If it is not, then re-work the code. Don't try to explain complex code in either commit messages or comments, just re-write it. Maybe the code is not complex, it's just bad.

Saying "Fix a bug" is not helpful to anyone. Explain what the bug was. Explain why it is fixed the way it's fixed, what other approaches you discussed, and why they were rejected. What was the impact of the bug? Why does it have to be fixed now?

Don't leave "Fix the build" commits in your project history. Re-write your commit history and squash them. Use fixup commits and autosquash to let Git do that work for you. Re-write your Git commit history as many times as needed until it's perfect. Iterate on your commit history in the same way you iterate and refactor your code. Make sure branch commits are impeccable before merging that branch into master.
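As a rough sketch of the fixup and autosquash flow mentioned above, the script below builds a throwaway repository, records a follow-up fix as a fixup commit, and lets Git fold it into the commit it belongs to. The file names, commit messages, and the feature branch name are made up for illustration, and -m is used only so the script runs without opening an editor.

```shell
set -e
# Throwaway repository so the sketch is safe to run anywhere.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"
git config user.email "author@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "Add application skeleton"

# Two ordinary commits on a hypothetical feature branch.
git checkout -q -b feature
echo "retry logic" >> app.txt
git commit -q -a -m "Retry failed uploads"
echo "retry policy" > notes.txt
git add notes.txt
git commit -q -m "Document the retry policy"

# The first feature commit turned out to be incomplete. Instead of adding a
# "Fix the build" commit, mark the fix as a fixup of the commit it belongs to.
target=$(git rev-parse HEAD~1)
echo "missing import" >> app.txt
git commit -q -a --fixup="$target"

# --autosquash moves the fixup next to its target and folds it in.
# GIT_SEQUENCE_EDITOR=true accepts the prepared todo list unchanged.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash master

# Only the two meaningful commits remain on the branch.
git log --oneline master..feature
```

In everyday use you would drop the GIT_SEQUENCE_EDITOR override and review the todo list that Git opens before the rebase runs.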

Never consider a longer commit message a waste. Most probably you will be the one trying to get more context about a change several months later, and you will thank your past self for spending the time on a good explanatory message. Good commit messages will give you a lot of context around the code. If you are using git commit -m, now may be the right time to troll your own shell and alias it to something that slaps you across the face.

Keep the title short (under 50 characters). That helps a lot when you run "git log --oneline", as you can quickly skim through short titles to find what you need. Once you find the commit by its title, you can simply git show that commit and get all the details. When writing your commit messages, think about the people who will search for your change. What keywords will they use to find it?
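To illustrate the skim-then-show routine, here is a small sketch that runs against a throwaway repository. The cache-related commits and their wording are invented for the example, and the messages are passed with -m only to keep the script non-interactive.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

# Three commits, each with a short title and a body that explains the "why".
echo "v1" > cache.txt
git add cache.txt
git commit -q -m "Add in-memory price cache" \
    -m "Price lookups hit the database on every page view."
echo "v2" >> cache.txt
git commit -q -a -m "Expire cache entries after 5 min" \
    -m "Stale prices were shown because cached entries never expired."
echo "v3" >> cache.txt
git commit -q -a -m "Log cache hit ratio" \
    -m "We need data before tuning the cache size."

# Skim the titles only.
git log --oneline

# Search titles and bodies for a keyword (-i ignores case).
match=$(git log -i --grep="expire" --format=%h)

# Read the full story behind the commit that was found.
git show --stat "$match"
```

Note that --grep searches the whole message, so a keyword that appears only in the body still finds the commit.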

Be focused on one thing only

The same principles that apply to writing good code apply to writing good commit messages as well – keep them simple, keep them small, keep them focused. You have small classes and small methods that have a single responsibility. You should do the same with your commits. Make them cohesive. Keep unrelated changes in separate commits.

If your commit explains too many things, maybe you are changing too many things. Big changes fail to provide context when doing git blame. There are just too many things going on. You have to read through everything and filter out stuff until you get to the piece that you actually need.

Large commits don't help code reviews either. Sometimes it's easier to review a big change commit by commit and see the path the author has walked to reach the final solution. For example, split the refactoring into a separate commit from the main change, or have the tests in a separate commit so that the QA team may review only that piece.

"Misc improvements" – we all have seen commit messages like this one. Then you open the commit and there are tons of unrelated changes across unrelated abstractions. Fixing typos, changing names, moving files around, refactoring – mix all of these with a change in behavior and you get a completely unreadable unreviewable mess.

It may be easier for you to work for a few hours and then commit everything. But that is a bad habit. It's just a sloppy routine. A good working process is: make a small change, commit, repeat. If you don't want to interrupt your thinking process with constant pit stops for writing descriptive messages, you can commit with one-line WiP messages, and once you are done, go back and re-write your commit history.
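One possible way to clean up such WiP commits afterwards is sketched below, assuming the work lives on a hypothetical feature branch that has not been shared yet. All names and messages are illustrative. Normally you would edit the rebase todo list by hand; the sed call only stands in for that manual step so the script can run unattended.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"
git config user.email "author@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "Add application skeleton"

# Quick placeholder commits made while staying in the flow.
git checkout -q -b feature
echo "validate size" >> app.txt
git commit -q -a -m "WIP"
echo "validate type" >> app.txt
git commit -q -a -m "WIP again"

# Day to day you would run "git rebase -i master" and edit the todo list by
# hand. Here sed rewrites line 2 of the todo list from "pick" to "fixup",
# which folds the second WIP commit into the first one.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2s/^pick/fixup/'" git rebase -q -i master

# Replace the placeholder title with a real message (title plus body).
git commit -q --amend \
    -m "Validate uploads before storing them" \
    -m "Oversized and non-image files used to reach storage and fail later."

git log --oneline master..feature
```

If the WiP commits cover several unrelated changes, you would keep more than one commit here and reword each of them instead of folding everything together.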

Link requirements

The best place to link a scoping document is the pull request description. Anyone who looks for more context than what's provided by the individual commit messages and the PR description could follow that link and see the extended version of the story.

But a link to the requirements doesn't replace a good PR description and is no excuse for writing one-line commit messages. Your company may change its project management tool, and all those links will break. Have the gist in Git. Have the details linked in Git.

Don't copy and paste the scoping document into your PR description. Don't copy all commit messages into the PR description. To re-phrase – don't be lazy. Think about the person who will need the information and will go through your writing to find it. Remember that this could be your future self.

Commit history

Your commit history should read like a book. Polish your commits in the same way you polish your code. The master branch is your sacred ground. Never merge anything that is less than perfect. Iterate on your feature branch commits as many times as needed to make them read like a good story. Tell your journey of making that feature. Let the others be a part of it. Let them feel what you've felt while crafting that piece.

Being part of a remote company has taught me the importance of communicating clearly in written form. You and your colleagues don't share the same office space. You cannot just pull someone by the sleeve and ask them a question. You cannot just get up from your desk and have a quick discussion. You communicate through GitHub or Slack. The more context you provide in your writing, the less time everyone will spend chatting.

Pull requests and commit messages are a great way to communicate with your team remotely. Keep in mind that the person who wrote the code won't always be nearby for you to tap on the shoulder and ask "why". The person who doesn't understand the code after a year could easily be you. And your future self will be grateful to your past self for writing good explanatory descriptions.

Your project grows feature by feature – one step at a time. Your Git commit history should reflect that process. You can easily achieve that by using feature branches. Feature development takes place in a dedicated branch. Once the feature is ready – your branch is verified, reviewed, and tested – you rebase it on the master branch and merge it using a merge bubble (using --no-ff). Your commit history should look like this:

*   06e06b46 Merge branch 'feature 3'
|\
| * a21d5447 commit 3
| * bbf67a8d commit 2
| * 457f14b8 commit 1
|/
*   3cc8bd80 Merge branch 'feature 2'
|\
| * 52a31b8a commit 1
|/
*   d3d0477f Merge branch 'feature 1'
|\
| * bf5c9f15 commit 2
| * 8290d105 commit 1
|/

You can see how the project evolved feature after feature. You know all the commits that belong to a certain feature. And if something catches fire, you can easily drop the whole feature by reverting that merge commit. Bottom line: you should do most of your work in topic branches.
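The rebase, merge bubble, and revert steps described above could look roughly like this. It is a sketch on a throwaway repository. The branch name feature-1 and the file names are made up, and reverting with -m 1 assumes that master is the first parent of the merge, which is the case when you merge while on master.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"
git config user.email "author@example.com"

echo "base" > README
git add README
git commit -q -m "Initial commit"

# Feature development takes place in a dedicated branch.
git checkout -q -b feature-1
echo "one" > feature1.txt
git add feature1.txt
git commit -q -m "commit 1"
echo "two" >> feature1.txt
git commit -q -a -m "commit 2"

# Meanwhile master moves on.
git checkout -q master
echo "more" >> README
git commit -q -a -m "Update README"

# Rebase the finished feature on top of master. A plain merge would now
# fast-forward, so --no-ff is what forces the merge bubble.
git checkout -q feature-1
git rebase -q master
git checkout -q master
git merge -q --no-ff -m "Merge branch 'feature-1'" feature-1

git log --graph --oneline

# Something caught fire: drop the whole feature by reverting the merge.
# -m 1 tells Git to treat the first parent (master) as the mainline.
git revert --no-edit -m 1 HEAD
```

The revert adds a new commit instead of rewriting master, so the story of the feature, including its removal, stays in the history.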

Learn Git

I cannot stress this enough – know the tools you are using. Learn Git. The book "Pragmatic Guide to Git" is a good start. Pro Git comes next. Go beyond the basics. Don't think that simply because you use Git every day, you know everything about it.

The less you know about Git, the harder it is for you to craft a clean commit history. The harder it is for you to craft a clean commit history, the more time it will take you. The more time it takes you, the less you are willing to do it.

Interactive rebasing should be your second nature. You should feel comfortable re-writing your commit history many times, amending commits, rewording messages, splitting and squashing commits. Don't think of Git as the place where you store your code so that it's not lost. Think of Git as the place where you tell the story behind your change.

The time "lost" in writing that story is never a waste. You write a commit message once; people read it many times afterwards. It's a medium for knowledge transfer. Everyone on the team should be familiar with the new feature. It should be crystal clear "why" you are making that change.

A simple example

Let's go through a simple example of splitting changes. I often see developers who have committed too many changes in a single commit but don't want to "waste" time splitting that commit into dedicated ones each holding exactly one change. Some don't know how to do it. Others are afraid they will end up messing up their changes. It's actually quite simple.

Somehow we have managed to include two unrelated changes in a single commit, following the common practice of working for quite some time and committing at some random point. Let's further assume that those distinct changes live in the same file, just to spice things up. Splitting changes that reside in the same source file could feel unnatural. After all, that file has to be both staged and unstaged at the same time.

We stage changes with git add. However, using only git add is too coarse-grained for our use case – it'll add the whole file. We need more control. The fine-grained touch we need comes from git add --patch. Its interface could look scary at first, but once you get accustomed to it, you'll find it quite easy to use, and the fear of losing your work while playing with Git will be gone.

Running git add --patch presents you with an interface that asks how to deal with each change. Your changes are shown in separate hunks and Git prompts you for what to stage and what to leave behind. The commands you will use most often are:

y = stage hunk
s = split hunk
n = leave unstaged

These commands are useful when the code lines are far apart. But what if two sequential lines must go into separate commits?

this line goes into commit 1
this line goes into commit 2

Git doesn't know how to split this hunk. You need a secret editor where you can tell Git what goes into staging line by line. Hit e and you will enter edit mode, where you decide what is staged and what stays unstaged one line at a time. Replacing the leading '-' of a removed line with a space keeps that removal out of the staged change. Deleting an added '+' line keeps that addition out of it. There is inline help at the bottom so you don't need to remember everything. I don't.

# To remove '-' lines, make them ' ' lines
# To remove '+' lines, delete them

Once you are ready, you can check the staged changes before committing them with git diff --cached. If everything is fine, then commit what's staged. The changes you've left behind can now easily go into your second commit. And that's it – problem solved.

Takeaway

Craft your commits as you craft your code. Keep them small, focused, and intent-revealing. Your project is more than just code. The code only tells what the software is doing. It doesn't tell why it's doing it.