Let’s say you’re working on a big project – your final paper for a class at school, or a report for your boss. The project is long and complicated, you chip away at it over a period of months, and go through many many revisions of editing.
Then you save your last copy, Big Project Final Version and hand it to your trusted friend to look over. They find 100 more little typos, so you fix those, then Save As: Big Project Final Version 2.
Just to be safe you read over it one more time, and decide you actually want to go back to what you had before in a few places. Hmmm… how exactly was that worded again?
Save As: Big Project Final Version Final.
Save As: Big Project For Real Last One.
Save As: Big Project FINAL.
Then your trusted friends trips over your dog and spills coffee on your computer and you lose everything.
Does any part of this sound familiar?
What is Source Control?
Source control is a solution to this problem. Source control is basically a “saving” system. It is a program you can use to organize your saved copies, easily look back at past versions, keep multiple versions in parallel, and back the whole project up on multiple computers.
Some consumer software also includes systems that do some parts of this job – Microsoft Office and Google Docs have the ability to look at your Version History, for example.
In the programming world, the most commonly used source control system is Git. Other major ones include Subversion and Mercurial. I’ll focus my description on Git, since this is the only system I’ve ever been exposed to.
Git Commits
When you want to save a project in Git, you create a git repository (“repo”). A repo is just a folder with a couple of hidden files in it to help keep track of things.
You can add anything you want to this folder – code, images, copies of other entire programs – whatever! And git can keep track of it.
All you have to do is periodically “commit” your work. This puts a “bookmark” or a “save point” into the history, and allows you to come back to this version later if you want to.
Every time you get your work to a point you want a snapshot of you “make a commit”. This is probably a few times a day, depending on what you’re working on.
Git Branches
The most basic use of Git is as described above. You just keep a linear history of your work, and you can go backwards and forwards in time through these revisions. This is a Git history with only one branch. Customarily, this branch is given the name master
, though that’s just a convention – there’s nothing special about that name.
But many projects are more complicated than this, and require keeping multiple, slightly different, versions at the same time. Some of these versions may even be mutually exclusive from each other. Take a plumber’s resumé, for example – they might keep two copies up-to-date – one for getting a job actually doing plumbing, and another version for getting a job teaching plumbing.
These projects have multiple branches. Perhaps one named plumbing
, and another branch named teaching
.
In Git, we choose a point in the history we want to start from (which is often just the most recent commit), and branch off from there.
Another common use of branches is to use them to work on some new feature or aspect of your project while leaving the master branch unchanged. This keeps master “clean”, and also allows you to have multiple tasks in-progress, without having to finish one before moving on to the next.
Maybe you’re storing the code for a note-taking app in a Git repo. Your next task is to add a “Share” button in your app. So you create a branch called addShareButton
. Now when you make a commit, the master
version doesn’t change. Instead, only this addShareButton
branch has these new commits.
Once you’ve finished working on your new task in this task branch, you can merge it into the master branch, and then your master version has these changes too! You don’t need that task branch anymore, and you can delete it. Again you now only have one master version to keep track of.
Local vs Remote Repos
So far we’ve basically assumed that you’re working alone on this project, but in a work environment that’s rarely the case. You’ll likely have colleagues that work in the same codebase, and want to make changes to the same files as you – often at the same time as you. So how do you do this without tripping all over each other?
Well first lets back up a second to talk about how git is a distributed version control system. This means that everyone can have a copy of the full project history. If you and one colleague are working on a project together, you can both have a copy of the Git repo, and Git will allow you to keep those copies in sync with each other. But neither of those copies is technically “the primary copy” – they are equal. The project has just been distributed between multiple machines.
Now with that said, it is still useful to pretend that one copy of the repo is the main copy – especially when working with teams – and most teams do this.
In addition to everyone having a copy of the repo on their development computers, there is also a copy on a server that is accessible to everyone. This centralizes and organizes your work – everyone can agree to treat that as the “master copy”. When an individual developer finishes a task, they commit their work, and push their branch it to this remote server (usually just called “the remote” or “origin
”. Then this branch can be merged into the master
branch.
There are a couple key benefits of having your work on the shared remote, and having everyone push to the same centralized repo:
- It is easy for everyone to keep up-to-date with everyone else’s work by simply pulling the latest changes from that central repo into their copy on their machine. Without this you would have to separately push your work to each of their machines individually!
- Work is less likely to be lost if your machine dies unexpectedly
- It is clear when work is done – it is done when it has been merged into the master branch on the remote.
There is also one other key benefit, which is immensely powerful and makes life much easier:
- The centralized repo can be managed through GitHub or Bitbucket or other similar software.
Pull requests, code review, feedback sessions
Github has many capabilities. It provides a nice interface for browsing all of the commits and branches in the repo, lets you look at a history of how those changes relate to each other, and is also a directory of millions of projects.
But perhaps the most useful feature it provides is the ability to create Pull Requests.
A pull request is a way of requesting that your branch get merged into the master branch (or any branch). It shows a summary of the changes (which you call the difference (or “diff”) between the branches, and allows your peers to review your work and leave feedback on it before it gets merged.
A reviewer can leave comments/questions, mark their approval, decline your pull request (which temporarily rejects it while you address their feedback), or merge your pull request.
This tool will dramatically improve the average code quality in a repository – partly because people’s feedback will yield useful suggestions, and partly because you try harder when you know someone will be looking over your code with a fine-toothed comb. If you cut corners or do things you know are the lazy way or just plain wrong, someone will probably find those mistakes.
So it holds you accountable for your work, but also is a safeguard around you. Your work is no longer your sole responsibility – you can rely on your team to help spot the mistakes you missed.
Of course this also means that there is more work for you to do – you need to review other people’s code too. And let me tell you – this is often harder than writing the code yourself. You need to try to understand what that code is doing from what little context you can see in the pull request. This is often impossible, or just a waste of time, and a better approach may be to have a real-life conversation with the code’s author, or sit down together to go over the changes.
This can also help ease tensions if you think that the code in the pull request has some fundamental flaw, or you’ve spotted so many errors that writing them all down will overwhelm, discourage, or publicly embarrass the author. These are all bad things to do to a person – and you need to remember to be kind when reviewing someone’s work. They wouldn’t have submitted it for review if they thought it was awful.
Let’s Git This
Git is a powerful tool for managing work shared across a team. It lets you look back in time, see what each other are doing, and make a clear path for how to call something done.
Add Git to your repertoire – it will make you an attractive job candidate, and a better developer. With no need for saving your file versions in separate folders, it will make you more organized.
And some day it might just save you from a world of hurt when you spill coffee into your computer.
Leave a Reply