When considering a new software hobby project, I like to target tools and technologies that I’ve heard good things about, yet not had the opportunity to try. Reading the documentation and seeing glowing reviews are good indicators, and help prioritize my time for maximum probable benefit, but at some point, one has to just try it.
Of course, one of the first steps in a new software project is to set up a revision control system. For those outside the software profession, this is software that keeps a copy of every version of a file. It lets me see what my file (or set of files) looked like at any point in time and, if desired, go back to that older version and work from it. Revision control systems are also typically used to let multiple people collaborate on the same set of files in parallel.
The revision control systems that most people are used to are centralized, like CVS, Subversion, or ClearCase. This is the obvious approach: one computer acts as a central repository of files, and other computers connect to it to get the latest changes and to send their own local changes to be saved. This is the model I’m used to, most recently with Subversion.
The other model I’ve heard a lot about recently is the distributed model, where every participating computer has its own full copy of the repository, and changes get pushed around directly between the participants in a star pattern (e.g. like the Star of David, http://en.wikipedia.org/wiki/Star_of_david). There are several systems in this space, but after reading the history and tradeoff discussions of the various contenders, I ended up choosing GIT, which was created by Linus Torvalds of Linux fame.
To be clear, the fact that Mr. Torvalds created it contributed to my decision, but not in the sense that I am an automatic fan of his work. He does kernel development, a branch of software I know just well enough to recognize as difficult, but not much more. In my experience, however, the best tools to adopt are the ones that are simple, yet have a large enough following to keep improvements coming. GIT is designed in the UNIX fashion (many simple, orthogonal commands), was very well documented by the time I looked into it, and has brainy star power behind it.
So far, I’ve found GIT to be a refreshingly simple but effective design. I’ve only experimented with the various incarnations of my resume that I’ve accumulated over the last 8 years, but that’s been enough to give me a feel for some of the neat features.
It’s easy to read about on the GIT wiki, but I’ll try to describe my understanding in a nutshell:
GIT uses a cryptographic hash function, SHA-1 (the same algorithm behind the sha1sum command), to calculate an ID for every file; this ID depends only on the content of the file, not on its filename, and is unique for all practical purposes. Every version of every file is stored in this manner and added to the repository, so identical content is only ever stored once.
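To make the content-based ID concrete, here is a minimal sketch of how Git derives a file’s ID. One detail the summary above glosses over: Git does not hash the bare file, it hashes a short header ("blob", the size in bytes, and a NUL byte) followed by the contents, so running plain sha1sum on a file gives a different value. The filenames below are made up for illustration; the ID printed for "hello\n" is the same one `echo hello | git hash-object --stdin` reports.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the ID Git gives a file's contents (a "blob" object).

    Git hashes a short header, "blob <size in bytes>" plus a NUL byte,
    followed by the raw contents, so the ID depends only on the content.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Two differently named (hypothetical) files with identical contents share one ID...
files = {"resume-2007.txt": b"hello\n", "resume-copy.txt": b"hello\n"}
ids = {name: git_blob_id(data) for name, data in files.items()}
print(ids["resume-2007.txt"] == ids["resume-copy.txt"])  # True
print(ids["resume-2007.txt"])  # ce013625030ba8dba906f756967f9e9ca394464a

# ...while changing a single byte gives a completely different ID.
print(git_blob_id(b"hello!\n") == ids["resume-2007.txt"])  # False
```

Because the filename never enters the hash, renaming or copying a file costs nothing extra in the repository.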
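The various IDs are then linked together into a history that mirrors our understanding of how the files relate. Each commit records which file IDs make up a snapshot, plus the ID of the commit (or commits) it came from, so that, in effect, the file with ID 03cfd743661f07975fa2f1220c5194cbaff48451 is recorded as the predecessor of the file f19183f4f3c15a87f3831597f40a425f8527b72. A “branch” is simply a name pointing at the newest commit in one of these chains.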
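The chaining can be sketched with a toy model. This is a simplified, hypothetical stand-in for Git’s real commit objects: real commits also store author, committer, timestamps and a tree object in a different byte format, so the IDs computed here will not match Git’s. What it does show is the key idea: a commit’s ID is a hash that covers its parents’ IDs, and a branch is just a movable name for the newest commit.

```python
import hashlib
from dataclasses import dataclass


def blob_id(content: bytes) -> str:
    # Content-based file ID, computed the way Git hashes a blob.
    return hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()


@dataclass(frozen=True)
class Commit:
    # Hypothetical, simplified commit: NOT Git's on-disk format.
    files: tuple    # sorted (filename, blob ID) pairs: the snapshot
    parents: tuple  # IDs of the predecessor commit(s)
    message: str

    def object_id(self) -> str:
        # The ID covers the parents' IDs too, chaining the history together.
        payload = repr((self.files, self.parents, self.message)).encode()
        return hashlib.sha1(payload).hexdigest()


def commit(store, branches, branch, files, message):
    """Record a snapshot of `files` on `branch` and move the branch to it."""
    parent = branches.get(branch)
    snapshot = tuple(sorted((name, blob_id(data)) for name, data in files.items()))
    c = Commit(snapshot, (parent,) if parent else (), message)
    cid = c.object_id()
    store[cid] = c
    branches[branch] = cid  # a branch is only a name for its newest commit
    return cid


def log(store, head):
    """Yield (commit ID, message) pairs, newest first, following first parents."""
    while head:
        c = store[head]
        yield head, c.message
        head = c.parents[0] if c.parents else None


store, branches = {}, {}
first = commit(store, branches, "master", {"resume.txt": b"draft 1\n"}, "first draft")
second = commit(store, branches, "master", {"resume.txt": b"draft 2\n"}, "reword summary")
print([msg for _, msg in log(store, branches["master"])])
# ['reword summary', 'first draft']
```

Since each ID depends on the parent IDs, altering any old commit would change the IDs of everything after it, which is what makes the recorded history trustworthy.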
With those two premises, GIT is then able to handle the many cases of file history that can happen in a logical and regular way. For instance, representing two files that were derived from the same original is easy (most revision control systems can do this). Representing a single file that is a combination of two original files is just as easy (many revision control systems can’t record this). More exotic conditions are handled in the same consistent manner; the only difficulty is wrapping one’s head around it. Here’s a graphical example using gitk:
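Both cases fall out of treating history as a graph in which each commit lists its parents: two versions derived from one original share a parent, and a combination of two versions is simply a commit with two parents. Below is a minimal sketch using hypothetical short labels in place of SHA-1 IDs. The `merge_bases` function is a brute-force version of the question `git merge-base` answers (which common ancestor to merge from); Git’s own implementation is more efficient.

```python
from collections import deque

# Hypothetical history: labels stand in for real commit IDs.
parents = {
    "original": (),
    "resume_dev": ("original",),                # one version derived from the original
    "resume_mgr": ("original",),                # a second version from the same original
    "combined": ("resume_dev", "resume_mgr"),   # a merge: one commit, two parents
}


def ancestors(parents, start):
    """Return every commit reachable from `start`, including `start` itself."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(parents.get(node, ()))
    return seen


def merge_bases(parents, a, b):
    """Common ancestors of `a` and `b` that are not ancestors of another common ancestor."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {c for c in common
            if not any(c != other and c in ancestors(parents, other) for other in common)}


print(merge_bases(parents, "resume_dev", "resume_mgr"))  # {'original'}
```

No special cases are needed: the same two functions work however tangled the graph becomes, which is the regularity described above.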
The other aspect of GIT, the decentralized nature of the repositories, is not something I can really say much about, since I’m using the repository just for myself right now. The way it is described, however, makes sense, given the premises above. If you’re interested in someone’s changes, you find out where their repository is, download the history of the particular branch you care about, and merge it with your own local copy. It strikes me as considerably more involved in terms of needing to know who the participants are and what they do, but at the same time, the individual is no longer at the mercy of someone else sending them an incompatible change to something they’re basing their current work on.
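The content-based IDs are also what make this exchange cheap. In real Git the steps correspond roughly to `git fetch` followed by `git merge` (or `git pull`, which does both). The sketch below is a toy model, not Git’s transfer protocol: each repository is a dict mapping a hypothetical short commit label to its parent labels. Because an ID identifies its content, any commit already present locally never needs to be downloaded again. The sketch assumes each repository already contains the full history behind every commit it holds.

```python
def fetch(remote, local, head):
    """Copy every commit reachable from `head` in `remote` that `local` lacks.

    Returns the number of commits transferred. Assumes `local` is complete:
    if it has a commit, it also has all of that commit's ancestors.
    """
    transferred = 0
    stack = [head]
    while stack:
        oid = stack.pop()
        if oid in local:
            continue  # same ID means same content: nothing to download
        local[oid] = remote[oid]
        transferred += 1
        stack.extend(remote[oid])  # each value is the tuple of parent IDs
    return transferred


# Hypothetical repositories: labels stand in for SHA-1 IDs.
mine = {"a": (), "b": ("a",), "x": ("b",)}             # x: my own local work
theirs = {"a": (), "b": ("a",), "c": ("b",), "d": ("c",)}

count = fetch(theirs, mine, "d")
print(count)  # 2 (only c and d travel; a and b are already here)

# The merge is then an ordinary local commit with two parents.
mine["m"] = ("x", "d")
```

Nothing in this exchange involves a central server: either side could just as well fetch from the other.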
So far, I really like GIT. It has a simple design that handles every case I’ve thrown at it logically and consistently, which I think should be the goal of all software designs. The lack of support in some of my other favourite tools (e.g. the Eclipse IDE) is unfortunate, but it looks like it’s coming along, and at any rate, it’s not a requirement for my expected needs. Now that I have one shiny new tool in my toolbox, it’s time to evaluate another promising candidate that’s sat on the bench for far too long.