Technorama: August 2011

At long last I've decided to investigate Git as a version control system. I've converted one of my projects to Git using github.com, and managed to get it to build and release using Jenkins. Much help was found here: http://www.nomachetejuggling.com/2011/07/31/ubuntu-tomcat-jenkins-git-ssh-togethe/ However, I found myself in the middle of a big debate about the nature of distributed version control systems and their advantages and disadvantages. Much of the debate can be found here: http://jira.codehaus.org/browse/SCM-444 It would seem I've stepped into a minefield of issues centered around Git's distributed flexibility and Maven's rigid stability.

The issues call into question a lot of closely held beliefs I have about reproducibility and stability, and the purpose of continuous integration. The crux of the issue is that Git enforces no central 'authoritative' server, but allows an entire repository to be cloned locally, and then only reconciled when desired.

One of the first advantages I can see in a distributed system like Git addresses an issue that I'm currently facing at work: I have a lot of changes that need committing, but I still have some work to do before they're ready to share. Git solves this by separating committing from sharing, so that I can commit my work more frequently and have finer granularity if I choose to back some of that work out. In my current situation at work, as I progress through the changes necessary to finally be able to share my work, if I should need to back out just the last hour's work, I have no recourse. I could make patches as checkpoints (and I think I'll do that first thing tomorrow), but that is a routine I am not accustomed to, and it could be error-prone. Git would make this a trivial problem by allowing me to commit to a local repository and share only when the work is totally ready.
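To make the commit/share split concrete, here is a toy model in Python. It is only a sketch of the idea, not how Git works internally: the class names, method names and commit messages are all made up for illustration, and it assumes nobody else shares anything in the meantime. In real Git the rough counterparts would be git commit, git reset and git push.

```python
class ToyRepo:
    """Toy stand-in for a shared repository: an ordered list of commit messages."""

    def __init__(self):
        self.commits = []


class LocalClone(ToyRepo):
    """Toy stand-in for a local clone that commits privately and shares later."""

    def __init__(self, shared):
        super().__init__()
        self.shared = shared
        # A clone starts out with a full copy of the shared history.
        self.commits = list(shared.commits)

    def commit(self, message):
        # Committing only touches the local history.
        self.commits.append(message)

    def back_out(self, count=1):
        # Only commits that have not been shared yet may be dropped.
        unshared = len(self.commits) - len(self.shared.commits)
        if count > unshared:
            raise ValueError("cannot back out commits that are already shared")
        del self.commits[len(self.commits) - count:]

    def share(self):
        # Assumption: nobody else has shared anything since we cloned.
        self.shared.commits = list(self.commits)


shared = ToyRepo()
shared.commits.append("last shared state")

local = LocalClone(shared)
local.commit("checkpoint: first hour of work")
local.commit("checkpoint: experiment that went wrong")
local.back_out(1)  # drop just the last checkpoint
local.commit("checkpoint: work finished")

before_sharing = list(shared.commits)  # still only the old shared state
local.share()
```

Until share() is called, the shared history is untouched, which is exactly the checkpoint-without-sharing behavior I am missing at work.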

One of the disadvantages of a distributed system like Git is that it enforces no policy on sharing, and so leaves that up to the project owners and committers. I suppose this is somewhat akin to the sticking point many have about SVN: there is no enforced notion of tag or branch, so it is left to committers and convention, and many tools assume one convention or another concerning SVN branches and tags. The same is true of Git and sharing: there are tools that make assumptions about the conventions adopted by project owners. Specifically, when Jenkins wants to release a version, it must share some changes to the project so that the version numbers are maintained consistently. This assumes there is some centralized authoritative repository to share to.

So to be able to maintain versions consistently, one needs to adopt a sharing policy, specifically a centralized authoritative repository where the versions can be stored at release time. One person described the problem as an impedance mismatch, because Git identifies a version by a hash value, while Maven uses a natural sequential number. I suppose that, most logically, releasing a version in Maven would cause a branch to be created in Git, but that would still mean that after every release, you would have to check out another branch on your machine to continue working on the latest version.
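The mismatch is easier to see with a small Python sketch. It is a simplified model under my own assumptions: commit_id is a toy stand-in for a Git commit id (real Git hashes more than this), version_key only handles plain dotted numbers (no qualifiers such as -SNAPSHOT), and the change descriptions and version numbers are invented.

```python
import hashlib


def commit_id(parent_id, content):
    """Toy stand-in for a Git commit id: SHA-1 over the parent id plus the content."""
    return hashlib.sha1(f"{parent_id}\n{content}".encode("utf-8")).hexdigest()


def version_key(version):
    """Turn a dotted version such as '1.10' into a tuple that sorts naturally."""
    return tuple(int(part) for part in version.split("."))


# Build a short linear history; each id depends on the one before it.
history = []
parent = ""
for change in ["initial import", "add feature", "fix bug"]:
    parent = commit_id(parent, change)
    history.append(parent)

# The hashes identify commits exactly, but sorting them says nothing about
# which commit came first. The release versions carry the ordering instead,
# and a tag table kept in one agreed place maps each version to a hash.
tags = {"1.0": history[0], "1.1": history[2]}

latest = max(tags, key=version_key)
latest_commit = tags[latest]
```

The tag table is the part that needs an authoritative home: if everyone keeps their own copy, nobody can say for certain which commit "1.1" really is.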

I see the advantages of a distributed system like Git, but to keep sharing from becoming a free-for-all, some control must be established. I see that control in the Maven release process, and control means temporarily limiting the full capabilities and establishing a sharing policy: pulling together all the changes that need to go into a release, making sure everyone is in sync, releasing, and then allowing people to become asynchronous again.

The point of having a continuous integration build is to pull together what everyone has shared so far and make sure it all works together. The more often that happens, the faster conflicts are found and the cheaper they are to fix. That flies in the face of allowing developers to work completely asynchronously. Again, periodic sharing is important to maintaining stability. So, really, the advantage of a distributed system isn't that it IS distributed, it's that it CAN BE distributed. It allows a more natural period to be established for sharing, but sharing still must take place, and it must be controlled with a consistent policy. Assuming everyone is working towards the same goal, the next release, the sharing policy must reflect that.


Sunday, August 7, 2011

A new paradigm to conquer