Wednesday, February 27, 2013

Evaluation and comparison of VCS

Hello,

Today I want to take a closer look at existing version control systems.

The following table shows a general overview of existing centralized and decentralized systems:


For further evaluation I chose three of the version control systems in the table: Subversion, Git and Mercurial. They were picked because they are currently the most popular systems and all of them are open source.

Apache Subversion (SVN)

Development of Apache Subversion was started by CollabNet in October 2000. In February 2010 it became a top-level project of the Apache Software Foundation. SVN is a centralized version control system. Because SVN can communicate over HTTP, collaboration with other developers, even in remote locations, is possible. HTTP (Hypertext Transfer Protocol) is a standard protocol that most firewalls allow [7].

SVN offers many features, for example a merging tool, branching support, commit messages and much more. It helps to resolve conflicts when two developers have been working on the same place in a file. SVN also features true atomic commits: either a whole commit completes or nothing is committed. This keeps repositories from becoming corrupted by incomplete data [7].
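To make the all-or-nothing idea concrete, here is a minimal Python sketch of an atomic commit. It is a toy model under my own assumptions, not how SVN works internally: the `Repository` class and its `commit` method are hypothetical. All changes are applied to a copy of the current state, and the copy only replaces that state if every single change succeeded.

```python
from typing import Dict, Optional


class Repository:
    """Toy repository (hypothetical): a revision number plus file contents."""

    def __init__(self) -> None:
        self.revision = 0
        self.files: Dict[str, str] = {}

    def commit(self, changes: Dict[str, Optional[str]]) -> int:
        """Apply all changes as one new revision, or none of them.

        A value of None means "delete this file"; deleting a file that
        does not exist is treated as an error that aborts the commit.
        """
        # Work on a copy so that a failure half-way leaves self.files untouched.
        snapshot = dict(self.files)
        for path, content in changes.items():
            if content is None:
                if path not in snapshot:
                    raise KeyError(f"cannot delete missing file {path!r}")
                del snapshot[path]
            else:
                snapshot[path] = content
        # Only after every change succeeded is the new state published.
        self.files = snapshot
        self.revision += 1
        return self.revision


repo = Repository()
repo.commit({"a.txt": "hello", "b.txt": "world"})
try:
    # The first change would succeed, but the second one fails ...
    repo.commit({"a.txt": "changed", "missing.txt": None})
except KeyError:
    pass
# ... so the repository still shows revision 1 with the old content.
print(repo.revision, repo.files["a.txt"])  # 1 hello
```

Real systems achieve the same guarantee with transactions on disk rather than an in-memory copy, but the principle is the same: a half-finished commit is never visible.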

Because SVN is open source, easy to learn and offers many features that earlier version control systems didn't have, it was widely adopted by a large number of companies. As a result, it is well supported by third-party applications. Nevertheless, Subversion suffers from the disadvantages of centralized version control systems. With a slow internet connection, updating or committing data becomes much slower. Also, SVN's merging abilities suffer if a file is not cleaned of the additional code that was generated [7].

Mercurial

After the free version of BitKeeper was withdrawn from the market, Matt Mackall started developing Mercurial in April 2005. Mercurial is open source and a decentralized versioning system, with all the advantages and disadvantages of such systems. For example, most of the time changes are only committed to the local repository, which gives a huge speed increase. Pushing changes to a remote location can be done over SSH. SSH (Secure Shell) is an encrypted network protocol for secure remote access. That can be an advantage if all HTTP ports are closed in a locked-down network for whatever reason [7].
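The difference between committing locally and pushing to a remote can be sketched in a few lines of Python. This is a simplified model under my own assumptions (the `Repo` class is hypothetical, and real Mercurial identifies changesets by hashes instead of list positions): committing only appends to the local history, while a push sends just the commits the remote does not have yet.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Repo:
    """Toy repository (hypothetical): an ordered list of commit messages."""
    commits: List[str] = field(default_factory=list)

    def commit(self, message: str) -> None:
        # Committing only touches the local history; no network is involved.
        self.commits.append(message)

    def push(self, remote: "Repo") -> int:
        """Send the commits the remote does not have yet; return how many."""
        # Simplifying assumption: the remote history must be a prefix of the
        # local one, i.e. nobody else has pushed diverging changes.
        if remote.commits != self.commits[: len(remote.commits)]:
            raise ValueError("remote has diverged; pull and merge first")
        missing = self.commits[len(remote.commits):]
        remote.commits.extend(missing)
        return len(missing)


server = Repo()
laptop = Repo()
laptop.commit("add parser")
laptop.commit("fix typo")   # both commits stay local and are fast
print(laptop.push(server))  # 2
print(laptop.push(server))  # 0, nothing new to send
```

Only the push needs a network connection, which is why the speed of everyday work in a decentralized system hardly depends on the connection to the server.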

Mercurial is written in Python, which gives it good cross-platform compatibility. It is mainly a command-line tool, but graphical front ends are also available. Mercurial offers some useful features; for example, changes can be exported to a file. Another user can then import that file into a repository, and the changes keep the name of the original author. This is useful if new code has to be reviewed and approved by other team members before it is committed [7].
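The reason the original author survives the export/import round trip is that the exported file carries the author in a header in front of the diff. The following Python sketch parses a simplified header modelled on the one `hg export` writes; the header is shortened (a real export contains more lines, such as node and parent IDs), and the name, date and message are invented for illustration. The `parse_export` function is hypothetical and not part of Mercurial.

```python
from typing import Dict, List, Tuple

# Simplified, invented example of an exported changeset.
PATCH = """\
# HG changeset patch
# User Alice Example <alice@example.com>
# Date 1361923200 0
Add input validation

diff -r 000000000000 parser.py
"""


def parse_export(patch: str) -> Tuple[Dict[str, str], str]:
    """Split an exported changeset into header fields and commit message."""
    headers: Dict[str, str] = {}
    message_lines: List[str] = []
    for line in patch.splitlines():
        if line.startswith("diff "):
            break  # the actual changes start here
        if line.startswith("# "):
            if line == "# HG changeset patch":
                continue  # marker line, carries no data
            key, _, value = line[2:].partition(" ")
            headers[key] = value.strip()
        else:
            message_lines.append(line)
    return headers, "\n".join(message_lines).strip()


headers, message = parse_export(PATCH)
print(headers["User"])  # Alice Example <alice@example.com>
print(message)          # Add input validation
```

When the reviewer imports such a file, the author can be taken from the `User` field instead of from the person running the import.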

Because of all these features, Mercurial gained a large number of users, for example Mozilla, NetBeans and Growl [7].

Git

Development of Git started at nearly the same time as Mercurial. Linus Torvalds, the creator of the Linux kernel, wrote the first version of Git in just four days. Git was developed to manage the source code of the Linux kernel, with two core goals: speed and security. It is a decentralized version control system [7].

Git puts a special focus on fast branching. In Git you can create separate branches (often called feature or topic branches) for individual features and merge them back into the main line once they are finished. Git is also very scalable: even a huge project does not noticeably slow it down [7].
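Branching in Git is cheap because a branch is only a movable pointer to a commit, and each commit points to its parent(s). The Python sketch below models this idea under my own simplifying assumptions: the `History` class is hypothetical, commits are numbered instead of being identified by SHA-1 hashes, and no file contents are stored.

```python
from itertools import count
from typing import Dict, Tuple


class History:
    """Toy commit graph (hypothetical): commits and branch pointers."""

    def __init__(self) -> None:
        self._ids = count(1)
        self.parents: Dict[int, Tuple[int, ...]] = {0: ()}  # 0 = root commit
        self.branches: Dict[str, int] = {"master": 0}

    def commit(self, branch: str) -> int:
        new_id = next(self._ids)
        self.parents[new_id] = (self.branches[branch],)
        self.branches[branch] = new_id  # move the branch pointer forward
        return new_id

    def create_branch(self, name: str, start: str) -> None:
        # Creating a branch copies a single pointer; no files are duplicated.
        self.branches[name] = self.branches[start]

    def merge(self, into: str, other: str) -> int:
        # A merge commit has two parents: the tips of both branches.
        new_id = next(self._ids)
        self.parents[new_id] = (self.branches[into], self.branches[other])
        self.branches[into] = new_id
        return new_id


h = History()
h.commit("master")                    # commit 1
h.create_branch("feature", "master")  # feature points at commit 1
h.commit("feature")                   # commit 2 on the feature branch
h.commit("master")                    # commit 3 on master
m = h.merge("master", "feature")      # commit 4 joins both lines
print(h.parents[m])                   # (3, 2)
```

Since creating a branch costs the same no matter how large the project is, developers can afford a separate branch for every feature.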

The local use of Git is quite impressive, but Git also shares the disadvantages of all decentralized version control systems. In addition, setting up and learning Git is not as easy as with other systems. Communicating with a remote Git repository over SSH requires SSH keys on the local and the remote machine. Nevertheless, there are many books and online resources available for getting to know this versioning system [7].

Selection of two suitable systems for implementation

For the next part of my work, the implementation of version control systems for processing real-time geo data, I decided to use Subversion and Git.

These two systems were chosen to allow a comparison between a centralized and a decentralized system. Both are very popular and used in many companies. Git won over Mercurial because it is faster and offers more features, especially for branching [7].

[1] Apache Software Foundation. Apache Subversion. http://subversion.apache.org/, March 2013.
[2] Perforce Software Inc. Perforce. http://www.perforce.com/, March 2013.
[3] Microsoft Corporation. Microsoft. http://www.microsoft.com/, March 2013.
[4] Scott Chacon, Junio C. Hamano, and Shawn Pearce. Pro Git. Apress, New York, 2009.
[5] Matt Mackall. Mercurial. http://mercurial.selenic.com/, March 2013.
[6] Canonical Ltd. Bazaar. http://bazaar.canonical.com/en/, March 2013.
[7] Chris Kemper and Ian Oxley. Foundation Version Control for Web Developers. friendsofED, New York, 2012.

4 comments:

  1. Great - just a note on the English in the first line "today I won't to take a closer lock on existing version control systems." should be "today I want to take a closer look at existing version control systems"

  2. Oh, thanks very much - I corrected the spelling mistake!

  3. Just a small comment on your references: if you are citing books, please also add the locations/city. Furthermore, if you refer to web references, please provide the URL and access date as well. Also, when considering journals and proceedings, be careful not to forget the volume and issue information. Sorry that I am a little picky on this, but you will save yourself a lot of work for your thesis later on.

  4. Thanks for the hint!
    I am working with JabRef and I copied the references out of my LaTeX document.
    I got most of my references from "scholar.google.com". When you click on "cite", you get BibTeX code from there, but it doesn't include the cities. I have now added them manually.
    JabRef has no field for "access date", just month and year. I now use these fields.
    The LaTeX template doesn't print the field "URL", so I have now copied the link into the field "Howpublished" ...
    Sam gave me a lot of links to good blogs. Am I allowed to reference a blog? I am not sure about that...
    The content of the SaaS videos from Berkeley I am watching is also quite interesting. Am I allowed to cite something from the Berkeley videos?
