If you are human, you will occasionally forget where you put something. If you are in a hurry, you may not always finish what you start. If you are creative, you may not always remember … umm … uhhh … Oh, right … You may not always remember where you were going with an idea. Put these three together and you have a programmer. This post will describe how to stay in control of your code by using Subversion. If you write code and are unfamiliar with version control, you need to start using Subversion today!
Learn Systems, not Details
Libraries have lots of books but librarians can find every single one. Not because they remember the location of each and every book but because they have a system for finding them. Programmers are not that different. They need to be able to find lots of pieces of code, make backup versions as they work and retrieve older versions when something goes wrong. When sharing code with others there is also the problem of keeping all versions of code in sync so that one person’s changes don’t get lost when another person works on the same piece of code. The number of details to keep track of grows with the size of the code base, the length of the project and the number of programmers involved. Nobody can remember all the details and a project of even modest size can quickly overwhelm any homegrown accounting system.
The task of managing all these details, known as configuration management, is so central to programming in teams that numerous tools and best practices have emerged to make it much, much easier than it used to be. The most mature and widely used of these tools is Apache Subversion — a revision control system that is at the heart of many open source projects.
Along with the Subversion software there is a set of best practices that, when followed, all but guarantee that your code will be safe, findable and well versioned. Many of these best practices are spelled out in the open source (= free!) book: Version Control with Subversion. Even the smallest effort to learn and use Subversion will pay off by preventing those day-long flails trying to recover from broken or deleted code. In the rest of this post we will distill the most important best practices for working in small groups (1-10 people). No guru-like system administration capabilities are assumed, just a desire to create a friendly, efficient and well-documented work environment.
A Subversion server is required before one can begin using Subversion. Binary distributions for most operating systems can be downloaded from subversion.apache.org. For many small groups without serious system managers it is often easier to subscribe to a Subversion hosting service like projectlocker.com.
A Subversion client is needed on each developer’s machine. Most *nix distributions come with the ‘svn’ command-line client installed. Windows users can install TortoiseSVN.
- repository: This datastore is the “lock box” where Subversion stores the current and all previous versions of your code.
- revision number: Subversion increments the revision number every time code is checked in. It is essentially a unique ID for each transaction.
- version number: Version numbers are not part of Subversion. They are assigned by programmers according to their understanding of software versioning.
- working copy: The working copy is whatever revision of the code a user has in their directory. This may or may not be in sync with the code in the Subversion repository.
- The trunk/ directory in a Subversion repository should contain the latest, greatest, working version of the code and nothing else.
- The tags/ directory is for tagging versions of code before embarking on significant code changes.
- The branches/ directory is for those times when you will be making large, potentially damaging changes to the code base. It’s best to keep these separate until completed when they can be merged with the trunk/.
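The three-directory layout described above can be sketched as a pair of small shell functions. This is only an illustration: the function names and the repository URL are our own placeholders, and create_layout assumes you have commit access to the repository root.

```shell
# Print the three standard top-level URLs for a project repository.
# The URL passed in is a hypothetical placeholder; use your own server path.
layout_urls() {
    repo_url=$1
    for dir in trunk tags branches; do
        printf '%s/%s\n' "$repo_url" "$dir"
    done
}

# Create all three directories in a single commit.
# Nothing happens until you call this function yourself.
# Note: the unquoted $(...) relies on word splitting, so the URL
# must not contain spaces.
create_layout() {
    repo_url=$1
    svn mkdir -m "Create trunk/, tags/ and branches/" $(layout_urls "$repo_url")
}
```

Calling layout_urls https://my.host/svn/PROJECT first is a harmless way to see exactly what create_layout would ask Subversion to make.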
10 Habits of Successful Subversion Users
The following is our list of priorities for those who are just beginning to learn how to use Subversion. Our bias is to keep things clean and simple and to not take advantage of advanced features that require expert knowledge. Anyone who deals with files that need versioning should be able to follow this advice whether they are a programmer or not.
1) Use a separate Subversion repository for each project
Nothing is scarier than seeing the desktop of someone’s computer covered with hundreds of unorganized files. Organized people put files in folders and then put those folders into other folders. Working with Subversion is no different. You should create a Subversion repository for each full-scale project you are working on, i.e. each self-contained collection of code. This helps in a variety of ways, especially when it comes to version numbers. Below these top-level repositories you will import the code for the components that make up your project.
A typical URL for the trunk of a project might look like this:
At ProjectLocker they appear as:
Locally, you should check out a working copy of PROJECT as seen in the following example:
svn co https://my.host/svn/PROJECT/trunk ./PROJECT
Hopefully your code is broken up into different functional components (modular design) and within each PROJECT you can have separate COMPONENT directories. The URL for a single component will thus be:
2) Use conceptual names for components
Naming is very important in software projects, especially as they increase in size and complexity. The choice to use Subversion is also a choice to maintain code through time and across multiple developers. That makes choosing good names that much more important. When picking names for components it is important to choose names that will make sense to someone not familiar with the code. (Perhaps even yourself a year or two later.) A good name will describe the component’s fundamental functionality. Names to be avoided are those that contain version numbers or other information that seems appropriate in the flail-of-the-moment but do not describe functionality.
“GrowthSimulator” would be a good component name while “GS_fortran_v0.2” is an ad hoc name that should be used only in tags (see below).
3) Create tags before big changes
Before they find out about version control, most people create backup copies of older versions of code by sticking files in directories with distinguishing names like “~-1” or “~.2011-02-02” or “~-b4-X”. These ad hoc names are meant to capture relevant information or perhaps trigger a memory of what was in that version of the code. By keeping these old directories around one is able to access older versions to look for any bits of code found there for a better understanding of how things have evolved.
This is a very reasonable thing to do. Whenever you are about to embark on major code changes you should back up the current, up-to-date code in an appropriately named directory. In Subversion this is called “tagging” the code. The difference with Subversion is that you don’t store the tagged copy of the code in the trunk/ directory. The trunk directory should contain only the latest version of the code. Instead, you create a tagged copy of trunk/ in the tags/ directory like this:
svn status -u # to make sure everything is up-to-date
svn cp https://.../PROJECT/trunk https://.../PROJECT/tags/ad-hoc-name
The ad-hoc-name should be something that has meaning. Our preference is to include any version number along with a brief description such as “v-0.0.2.1-b4-XML-support”. A more detailed description can be given in the message associated with this action.
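As a rough sketch of the naming preference above, the tag name can be assembled from the version number and a short description before it is handed to svn cp. The function names and the project URL argument are hypothetical; only svn cp with -m is real Subversion.

```shell
# Build a tag name such as "v-0.0.2.1-b4-XML-support"
# from a version number and a brief description.
make_tag_name() {
    printf 'v-%s-%s\n' "$1" "$2"
}

# Copy trunk/ to tags/<name> on the server.
# $1 is the project URL (hypothetical), $2 the version, $3 the description.
# Nothing happens until you call this function yourself.
tag_trunk() {
    project_url=$1
    tag=$(make_tag_name "$2" "$3")
    svn cp -m "Tag $tag before major changes" \
        "$project_url/trunk" "$project_url/tags/$tag"
}
```

The -m text is where the more detailed description belongs; the one shown here is just a stand-in.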
4) Use version numbers
Version numbers are super important if your code will be used by people other than yourself. Trying to debug someone else’s problem with your code is a nightmare if you don’t know what version of the code they are running. We take a simple approach to versioning:
- use a standard four-part number (e.g. 0.0.2.1) for software versioning
- include the version number inside your source files and provide a method for end-users to determine the version
- include the version number when you create tags
For small groups that should be enough to keep you out of trouble as long as you remember to update the version number for each release and tag. (Advanced users will want to make use of Subversion’s special keywords to automatically embed revision numbers in source code but this requires that you first understand Subversion properties and is beyond the level of this post.)
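One possible way to follow this advice in a shell script is sketched below. The VERSION value, the program name (borrowed from the GrowthSimulator example earlier) and both function names are assumptions made for illustration.

```shell
# Version number kept inside the source file (hypothetical value).
# Update it by hand for each release and tag.
VERSION="0.0.2.1"

# Give end-users a way to find out which version they are running.
print_version() {
    printf 'GrowthSimulator %s\n' "$VERSION"
}

# Succeed only if the argument is a four-part dotted number such as 0.0.2.1.
is_valid_version() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'
}
```

A check like is_valid_version "$VERSION" just before tagging catches the common slip of typing a three-part number.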
5) Test your code before checking it in
In an ideal world, only working code would be checked in to Subversion. Yes, sometimes it is important to save your unfinished work and solo programmers can do this with impunity. Group programming, however, demands that the trunk/ always contain working code. We highly recommend becoming familiar with unit testing frameworks as used in the Java and Python communities. You may have some other technique for testing your code but whatever it is it should be automated. That will allow you to test much more often.
It is imperative that you test any code you’ve been working on before you check it in unless you’re working entirely by yourself and even then it’s a very good idea. (If you are doing real violence to the existing code and will need to check in broken versions on the way to success then you need to learn about branching and merging — also beyond the scope of this post.)
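The habit of testing first can be wrapped in a small helper that refuses to commit when the tests fail. This is a minimal sketch, not a standard Subversion feature: the function name is ours, the test command is whatever your project uses, and the SVN variable exists only so you can try a dry run.

```shell
# Run the test command and commit only if it exits with status 0.
# $1 is your test command (hypothetical), $2 the commit message.
# Set SVN=echo to print the commit command instead of running it.
commit_if_tests_pass() {
    test_cmd=$1
    message=$2
    if sh -c "$test_cmd"; then
        "${SVN:-svn}" commit -m "$message"
    else
        echo "Tests failed; nothing was committed." >&2
        return 1
    fi
}
```

For example, with SVN=echo set, commit_if_tests_pass "make test" "Fix parser" runs the tests and then only prints the commit it would have made.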
6) Check in often
It’s not just your mom who wants you to check in often. Subversion wants you to as well. We’ve all had the experience of accidentally losing the changes we just worked on for the last hour. (Or day!) Few things in life are as frustrating. That’s why you should check in your code as often as you can. If you’ve done a good job of compartmentalizing your code and writing automated unit tests you should be able to check in working code several times a day.
If you’re working in a group you will also want to check the status of the files you are about to check in with svn status -u, which also asks the server about newer revisions. (Has someone else edited this file since you last checked in?)
7) Make good use of messages
Every Subversion action that modifies the repository will request that you include a message describing that action. This is not the time to hit Carriage-Return and get back to work. Describing the code changes you are committing is your work. Yes, it’s OK to just say “typo” or “tweak” if it’s a tiny change. But any changes of substance should have a descriptive message that explains what was changed and why.
If you have Trac set up (see below) we have two extra suggestions:
- Make sure the first line of your message is short and descriptive. The first ~72 characters of your message will appear in the Trac Roadmap.
- Any use of “#ticket_number” in your message will be converted by Trac into a link to the identified Trac ticket. This method of linking revision numbers with the Trac tickets they address is extremely useful later on when trying to figure out how the code changes for feature enhancement A introduced bug X.
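The two suggestions above can be checked before you commit. The helpers below are our own sketch, with hypothetical names; the 72-character limit is the approximate figure mentioned above, and grep -o is assumed to be available (it is in GNU and BSD grep).

```shell
# Succeed if the first line of the message is non-empty
# and no longer than 72 characters.
first_line_ok() {
    first_line=$(printf '%s\n' "$1" | head -n 1)
    [ -n "$first_line" ] && [ "${#first_line}" -le 72 ]
}

# Print each "#number" ticket reference found in the message, one per line.
ticket_refs() {
    printf '%s\n' "$1" | grep -Eo '#[0-9]+'
}
```

Running ticket_refs on a draft message is a quick way to confirm that the tickets you meant to mention are actually in it.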
8) Check in groups of files
Subversion assigns the same revision number to all files checked in at the same time. This has the desirable effect of grouping together all the files with code changes that were made to enable some new functionality or swat some old bug. When doing a big checkin, as opposed to a “typo” correction, be sure to include all the files involved and provide a detailed message. (Trac will group these together as a changeset.)
9) Use Trac
Trac is a Python-based issue tracking system for software development that is tightly integrated with Subversion. Subversion hosting providers like projectlocker.com include Trac for every Subversion repository. Any site that provides Subversion hosting should do the same.
Trac’s issue tracking capabilities are beyond the scope of this post but Trac also provides an excellent interface for perusing Subversion repositories including:
- access to the entire Subversion repository (i.e. all trunk/ tags/ and branches/ directories)
- syntax highlighted display of source code
- full revision logs
- differencing between any two revisions
- … and lots more …
10) Read the documentation
As with any piece of advanced software, it pays to Read The Friendly Manual. Especially when that manual is as good as Version Control with Subversion. Command line users of Subversion also have access to internal help docs with svn help.
When you start using Subversion it is important to understand the version control concepts upon which Subversion is based. Not surprisingly, the amount of value you derive from Subversion will be commensurate with the amount of time you spend learning how to use it well.
Good Luck in harnessing this awesome tool!
Subversion vs. Git
The latest shiny, new, cool version control system is Git, much favored by the Linux and Python communities. If you are wondering whether to use Subversion or Git you should first read the following two items:
The first link has perhaps the best summary:
Git is not better or worse, it’s just different. If you have the need for “Offline Source Control” and the willingness to spend some extra time learning it, it’s fantastic. But if you have a strictly centralized Source Control and/or are struggling to introduce Source Control in the first place because your co-workers are not interested, then the simplicity and excellent tooling (at least on Windows) of SVN shine.
Here are a few more choice quotes:
SVN has the advantage that it’s MUCH simpler to learn: There is your repository, all changes go towards it. If you know how to create, commit and checkout you’re ready to go and you can pickup stuff like branching, update etc. later on.
Git has the advantage that it’s MUCH better suited if some developers are not always connected to the master repository. Also, it’s much faster than SVN. And from what I hear, branching and merging support is a lot better (which is to be expected, as these are the core reasons it was written).
SVN is extremely simple to use and evangelized everyone on the “mainline development model”. Error-prone (break the build!) on non-toy projects, it helped developed techniques like “continuous integration” as a way to “avoid integrations”. While the idea is good, most of the surrounding concepts were clearly limited by the tool itself.
I have found that a well followed and defined process established for how a development team functions overcomes the branching/merging issues. Yes, if you have no plan and it is a free-for-all, conflict resolution in SVN when merging would be a nightmare. So, if your development model/team is not mature in its process of how things are done, then yes, I can see how you would have a poor experience with SVN. But, pound for pound, it is an incredible tool that is very capable for handling the integrated and concurrent development models of a large enterprise.