By: Arman Danesh
This is the first of a new series which will look at the class of software designed to perform source code management (SCM). Through this series, we will look at the basic concept behind this type of software, and why it is an important tool for any Web developer. We will look at how revision control software affects the workflow of individual or team development, and we will look at several specific SCM packages. We will also take one specific package, Subversion, and look at how to use it in a practical scenario.
This first article covers the theory behind SCM software and the reasons to use it, and gives a quick overview of the main SCM packages available.
The next article will walk through how SCM fits into your workflow. Later articles will each cover one specific SCM package, including how it might be integrated into your Macromedia MX development workflow. The products covered in some depth include CVS, Subversion, and ColdFusion's RDS used as an SCM system.
No specific technical background is required for this article. Even beginner developers can benefit from SCM, and hopefully one of the packages introduced in this series will come to help you manage your code and other development files.
What is SCM?
Source code management systems are a common feature of large software development environments. They are used by both commercial and open source projects. It is far less common, however, to see SCM used in Web development, although larger development firms and projects do use SCM to manage their code.
SCM solutions are based on a simple principle: the authoritative copies of your source code, and other project files, are kept in a central repository. Developers will check out copies of files from the repository, work on those copies, and then check them back in to the repository. This is where SCM becomes an important tool; SCM manages and tracks revisions by multiple developers against a single master repository and provides:
- Locking and concurrency management
- Versioning and revision history
- Project forking
Locking and Concurrency Management
If you have ever worked in a team-based development environment that didn't use an SCM solution, you have probably encountered examples of the concurrency problem and its implications. Concurrency refers to the simultaneous editing of a file by more than one developer. This creates a contention problem which can lead to loss of revisions by one or more developers, especially if they are editing a single master copy of a file.
Consider a simple example in which developers A and B both need to change the same file at the same time:
- Developer A opens the file.
- Developer B opens the file.
- Developer A changes the file and saves it.
- Developer B changes the file and saves it, overwriting A's changes.
Clearly this has the potential for serious loss of work. Even if individual developers work on their own copies of files instead of a master set of files, after developers A and B make their changes, those independent changes to the same file must, somehow, be reconciled and then distributed out to all developers.
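The four steps above can be sketched in a few lines of Python. This is only an illustration: a plain dictionary stands in for the shared master copy, and the file name and contents are made up.

```python
# Hypothetical stand-in for a shared master copy of the project files.
master = {"index.cfm": "original line\n"}

# Developers A and B each open the file and read the same starting content.
copy_a = master["index.cfm"]
copy_b = master["index.cfm"]

# Developer A adds a line and saves.
master["index.cfm"] = copy_a + "A's change\n"

# Developer B adds a line to the now stale copy and saves.
# Nothing warns B that the master has changed in the meantime.
master["index.cfm"] = copy_b + "B's change\n"

a_change_survived = "A's change" in master["index.cfm"]
print("A's change survived:", a_change_survived)
```

Because B's save is based on the content B opened rather than on what A saved, A's line disappears without any error being raised.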
SCM systems manage the concurrency problem with file locking, which makes it possible for files to be flagged as "in use" when a developer is editing them. Two main approaches to file locking exist: exclusive locks and unreserved locks.
With exclusive locking, the SCM prevents more than one developer from ever checking out a file to edit it. If a developer checks out a file for editing, all other developers are prevented from checking out the file; they will be able to view the file or get a copy (as opposed to checking it out) but they can't edit the master repository copy until the current developer checks it back in and, in the process, releases their exclusive lock on the file.
This solution can provide a foolproof way of preventing simultaneous editing, but it comes with its own problem: what happens when Developer A checks a file out, forgets about it, and leaves the office? If Developer B then has an urgent change to make to the file, they cannot make it and must wait for Developer A to return and check the file back in. In a large development environment this is a challenging problem of human management and communication, particularly in the distributed environments common in Web development, where teams span multiple time zones.
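To make the exclusive model concrete, here is a minimal sketch in Python. The class and method names (`ExclusiveLockRepo`, `check_out`, `check_in`, `get_copy`) are invented for this illustration and do not correspond to any particular SCM product.

```python
class ExclusiveLockRepo:
    """Toy repository in which only one developer may check out a file."""

    def __init__(self, files):
        self.files = dict(files)  # path -> authoritative content
        self.locks = {}           # path -> developer holding the lock

    def get_copy(self, path):
        # Viewing or copying a file never requires a lock.
        return self.files[path]

    def check_out(self, path, developer):
        holder = self.locks.get(path)
        if holder is not None and holder != developer:
            raise PermissionError(f"{path} is checked out by {holder}")
        self.locks[path] = developer
        return self.files[path]

    def check_in(self, path, developer, content):
        if self.locks.get(path) != developer:
            raise PermissionError(f"{developer} does not hold the lock on {path}")
        self.files[path] = content
        del self.locks[path]  # checking in releases the exclusive lock


repo = ExclusiveLockRepo({"index.cfm": "v1"})
text = repo.check_out("index.cfm", "A")

try:
    repo.check_out("index.cfm", "B")
    blocked = False
except PermissionError:
    blocked = True  # B must wait until A checks the file back in

viewed = repo.get_copy("index.cfm")  # B can still read the file
repo.check_in("index.cfm", "A", text + " + A's edit")
after_release = repo.check_out("index.cfm", "B")
```

Note that if A never calls `check_in`, B stays blocked indefinitely, which is exactly the forgotten-checkout problem described above.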
Because of the problems described with exclusive locking, most major SCM systems in widespread use adopt a different type of locking: unreserved locking. In this model, multiple developers can check a file out and obtain a non-exclusive lock. Multiple developers then edit the file as needed.
The SCM system then implements mechanisms and algorithms to manage the merging of changes as files are checked back in to the repository. These algorithms range from the simple (inform developers of conflicting changes and ask the developers to resolve the changes) to advanced (attempt to determine and combine changes intelligently and ask for developer intervention or confirmation only when needed).
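The simple end of that range can be sketched as follows. This is an assumption-laden toy, not how any specific SCM works internally: the repository tracks a revision number, and a check-in based on an outdated revision is rejected so the developer can reconcile the changes. The names `UnreservedRepo` and `ConflictError` are hypothetical.

```python
class ConflictError(Exception):
    """Raised when a check-in is based on an outdated revision."""


class UnreservedRepo:
    """Toy single-file repository using unreserved (non-exclusive) checkouts."""

    def __init__(self, content):
        self.content = content
        self.revision = 1

    def check_out(self):
        # Anyone may check out; we hand back the revision the copy is based on.
        return self.revision, self.content

    def check_in(self, base_revision, new_content):
        if base_revision != self.revision:
            raise ConflictError(
                f"copy is based on revision {base_revision}, "
                f"repository is at revision {self.revision}"
            )
        self.content = new_content
        self.revision += 1
        return self.revision


repo = UnreservedRepo("header\n")
rev_a, text_a = repo.check_out()
rev_b, text_b = repo.check_out()

repo.check_in(rev_a, text_a + "A's change\n")  # succeeds, repository is now at revision 2

try:
    repo.check_in(rev_b, text_b + "B's change\n")
    conflict = False
except ConflictError:
    conflict = True
    # B resolves the conflict by hand: fetch the latest copy and reapply the change.
    rev_b, latest = repo.check_out()
    repo.check_in(rev_b, latest + "B's change\n")
```

Unlike the unmanaged case, B's check-in cannot silently overwrite A's work. More advanced systems would attempt to combine the two sets of changes automatically instead of rejecting the check-in outright.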
At first glance, it may seem that this offers little more than not using an SCM at all, especially for working on a shared set of files. But this isn't the case. The SCM system knows who has checked out copies of files and prevents file overwriting by ensuring that some type of manual or automatic merging of changes occurs. Combined with other SCM features discussed in the following sections, this makes an unreserved SCM system a powerful development management tool.
Versioning and Revision History
SCM systems not only handle editing by multiple developers and merging of changes when conflicts arise, they also implement versioning. Under versioning, a complete history of revisions of files in the repository is maintained. Every time a version of a file is checked back in to the repository, a copy of that version is archived. At any time, it is possible to pull back a previous version of a file, or roll back the current version to any earlier revision.
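A minimal sketch of this idea, assuming a hypothetical `VersionedFile` class that keeps every checked-in version in a list, might look like this. Real systems typically store revisions far more compactly; the full-copy approach here is only for clarity.

```python
class VersionedFile:
    """Toy versioned file: every check-in is archived and never discarded."""

    def __init__(self, content):
        self.history = [content]  # history[0] is revision 1

    def check_in(self, content):
        self.history.append(content)
        return len(self.history)  # the new revision number

    def get(self, revision=None):
        # With no argument, return the current (latest) version.
        if revision is None:
            return self.history[-1]
        if not 1 <= revision <= len(self.history):
            raise ValueError(f"no such revision: {revision}")
        return self.history[revision - 1]

    def roll_back(self, revision):
        # Design choice in this sketch: a rollback is recorded as a new
        # revision, so the abandoned versions remain in the history.
        return self.check_in(self.get(revision))


page = VersionedFile("draft")
page.check_in("draft with typo fixed")  # revision 2
page.check_in("broken experiment")      # revision 3
new_rev = page.roll_back(2)             # revision 4 repeats revision 2
```

After the rollback, `page.get()` returns the content of revision 2, while `page.get(3)` still retrieves the abandoned experiment.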
Versioning systems also generate log reports of who checked in changes and when, as well as storing comments from developers about the changes they are committing back to the repository. Some systems can even show the specific changes made in each new version of a file that is checked in.
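As a rough illustration of such a log, the sketch below records the author, time, and comment for each check-in and uses Python's standard `difflib` module to show what changed between two revisions. The `LogEntry` structure, the helper functions, and the sample content are all assumptions made for this example.

```python
import difflib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class LogEntry:
    revision: int
    author: str
    when: datetime
    comment: str
    content: str


log = []  # hypothetical in-memory revision log for a single file


def check_in(author, comment, content):
    entry = LogEntry(len(log) + 1, author, datetime.now(timezone.utc), comment, content)
    log.append(entry)
    return entry


def show_changes(old_rev, new_rev):
    """Return a unified diff between two archived revisions as a list of lines."""
    old = log[old_rev - 1].content.splitlines()
    new = log[new_rev - 1].content.splitlines()
    return list(difflib.unified_diff(
        old, new, fromfile=f"r{old_rev}", tofile=f"r{new_rev}", lineterm=""))


check_in("A", "Initial page", "<h1>Home</h1>\n")
check_in("B", "Add welcome text", "<h1>Home</h1>\n<p>Welcome</p>\n")

for entry in log:
    print(entry.revision, entry.author, entry.when.isoformat(), entry.comment)

diff = show_changes(1, 2)
print("\n".join(diff))
```

In the diff output, lines beginning with `+` were added in the newer revision and lines beginning with `-` were removed.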
In some SCM models, individual files are checked in and out of the repository. In other SCM systems, a synchronization system is essentially built-in. Developers check out their own, complete, copy of the repository and work on files as they need, committing their changes back to the master repository. Developers can periodically update their personal copies of the repository to obtain new changes submitted by other developers.
This way, online access to the repository is not necessary for development to continue. Instead, developers can work off-line if needed, connecting to the repository only periodically to commit their changes and to update their local working copies with new changes from the repository.
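The working-copy model can be sketched as below. The `Repository` and `WorkingCopy` classes are hypothetical, and conflict handling is deliberately left out to keep the example short: a commit simply pushes locally modified files, and an update pulls only files the developer has not modified.

```python
class Repository:
    """Toy master repository: path -> content."""

    def __init__(self, files):
        self.files = dict(files)


class WorkingCopy:
    """A developer's complete local copy of the repository."""

    def __init__(self, repo):
        self.repo = repo
        self.base = dict(repo.files)   # snapshot taken at checkout or last update
        self.files = dict(repo.files)  # local files, editable while off-line

    def modified(self):
        # Files whose local content differs from the snapshot.
        return {p for p, text in self.files.items() if self.base.get(p) != text}

    def commit(self):
        # Requires a connection: push local modifications to the master.
        changed = sorted(self.modified())
        for path in changed:
            self.repo.files[path] = self.files[path]
            self.base[path] = self.files[path]
        return changed

    def update(self):
        # Requires a connection: pull changes for files not modified locally.
        local = self.modified()
        pulled = []
        for path, text in self.repo.files.items():
            if path in local:
                continue  # a real SCM would merge or report a conflict here
            if self.files.get(path) != text:
                self.files[path] = text
                pulled.append(path)
            self.base[path] = text
        return sorted(pulled)


repo = Repository({"a.cfm": "v1", "b.cfm": "v1"})
alice = WorkingCopy(repo)
bob = WorkingCopy(repo)

alice.files["a.cfm"] = "v2 by alice"  # edited off-line, no repository access
committed = alice.commit()            # later, once connected
pulled = bob.update()                 # Bob picks up Alice's change
```

Only `commit` and `update` touch the repository; every edit in between happens entirely on the local copy.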
Project Forking
Sometimes it is necessary to split a project into two separate development streams during the course of the development cycle. These streams may reflect multiple versions of an application or project, or completely separate projects, that share the same base (the code developed before the separation occurs). This separation is known as forking, and most SCM systems provide the ability to fork a repository and establish separate versioning, history and locking for the two forks of the project. Changes in one fork have no impact on the other fork.
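The independence of forks can be illustrated with a short sketch. The `Repo` class and its `fork` method are invented for this example; here a fork simply receives its own deep copy of the shared history, whereas real SCM systems usually share the common history more efficiently.

```python
import copy


class Repo:
    """Toy repository whose history is a list of file snapshots."""

    def __init__(self, name, history=None):
        self.name = name
        self.history = history if history is not None else []

    def commit(self, files):
        self.history.append(dict(files))

    def head(self):
        return self.history[-1]

    def fork(self, name):
        # The fork starts with its own copy of the shared base history.
        return Repo(name, copy.deepcopy(self.history))


main = Repo("main")
main.commit({"app.cfm": "shared base"})

v2 = main.fork("version-2")
v2.commit({"app.cfm": "v2 feature"})

print(main.name, len(main.history), main.head())
print(v2.name, len(v2.history), v2.head())
```

Both forks share the first snapshot, but the commit made to `v2` never appears in `main`.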