From AccuRev to git
Version control systems are really important tools in the day-to-day life of a software developer. What happens, though, when you have to move both code and people from one system to another? Let's just say that you'll need time, patience and the desire to write your own tools...
As you know, I'm a big fan of git, so when work decided to move our source control from AccuRev to git I was one of the first ones to jump up in excitement. Don't get me wrong, AccuRev isn't a bad tool per se, but it does have enough downsides which make it a pain to work with, especially if you're used to something more lean such as git.
My main issues with AccuRev are the following:
- it's slow as hell, especially the GUI
- command line options aren't as good compared to git
- user hooks are a joke compared to git, which in turn leads to
- harder to ensure coding style and local tests by running scripts before committing/pushing code
- harder to integrate correctly with continuous integration systems
- getting history is a pain in the ass: for example it's next to impossible to figure out from the output of accurev hist whether a file was added or deleted
- not at all conducive to agile development: branches (streams) can't be re-used if deleted, workspaces need to be manually moved to track another stream, which usually leads to "one workspace per stream syndrome"
- not portable: workspaces are hardcoded to the system (they have the PC name in their metadata); this means that if you switch PCs you'll need to either manually trawl through your list of workspaces and update them one by one, or try and script it (which doesn't work if you're switching from Windows to Linux or vice versa).
- still on the workspace part, if you work on multiple PCs you'll end up having more than one workspace on a given stream, because workspaces can't be reused (seems like nothing in AccuRev can be reused).
All of the issues from above cascade and make the developer behave in a certain way in order to accommodate the tool.
The best example of the change of mentality needed to move from AccuRev to git is caused by issue number 7 in my list, the one about moving workspaces between streams. Because it's hard to make a workspace track another stream (equivalent to a git branch), users started creating separate folders for each stream; this in turn allowed them to simply diff the folders in order to check the differences between, for example, the development and stable streams. When moving to git, the first question I got was what happens when they change the branch they're on, and how they can see the differences between branches.
Another bad behavior was caused by how slow AccuRev can be when it comes to branching: in order to reduce the time spent waiting around, most users pushed changes directly to the development (or worse, master) branch instead of having feature branches; this then made it hard to push just specific changes to the master branch when you wouldn't want to integrate all changes, or one of them needed to wait for another fix.
Most of these workflow differences got solved through training, and git's popularity helped too: most questions are just a Google search away.
The next step in making the move was actually getting the code into git. Some teams decided that history wasn't important for them, so they just dumped all of their existing code on the stable branch and went from there. For us though that was unacceptable, so I decided to write something that would migrate our AccuRev history to git.
Before going into the code let's talk a bit about some AccuRev terms, and compare them to git.
AccuRev has the concept of depots, which should map to git repositories, but in our workplace we usually assigned them per team, so they ended up holding completely different components and systems. This way, depots are better mapped to projects in our specific case.
Next are the streams, which are the branches from git. Streams can optionally have a parent, which is the equivalent to git branching, or they can start from scratch, which is how we translated repositories.
Basically we would have the following AccuRev structure:
TEAM_DEPOT -> Wizard -> Wizard_stable -> Wizard_develop
TEAM_DEPOT -> Toolchain -> Toolchain_stable -> Toolchain_develop
In the TEAM_DEPOT we'd have two components, Wizard and Toolchain. Each one starts from an empty parent stream (Wizard or Toolchain), which in turn has a child stream (Wizard_stable or Toolchain_stable) on which the actual code resides.
If we are to map this to git, we'd have the following:
Wizard repository -> master branch -> develop branch
Toolchain repository -> master branch -> develop branch
Instead of having a single repository which contains both projects, each of them is split into its own repo, with its own history, permissions and so on.
In order to migrate our code we had to do the following steps:
- select an AccuRev stream to migrate (this was usually the stable stream for each particular project)
- get the full history for that stream
- for each historical event get the author, message and timestamp
- for each historical event get the actual files and commit them to a git repo with the info obtained at the previous step
- keep repeating the previous two steps until you're up to date
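The commit part of these steps can be sketched in a few lines of Python. This is only an illustration, not the code from my tool: the Transaction class, the replay_plan helper and the example.com e-mail domain are made up for the example. It builds the git commit command for each history event without running it, passing the date in git's raw "seconds since epoch plus offset" format.

```python
import os
from dataclasses import dataclass


@dataclass
class Transaction:
    """One AccuRev history event (hypothetical shape, for illustration only)."""
    number: int
    author: str
    message: str
    timestamp: int  # seconds since the Unix epoch, UTC


def git_commit_command(txn, email_domain="example.com"):
    """Build the argv and environment that would replay one transaction in git."""
    when = f"{txn.timestamp} +0000"  # git's raw date format
    argv = [
        "git", "commit",
        "--author", f"{txn.author} <{txn.author}@{email_domain}>",
        "--date", when,
        "--message", txn.message,
    ]
    # --date only sets the author date; the committer date is read
    # from the environment.
    env = dict(os.environ, GIT_COMMITTER_DATE=when)
    return argv, env


def replay_plan(transactions):
    """Yield commit commands oldest first, so git history follows AccuRev order."""
    for txn in sorted(transactions, key=lambda t: t.number):
        yield git_commit_command(txn)
```

Each (argv, env) pair could then be handed to subprocess.run inside the git repository, after the files for that transaction have been copied in and staged.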
Optionally, because both AccuRev and git would be available for a period of time, with development still being done in AccuRev, the tool should allow updating a git repository with the new history from AccuRev:
- get the stream history
- check the latest commit from git and map it to AccuRev history
- start migrating from the newest transaction in AccuRev not available in git
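One way to do this mapping, sketched below, is to compare timestamps. It assumes that every migrated git commit kept the timestamp of its AccuRev transaction, so the time of the latest commit (for example from git log -1 --format=%at) marks where the previous run stopped. The pending_transactions function and the (number, timestamp) tuples are hypothetical names for the example.

```python
def pending_transactions(history, last_migrated_timestamp):
    """Return the transactions newer than the latest git commit, oldest first.

    history is a list of (transaction_number, unix_timestamp) tuples.
    last_migrated_timestamp is None for an empty git repository.
    """
    if last_migrated_timestamp is None:
        newer = list(history)
    else:
        newer = [t for t in history if t[1] > last_migrated_timestamp]
    return sorted(newer, key=lambda t: t[0])
```

Timestamps with a one second resolution can collide when two transactions land in the same second, so storing the transaction number in the commit message would be a more robust key.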
The first issue we ran across was due to workspaces and how they're attached to streams: we needed to check if the location used was already associated with an existing workspace, as that would prevent us from creating the new workspace.
The second one was having to move the migration workspace, once created, whenever the user had to perform multiple migrations on different streams in the depot.
Then we found out that AccuRev doesn't really sanitize the messages in any way, which can in turn lead to failures when trying to commit the changes into git.
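A minimal sketch of such a clean-up step is shown below. It assumes the troublesome messages are empty ones (git refuses to commit with an empty message by default) and ones containing control characters; the sanitize_message name and the fallback text are made up for the example.

```python
import re

# Control characters other than tab and newline.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b-\x1f\x7f]")


def sanitize_message(raw, fallback="(no message)"):
    """Return a commit message that git will accept."""
    text = _CONTROL_CHARS.sub("", raw or "")
    # Drop trailing whitespace on every line, then blank lines at both ends.
    lines = [line.rstrip() for line in text.splitlines()]
    cleaned = "\n".join(lines).strip()
    return cleaned or fallback
```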
Another strange case is that, while AccuRev insists that all streams start with the depot name, that match isn't case sensitive, so you could have a depot named Project and the stream could start with project, which in turn caused my script to fail.
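The fix boils down to comparing the prefix without regard to case. A small sketch, with has_depot_prefix being a hypothetical helper name:

```python
def has_depot_prefix(stream, depot):
    """Check that a stream name starts with the depot name, ignoring case."""
    return stream[:len(depot)].casefold() == depot.casefold()
```

With this check, has_depot_prefix("project_stable", "Project") is true, whereas a plain str.startswith would reject it.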
But the worst thing is that you can't tell if a file is deleted or added from the AccuRev history file. This in turn led to my first implementation being slow for large repositories, because I would have to get all the files for each history step (not just the changed ones), copy the files to the local git repository, commit them, then delete all the files in the git repo before copying the next round of files from AccuRev in order to detect deleted files.
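The "wipe and copy" part of that first approach can be sketched as follows. This is an illustration rather than the original code; replace_worktree and its parameters are hypothetical names, and it needs Python 3.8 or newer for dirs_exist_ok.

```python
import shutil
from pathlib import Path


def replace_worktree(repo_dir, snapshot_dir):
    """Replace everything in the git working tree with a full snapshot."""
    repo, snapshot = Path(repo_dir), Path(snapshot_dir)
    for entry in repo.iterdir():
        if entry.name == ".git":
            continue  # never touch git's own metadata
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
    # Copy the complete snapshot back in; files missing from the snapshot
    # stay deleted.
    shutil.copytree(snapshot, repo, dirs_exist_ok=True)
```

Running git add --all afterwards stages the deletions as well as the additions and changes, which is what makes deleted files show up in the commit. The cost is a full copy per history step, which is why this gets slow on large repositories.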
To fix this we decided to move to a stream and workspace implementation, where we would have a pass-through stream tied to the one we would migrate and a workspace tied to this stream that points locally to the git repo folder. Pass-through streams are interesting, as they allow you to change the history element they're pointing at without modifying the original stream. What this means is that, by having an AccuRev workspace tied to it, we wouldn't have to get all the files each time, but just the changed ones: we simply modify the transaction at which the pass-through stream is pointing and then update our workspace.
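One migration step with this setup could look roughly like the sketch below. The AccuRev flags are written from memory (chstream with -s for the stream and -t for the transaction it should point at, followed by update in the workspace), so treat them as an assumption and check them against the AccuRev CLI documentation for your version. The passthrough_step function is a hypothetical name, and it only builds the commands rather than running them.

```python
def passthrough_step(passthrough_stream, transaction):
    """List the commands for moving the workspace to one transaction.

    Assumption: the chstream flags below match your AccuRev version.
    """
    return [
        # Point the pass-through stream at the wanted transaction.
        ["accurev", "chstream", "-s", passthrough_stream, "-t", str(transaction)],
        # Bring the workspace (the git repo folder) up to date; only
        # changed files are touched.
        ["accurev", "update"],
        # Stage additions, changes and deletions for the git commit.
        ["git", "add", "--all"],
    ]
```

Each command would be run with the git repo folder as the working directory, followed by the commit for that transaction.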
Get it while it's hot!
In case anyone has to go through this themselves, I've made the code available on GitHub and GitLab, and each and every contribution is appreciated.
Unfortunately the tests can't be made available because you'd need a reference depot which isn't portable. You can see the Making private tests public article for more information about this.