Migration from SVN to Git: a developer guide to git-svn

Lately I've been working with many customers to help them migrate from good old SVN to git. This is obviously not a one-shot move: it requires time, discipline and learning. I'm pretty sure some readers are wondering whether SVN is still used, and the answer is YES, SVN is still here and many companies are still using it.

Previously I wrote a post on how to migrate your code base from SVN to gitlab. That was of course the easy part. Helping your team get used to git, and putting in place a migration strategy that suits your needs without impacting your delivery dates (or with the least possible impact), is much more crucial.

Throughout my journey with SVN-TO-GIT, one tool helped me a lot and was truly my ultimate weapon: the git-svn command. In a nutshell, git-svn is a git command that lets you use git to interact with Subversion repositories. git-svn is part of git, meaning that it is NOT a third-party plugin but part of the git project itself (note that some Linux distributions ship it as a separate git-svn package).

From SVN to Git Repositories

First of all, you need to create a new local copy of the repository with the command

git svn clone SVN_REPO [DEST_DIR] -T TRUNK -t TAGS -b BRANCHES

If your SVN repository follows the standard layout (trunk, branches, tags folders) the above will look like:

git svn clone -s SVN_REPO [DEST_DIR]

git-svn clone checks out each SVN revision, one by one, and makes a git commit in your local repository in order to recreate the history. If the SVN repository has a lot of commits this will take a while, so you may want to grab a coffee.

When the command has finished you will have a full-fledged git repository with a local branch called master that tracks the trunk branch in the SVN repository.

If the SVN repository has a long history, the git svn clone operation can crash or hang (you'll notice a hang because the progress stalls; just kill the process with CTRL-C). If this happens, worry not: the git repository has been created, but there is some SVN history yet to be retrieved from the server. To resume the operation, just change to the git repository folder and issue the command git svn fetch.
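The clone-then-resume logic described above can be sketched as a small shell helper. This is only a sketch under assumptions: the function name svn_clone_or_resume and the example URL are made up for illustration, and it assumes the SVN repository uses the standard trunk/branches/tags layout.

```shell
# Hypothetical helper: clone an SVN repository, or resume an interrupted clone
# if the destination folder already contains a git repository.
svn_clone_or_resume() {
    repo_url="$1"   # placeholder, e.g. https://svn.example.com/repos/project
    dest_dir="$2"

    if [ -d "$dest_dir/.git" ]; then
        # A previous clone was interrupted: fetch the remaining SVN history.
        (cd "$dest_dir" && git svn fetch)
    else
        # First attempt: -s assumes the standard trunk/branches/tags layout.
        git svn clone -s "$repo_url" "$dest_dir"
    fi
}

# Usage (placeholder URL and folder):
#   svn_clone_or_resume https://svn.example.com/repos/project project
```

Calling the helper again after a crash takes the resume path, because the .git folder already exists at that point.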

Pulling the latest changes from SVN Repo

Whenever you want to retrieve the latest changes from a git server you obviously use git pull. The equivalent to git pull in your git-svn journey is the command git svn rebase.

This retrieves all the new changes from the SVN repository and then replays your local commits on top of them in your current branch.

You can also use git svn fetch to retrieve the changes from the SVN repository but without applying them to your local branch.
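To illustrate the difference between the two commands, here is a minimal sketch of a helper that downloads new SVN revisions and lists them without touching your local work. The function name svn_incoming and the SVN_TRUNK_REF variable are hypothetical, and the ref name origin/trunk is an assumption: it is what I would expect for a repository cloned with git svn clone -s on a recent git, but your clone may use a different name.

```shell
# Assumption: the SVN trunk is tracked as origin/trunk. Override the variable
# if "git branch -r" shows a different name in your clone.
SVN_TRUNK_REF="${SVN_TRUNK_REF:-origin/trunk}"

# Hypothetical helper: download new SVN revisions, then list the ones that
# are not yet part of the current branch. Local commits are left untouched.
svn_incoming() {
    git svn fetch && git log --oneline "HEAD..$SVN_TRUNK_REF"
}

# Usage:
#   svn_incoming       # preview what changed in SVN
#   git svn rebase     # then actually bring the changes into your branch
```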

Adding changes

Now that you've pulled the work from SVN, use your local git repository as a normal git repo, with the normal git commands (git add, git commit, git stash ...).

Pushing local changes to SVN

git svn dcommit --rmdir will create an SVN commit for each of your local git commits. As with SVN, your local git history must be in sync with the latest changes in the SVN repository, so if the command fails, try running git svn rebase first.

Also note that your local git commits will be rewritten when you use git svn dcommit. The command adds a line to each git commit message (it starts with git-svn-id:) that references the revision created on the SVN server, which is VERY useful. However, adding this text means modifying an existing commit's message, which can't actually be done: git commits are immutable. The solution is to create a new commit with the same contents and the new message, but technically this is a new commit (i.e. the git commit SHA1 will change).
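The "rebase first, then dcommit" routine can be wrapped in a tiny helper so nobody forgets the first step. This is a minimal sketch, not an official workflow; the function name svn_push is made up for illustration.

```shell
# Hypothetical helper: sync with SVN, then send the local commits upstream.
svn_push() {
    # dcommit needs the local history to sit on top of the latest SVN revision.
    if ! git svn rebase; then
        echo "git svn rebase failed; fix the problem and try again" >&2
        return 1
    fi
    # Each local commit becomes one SVN revision; --rmdir also removes
    # folders that become empty as a result of the commit.
    git svn dcommit --rmdir
}

# Usage:
#   svn_push
#   git log -1    # the rewritten commit message now ends with a git-svn-id: line
```

If you would like to preview what is going to be sent, git svn dcommit also accepts a --dry-run flag.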

What you should keep in mind

Keep the history linear

This means you can perform all kinds of crazy local operations: creating branches, removing, reordering or squashing commits, moving the history around, and so on. Anything but merges.

Handling merges

Do not merge your local branches. If you need to reintegrate the history of local branches, use git rebase instead.

When you perform a merge, a merge commit is created. The particular thing about merge commits is that they have two parents, and that makes the history non-linear. Non-linear history will confuse SVN if you "push" a merge commit to the repository. However, don't worry: you won't break anything if you "push" a git merge commit to SVN.

If you do so, when the git merge commit is sent to the svn server it will contain all the changes of all commits for that merge, so you will lose the history of those commits, but not the changes in your code.
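To make the rebase-instead-of-merge advice concrete, here is a self-contained sketch that runs in a throwaway git repository (no SVN involved). The branch name feature, the file names and the demo identity are invented for the demo. It reintegrates a local branch with git rebase plus a fast-forward, then counts the merge commits in the resulting history.

```shell
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

# Base commit, standing in for the history cloned from SVN.
echo "base" > file.txt
git add file.txt
git commit -q -m "base"
main_branch="$(git symbolic-ref --short HEAD)"

# Some work on a local feature branch.
git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "feature work"

# Meanwhile the main branch moves forward.
git checkout -q "$main_branch"
echo "more" >> file.txt
git commit -q -am "mainline work"

# Reintegrate with rebase + fast-forward instead of a merge commit.
git checkout -q feature
git rebase -q "$main_branch"
git checkout -q "$main_branch"
git merge -q --ff-only feature

# Count the commits that have more than one parent.
merge_count="$(git rev-list --merges --count HEAD)"
echo "merge commits in history: $merge_count"
```

Because the feature branch was rebased first, the final git merge --ff-only only moves the branch pointer forward, so the history stays linear and every commit can be sent to SVN one by one.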

Handling empty folders

git does not recognize the concept of folders; it just works with files and their file paths. This means git does not track empty folders. SVN, however, does. Using git-svn means that, by default, any change you make involving empty folders with git will not be propagated to SVN.
Fortunately the --rmdir flag corrects this issue, and makes git remove an empty folder in SVN if you remove the last file inside of it. Unfortunately it does not remove existing empty folders; you need to do that manually.

To avoid needing to issue the flag each time you do a dcommit, just issue the command:

git config --global svn.rmdir true

This changes your .gitconfig file and adds these lines:

[svn]
rmdir = true

Be careful if you issue the command git clean -d. That will remove all untracked files, including folders that should be kept empty for SVN. If you need to recreate the empty folders tracked by SVN, use the command git svn mkdirs.

In practice this means that if you want to clean up your workspace from untracked files and folders, you should always use both commands:

git clean -fd && git svn mkdirs
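To see why the second command matters, here is a small self-contained demo in a throwaway git repository (no SVN involved; the folder name empty_folder and the demo identity are invented for the demo). It shows that git does not notice an empty folder and that git clean -fd deletes it.

```shell
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "hello" > README
git add README
git commit -q -m "initial commit"

# An empty folder, standing in for one that SVN tracks.
mkdir empty_folder

# git reports a clean workspace: the empty folder is invisible to it.
status_output="$(git status --porcelain)"
echo "git status output: '$status_output'"

# git clean -fd deletes the folder along with any other untracked content.
git clean -fdq
if [ -d empty_folder ]; then folder_state="kept"; else folder_state="removed"; fi
echo "empty_folder was $folder_state"
```

In a real git-svn clone, running git svn mkdirs afterwards is what brings the folders tracked by SVN back.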

Cloning big SVN repos

I've worked with many SVN repos, and one of them had more than 500,000 commits. It's worth mentioning that cloning big SVN repositories with a huge history can take hours, as git-svn needs to rebuild the complete history of the SVN repo. Fortunately you only need to clone the SVN repo once; as with any other git repository, you can just copy the repo folder to other collaborators. Copying the folder to multiple computers will be quicker than cloning a big SVN repo from scratch on each of them.

A note about commits and SHA1

As the git commits created by git-svn are local, their SHA1 ids are only meaningful locally. This means that you can't use a SHA1 to reference a commit when talking to another person, because the same commit may have a different SHA1 on each machine. You need to rely on the SVN revision number instead. The number is appended to the commit message when you push to the SVN server.

You can use the SHA1 for local operations though (showing or diffing a specific commit, cherry-picks, resets, etc).
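Since the SVN revision number is the id you can share, it helps to be able to pull it out of a commit message. Below is a minimal sketch that reads a commit message on standard input and prints the revision found in its git-svn-id: line. The function name svn_rev_from_message is made up, and the URL, revision and UUID in the usage example are placeholders.

```shell
# Hypothetical helper: print the SVN revision referenced by the git-svn-id:
# line of a commit message read from standard input.
# The line looks like: git-svn-id: <URL>@<REVISION> <REPOSITORY-UUID>
svn_rev_from_message() {
    sed -n 's/^git-svn-id: .*@\([0-9][0-9]*\) .*$/\1/p' | tail -n 1
}

# Usage inside a git-svn clone:
#   git log -1 --format=%B | svn_rev_from_message
#
# git-svn can also do the mapping for you, in both directions:
#   git svn find-rev r1234    # SVN revision -> git commit SHA1
#   git svn find-rev HEAD     # git commit   -> SVN revision number
```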