Painless migration from svn to git (GitLab)

Every one of us has been part of (or knows someone at) a company that still works with svn. Sooner or later, that company decides to start using git! That was the case for me lately with a client. It's worth mentioning that this post does not try to revisit the endless git vs. svn discussion; I'm just sharing a painless way to migrate your existing svn repositories to GitLab.

1- Get the list of author names

In Subversion, each person committing has a user on the system who is recorded in the commit information. What we need to do first is create a mapping from the Subversion users to the Git authors. The command below, run from a working copy of the repository, generates the svn log output in XML format, keeps only the lines with author information, discards duplicates and strips out the XML tags. Its output is redirected into an authors.txt file so we can add the equivalent Git user data next to each entry.

$ svn log --xml --quiet | grep author | sort -u | perl -pe 's/.*>(.*?)<.*/$1 = /' > authors.txt

Once you have filled in the Git name and email for each entry, the authors.txt file should look something like this:

aboullaite = Aboullaite Mohammed <mohammed@example.com>
someone = Another user <someone@example.com>
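
Before handing this file to git svn, it can help to check that no entry was left half-filled. Here is a minimal sketch, assuming every line should have the shape "svn_user = Full Name <email>"; the function name unmapped_authors is made up for this example.

```shell
#!/bin/bash
# Print every line of an authors file that does not yet look like
# "svn_user = Full Name <email>".
# The function name is hypothetical, chosen for this example.
unmapped_authors() {
  local file="$1"
  # -v inverts the match, so only the incomplete lines are printed.
  grep -Ev '^[^=]+ = .+ <[^<>]+>$' "$file"
}
```

Calling unmapped_authors authors.txt prints nothing when every entry is complete, and lists the lines that still need a name and email otherwise.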

2- Migrating svn server repos to GitLab

Nothing beats writing scripts that get the job done for you, especially for boring, repetitive tasks! Hence, I wrote a small shell script to automate this part (because manual migration sucks ;))

  #!/bin/bash
  ## your svn repos go here
  REPOS="REPO1 REPO2 REPO3 ..."

  for repo in $REPOS;
  do
     ## Get repos from SVN
     git svn clone "svn+ssh://<user>@<url>/<go_to_repo>/$repo" --authors-file=authors.txt --no-metadata --prefix "" -s "$repo"
     ## Skip this repo if the clone did not produce a directory
     cd "$repo" || continue
     for t in $(git for-each-ref --format='%(refname:short)' refs/remotes/tags); do git tag ${t/tags\//} $t && git branch -D -r $t; done
     for b in $(git for-each-ref --format='%(refname:short)' refs/remotes); do git branch $b refs/remotes/$b && git branch -D -r $b; done
     for p in $(git for-each-ref --format='%(refname:short)' | grep @); do git branch -D $p; done
     git branch -d trunk
     ## Create repo in gitlab, add remote URL and push data to it
     curl --header "PRIVATE-TOKEN: <PRIVATE-TOKEN>" -X POST "http://<GITLAB_URL>/api/v3/projects?name=$repo&namespace_id=<namespace_id>"  
     ## My SVN repos were uppercase and GitLab creates them in lowercase,
     ## so lowercase the name (use repo1=$repo if yours already match)
     repo1=$(tr '[:upper:]' '[:lower:]' <<< "$repo")
     git remote remove origin
     git remote add origin git@<GITLAB_URL>:<GROUP>/$repo1.git
     git push origin --all
     git push origin --tags
     cd ..
  done;
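
Since the script above runs several external tools for every repo, a small pre-flight check can save a failed run halfway through. This is only a sketch: the function names are made up, and it only checks that the commands are on the PATH, not that the git svn subcommand itself is installed.

```shell
#!/bin/bash
# Print the name of every given command that is not found on PATH.
missing_commands() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}

# Hypothetical pre-flight step: stop early instead of failing mid-migration.
preflight() {
  local missing
  missing=$(missing_commands git svn curl)
  if [ -n "$missing" ]; then
    echo "Missing commands: $missing" >&2
    return 1
  fi
  # -s is true only if the file exists and is not empty.
  if [ ! -s authors.txt ]; then
    echo "authors.txt is missing or empty" >&2
    return 1
  fi
}
```

Adding a line such as preflight || exit 1 before the for loop would be one way to use it.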

First, we import the repos from the svn server using git svn clone. I pass the authors.txt file to git svn so it can map the author data more accurately. The -s flag tells git svn that the repository follows the standard trunk/branches/tags layout. We also tell git svn not to include the metadata that Subversion normally imports, by passing --no-metadata to the clone command:

 git svn clone svn+ssh://<user>@<url>/<go_to_repo>/$repo  --authors-file=authors.txt --no-metadata --prefix "" -s $repo 

After that, I do some post-import cleanup! First I move the tags so they’re actual tags rather than strange remote branches:

for t in $(git for-each-ref --format='%(refname:short)' refs/remotes/tags); do git tag ${t/tags\//} $t && git branch -D -r $t; done
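
The ${t/tags\//} part of that loop is a bash substitution that removes the "tags/" prefix from each ref name. Here is a tiny sketch of it in isolation; the function name and the sample ref names are made up, and I'm assuming the short ref names look like tags/<name>.

```shell
#!/bin/bash
# Strip the "tags/" prefix from a short ref name, using the same
# substitution as the loop above. The function name is hypothetical.
strip_tags_prefix() {
  local t="$1"
  # ${t/tags\//} deletes the first "tags/" found in the value of $t.
  echo "${t/tags\//}"
}

# Sample names, made up for this example.
for t in tags/v1.0 tags/release-2.3; do
  echo "$t -> $(strip_tags_prefix "$t")"
done
# prints:
# tags/v1.0 -> v1.0
# tags/release-2.3 -> release-2.3
```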

Next, I move the rest of the references under refs/remotes to be local branches:

for b in $(git for-each-ref --format='%(refname:short)' refs/remotes); do git branch $b refs/remotes/$b && git branch -D -r $b; done

Finally, I remove the peg-revision refs, the leftover branches whose names contain an @:

for p in $(git for-each-ref --format='%(refname:short)' | grep @); do git branch -D $p; done
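
Because that loop deletes branches, you may want to preview what the grep @ filter selects first. A minimal sketch on made-up branch names (the function name is hypothetical too):

```shell
#!/bin/bash
# Keep only the names that contain "@", the marker the loop above
# uses to spot peg-revision leftovers. The function name is hypothetical.
peg_revision_refs() {
  grep @
}

# Sample branch names, made up for this example.
printf '%s\n' master feature-x feature-x@42 | peg_revision_refs
# prints: feature-x@42
```

On a real clone, piping git for-each-ref --format='%(refname:short)' into the same filter lists the branches the loop would delete.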

Now all the old branches are real Git branches and all the old tags are real Git tags. There’s one last thing to clean up: git svn creates an extra branch named trunk, which maps to Subversion’s default branch, but the trunk ref points to the same place as master. It's redundant, so I removed it ;)

 git branch -d trunk

Now that I have real git repos, all I have to do is create a project for each of them in GitLab and push my code to it:

curl --header "PRIVATE-TOKEN: <PRIVATE-TOKEN>" -X POST "http://<GITLAB_URL>/api/v3/projects?name=$repo&namespace_id=<GROUP_ID>"

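The above command creates a repo. The namespace_id parameter is the id of the namespace, where the namespace is your username or the group (organization) you want to create the repos in. Note that API v3 has been removed from newer GitLab releases; there, the same request goes to /api/v4/projects.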
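
The curl call above does not tell the script whether the project was really created. Here is a rough sketch of one way to check, by asking curl for the HTTP status code only. Both function names are made up, and the meaning I attach to each status code is an assumption about how GitLab answers, so check it against your own server.

```shell
#!/bin/bash
# Translate the HTTP status of the project-creation request into a
# short message. The status meanings are assumptions, not taken from
# the original script.
describe_create_status() {
  case "$1" in
    201) echo "created" ;;
    400) echo "rejected (for example, the name may already be taken)" ;;
    401) echo "unauthorized (check the private token)" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Hypothetical wrapper: the same request as above, but it prints only
# the status code. The placeholders still have to be filled in.
create_project() {
  local repo="$1"
  curl --silent --output /dev/null --write-out '%{http_code}' \
       --header "PRIVATE-TOKEN: <PRIVATE-TOKEN>" -X POST \
       "http://<GITLAB_URL>/api/v3/projects?name=$repo&namespace_id=<GROUP_ID>"
}
```

In the migration loop this could be used as describe_create_status "$(create_project "$repo")".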

The last thing to do is add your new Git server as a remote and push to it:

     git remote add origin git@<GITLAB_URL>:<GROUP>/$repo1.git
     git push origin --all
     git push origin --tags
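
In the script, $repo1 is the lowercased repo name, because my SVN repos were uppercase and GitLab created them in lowercase. Here is a small sketch of how that remote URL is put together; the function name, host and group below are made up for the example.

```shell
#!/bin/bash
# Build the SSH remote URL for a repository, lowercasing the repo name
# with the same tr call as the script. Function and parameter names
# are hypothetical.
gitlab_remote_url() {
  local host="$1" group="$2" repo="$3"
  local lower
  lower=$(printf '%s' "$repo" | tr '[:upper:]' '[:lower:]')
  echo "git@${host}:${group}/${lower}.git"
}

gitlab_remote_url gitlab.example.com mygroup REPO1
# prints: git@gitlab.example.com:mygroup/repo1.git
```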

You can find the complete script on GitHub ;)