Using git to bridge two svn repositories: a success story

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I complain too much of the time on this list, so here's a success story I can share for a change. I just used git to merge two separate svn repositories: the official repo for an open-source program and an internal repo with our locally-modified version of the same program. The local copy has been tracking the official one off and on over time; it has a bunch of changes that were contributed back to the official code base at various points, other changes that weren't, and some directory layout changes to accommodate our internal build system.

We had fallen fairly far behind the official version, so yesterday I decided to bring us up to date. Not a trivial merge; various of our changes had been applied to different branches in the official svn repository, which had gotten merged back into their trunk at various points. In many cases local change A appeared before remote change B in our history but in the opposite order in the official repo since they committed our change after the other one.

Obviously svn is nowhere near adequate to the task of normalizing these two code bases. So I used git instead, and it worked out great. Specifically, here's what I did, minus a few false starts:

1. Made two git-svn repositories, one based on our local code base and one based on the official svn repository.

2. Created a git repository and pulled from both of the git-svn repos. (I know I could have done this with one repo instead of three, but I wanted to make sure I could easily blow away one of the parts of this and start over.)

3. Added a couple of .git/info/grafts entries for places where I knew the original project had merged branches back into trunk, but where git-svn hadn't detected the merge. Probably not git-svn's fault, given how brittle merging is in svn and the fact that a couple of the merges were split across multiple svn revisions.

4. Found an early point in our history when we had a fairly close to unmodified copy of the distribution at the time and created a branch from that revision.

5. Renamed the files from our layout back to the distribution's. (I'll talk more about this below.)

6. Did a baseless merge with the corresponding revision of the distribution's history. Resolved the conflicts, which weren't too severe thanks to step 4.

7. Walked through the revision history on both sides merging into my integration branch. I was more cautious about this than I probably needed to be (though more on that below too); my approach was to merge up to a particular change on our side that I knew we'd contributed upstream, then merge up to the corresponding revision on the official side, repeat until done. In cases where our stuff had been integrated into a branch in the official repo, I followed that branch rather than trunk for the most part. I ended up walking three branches plus trunk.

8. Once I had merged the last of our local changes, I merged the head of the official trunk into my integration branch, picking up a bunch of official revs in one step.

9. Renamed everything back to our naming conventions.

This was kind of an iterative process and the main reason I did it incrementally at first was mostly to limit the amount of conflict resolution at any one step, as well as to make sure that each of our contributions had in fact been merged correctly. (I wrote most of the code we contributed so I was able to quickly tell if it looked right.)

The gitk display for this repo looks like a ladder; nearly every revision of my integration branch is a merge.

Now, about those renames. The major change in structure was to rename the source directory from "server" in the official repository to "src" which our build system expects. So before I did any merges, I committed a revision where I did "git mv src server" (along with a couple other similar renames) so there'd be an explicit rename-only revision for git's rename detection to use to apply changes to the right files.

Unfortunately, that broke down as soon as I got to a contribution of ours that added a new file. I merged the contribution on our side (where everything lives in src/), and it correctly applied the modifications to the existing files in server/ thanks to the renames in the history. But the new files were created in src/. I didn't notice the file missing from server/ at first, and merged the revision from the official repo that created the same file there. The new file was identical on both sides, so I didn't think it was odd that there wasn't a conflict, and proceeded to the next rev. It was only after several more revisions merged from both sides that I noticed the server/ copy of the file was missing changes I'd sworn I'd just merged from our side. Naturally all our local changes were getting successfully applied to the copy in src/ and all the changes from the official repo were showing up in the server/ copy.

So I ended up resetting back to the first revision that created a new file in src/, and making sure I stopped at each revision that introduced a new file there so I could commit an extra revision after the merge to manually rename it into server/. Then the subsequent merge with the revision that created the file in server/ would correctly flag any differences between the two versions as conflicts, and I could go through and do the right thing with them. There were only three or four such cases so it wasn't too much extra work.

The only other glitch I ran into was missing one merge from the official svn repository when I created my grafts file. That caused me to get a bunch of repeat conflicts when I merged the subsequent svn trunk revision. But I immediately realized what was happening there; I reset, added the missing merge to my grafts file, and did the merge again, and the repeat conflicts went away.

Aside from those two minor things, it was a painless exercise, and now I have a reasonably coherent (if a bit convoluted) combined history of the two versions of the code base without the svn repositories on either side being aware of each other. I plan to keep all these git repositories around so I can quickly integrate subsequent changes from both sides.

So, kudos all around. Without git this would have been a much more time-consuming and error-prone exercise!

-Steve
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]