----- Original Message -----
> From: "Andrew Sayers" <andrew-git@xxxxxxxxxxxxxxx>
> Sent: Monday, March 5, 2012 6:27:32 PM
> Subject: Re: Approaches to SVN to Git conversion
>
> > My current thinking (and this is very much open for discussion) is
> > that as long as the SVN properties are available (especially the
> > copyfrom information) Git has just as much information (if not more)
> > to reconstruct the SVN history as SVN does. (And going through our
> > messy history I haven't found any counterpoint to this yet)
>
> I agree that git can be taught a superset of the information in SVN,
> but you'll need absolutely all SVN properties available...

I'm pretty sure Jonathan won't be happy with anything less ;)

> I wrote my SVN exporter based on SVN dumps for three reasons - I
> figured people switching from SVN would be more comfortable
> customising a solution that only used technologies they understood, I
> figured it might be useful to Mercurial or Bazaar some day if it was
> DVCS-neutral, and I have to use SVN for my day job so I'm more
> interested in getting a good migration story today than a great one
> tomorrow.

The multiple systems argument is a good one.

> > my %branch_spec = ( '/trunk/projname'      => 'master',
> >                     '/branches/*/projname' => '/refs/heads/*' );
> > my %tag_spec    = ( '/tags/*/projname'     => '/refs/tags/*' );
> >
> > Now I know this simple mapping will fail as I get further in our
> > history -- in particular we have one branch that came from:
> >
> > svn cp $SVN_REPO/trunk/ $SVN_REPO/foo # OOPS! not in branches!
> > svn mv $SVN_REPO/foo $SVN_REPO/branches/foo
> >
> > It's then up to the user to modify the branch
> > map to something that accounts for this behavior:
> >
> > my %branch_spec = ( '/trunk/projname'      => 'master',
> >                     '/branches/*/projname' => '/refs/heads/*',
> >                     '/foo'                 => '/refs/heads/foo' );
> > my %tag_spec    = ( '/tags/*/projname'     => '/refs/tags/*' );
>
> I started with an approach like you describe, but as you say it winds
> up in a mess of special cases. A friend pointed me to Perl's catalyst
> repository[2], which is a wonderful haven of every mad SVN thing ever
> dreamt up. That got me playing with more general heuristics, and
> while writing this e-mail I think I've finally nailed it. What do you
> say to defining SVN branches like this:
>
> A directory is a branch if...
> 1. it is not a subdirectory of an existing branch; and
> 2. either:
> 2a. it is in a list of branches specified by the user, or
> 2b. it is copied from a (subdirectory of a) branch

I think I started with a very similar set of rules... Looking at my
code now, I'm having a hard time summarizing them (probably because
they evolved with the code, so what started simple morphed into
something pretty complicated).

I guess as long as the user has the option to say "no, don't treat this
copy as a branch" (or, equivalently, the Git side of things has a way to
say "ignore this branch") these rules would be okay. But at that point
we're back to a list of exceptions -- really we're arguing white-list
vs black-list...

I eventually chose to go the white-list route for our conversion after
starting with a black-list (a white-list that still required a few
manual edits before manipulating the Git history). So take that single
data point for what it's worth.

> > > Once the format is defined, git import is fairly straightforward.
> > > Proof-of-concept code to follow, but it's really just a wrapper
> > > around git-commit-tree, git-mktag etc.
> > > I wrote this in Perl thinking it would relate somehow to
> > > git-svn, but eventually realised it didn't and that a few
> > > hundred calls to (plumbing) processes per second isn't so good
> > > for performance. The only interesting part of the problem is how
> > > to tackle SVN tags. I went for an ambitious approach, making
> > > normal tags where possible and downgrading them to lightweight
> > > tags when necessary. This does involve managing something that
> > > is effectively a branch in refs/tags/, but what else is an SVN
> > > tag but a branch in the wrong namespace?
> >
> > I don't understand how "normal" and "lightweight" apply in this
> > situation? ... In the case of actual content changes in a tag's
> > life, I think it's up to the user to decide between three options:
> >
> > 1) only retain the last SVN tag
> > 2) tag using the git-svn-style 'tagname@rev' for all but the last
> > 3) Do (2), but move older tags to some hidden namespace
> >    (refs/hidden/tags or the like)
> >
> > ... In the bidirectional case things get murky (maybe always tag
> > with tagname@rev and hope for tab completion?).
>
> I didn't explain this particularly well, as it's based largely on the
> vague desire to make update work some day. Imagine the user does
> this:
>
> * git svn-pull # get tags/foo, a candidate for an annotated tag
> ... time passes ...
> * git svn-pull # tags/foo has now been updated in another revision
>
> If we create an annotated tag in step 1, what do we do in step 2? You
> can't make the tag object the parent of a new revision, so you need to
> do something unpleasant. The solution I proposed was to convert the
> tag message to a commit message (i.e. pretend a lightweight tag had
> been created all along), then add another commit on top of it and make
> a lightweight tag from the new commit (i.e. treat it like a branch).
> In retrospect that's far too much magic without user involvement - a
> better solution would be to give the user this option along with the
> ones you outlined, and let git-config remember their preference if
> they want.

Okay, that's what I thought you meant. I had classified it as a
bidirectional problem; strictly speaking it isn't one, but a one-time
migration does not have the problem.

If you want to continue to update Git from SVN there are two cases to
consider:

1) Each Git repository *only* talks to SVN
2) The Git repository is cloned for further use (so the chain is
   something like SVN->Git->Git)

In (1) your lightweight tag solution is probably okay (but I'm pretty
sure creating/deleting annotated tags would behave the same way because
no one else sees the Git tag object).

In (2) I think there would still be a tag conflict when the upstream
Git repo replaces a lightweight tag and the downstream repo attempts to
fetch it. I don't know what the fetch/pull machinery does when there's
a lightweight tag conflict (I'm guessing it either bails out or keeps
the local one?).

Case (2) motivates me to say always generate (annotated?) tags named
tagname@rev so there can be no conflicts. In that case the only
difference I see is whether we create an empty Git commit with the tag
message plus a lightweight tag, or tag the original commit with an
annotated tag (I think it's fairly obvious I'm a fan of the latter).

> [1] http://en.wikipedia.org/wiki/Full_employment_theorem
> [2] http://dev.catalyst.perl.org/repos/bast/

Thanks,
Stephen
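
P.S. A rough sketch of the branch rules quoted above (1, 2a, 2b), plus
the "no, don't treat this copy as a branch" escape hatch. It is written
in Python rather than Perl purely for illustration and is not taken
from either exporter. The names BranchDetector, add_directory and
ignored are hypothetical, and the sketch assumes directories are fed in
revision order as absolute, non-root SVN paths with their copyfrom path
(if any). Deletions, and therefore the delete half of "svn mv", are not
modelled.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


def is_under(path: str, root: str) -> bool:
    """True if path equals root or lies below it, on a path-component boundary."""
    path, root = path.rstrip("/"), root.rstrip("/")
    return path == root or path.startswith(root + "/")


@dataclass
class BranchDetector:
    # Rule 2a: directories the user has explicitly declared to be branches.
    user_branches: Set[str] = field(default_factory=set)
    # Hypothetical escape hatch: directories the user never wants as branches.
    ignored: Set[str] = field(default_factory=set)
    # Branches recognised so far.
    branches: Set[str] = field(default_factory=set)

    def containing_branch(self, path: str) -> Optional[str]:
        """Return the known branch that path is, or lies inside, if any."""
        for branch in self.branches:
            if is_under(path, branch):
                return branch
        return None

    def add_directory(self, path: str, copyfrom: Optional[str] = None) -> bool:
        """Apply the rules to a newly added directory.

        Returns True if the directory was recorded as a new branch.
        """
        path = path.rstrip("/")
        if path in self.ignored:
            return False
        # Rule 1: not (inside) an existing branch.
        if self.containing_branch(path) is not None:
            return False
        # Rule 2a: listed by the user.
        listed = path in self.user_branches
        # Rule 2b: copied from a branch or from a subdirectory of one.
        copied = copyfrom is not None and self.containing_branch(copyfrom) is not None
        if listed or copied:
            self.branches.add(path)
            return True
        return False


if __name__ == "__main__":
    detector = BranchDetector(user_branches={"/trunk"})
    detector.add_directory("/trunk")
    # The "OOPS" case from the thread: a copy of trunk outside /branches.
    detector.add_directory("/foo", copyfrom="/trunk/")
    detector.add_directory("/branches/foo", copyfrom="/foo")
    print(sorted(detector.branches))
```

With only '/trunk' white-listed, rule 2b picks up both '/foo' and
'/branches/foo' without a special-case entry in the map. Anything the
heuristic gets wrong goes in the ignore list, which is the white-list
vs black-list trade-off again.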