Re: Approaches to SVN to Git conversion

Nathan Gray <n8gray@xxxxxxxxxx> · Wed, 7 Mar 2012 15:08:20 -0800

On Tue, Mar 6, 2012 at 2:34 PM, Andrew Sayers
<andrew-git@xxxxxxxxxxxxxxx> wrote:
[snip]
> On 06/03/12 19:29, Nathan Gray wrote:
> <snip>
>>
>> The problem of specifying and detecting branches is a major problem in
>> my upcoming conversion.  We've got toplevel trunk/branches/tags
>> directories but underneath "branches" it's a free-for-all:
>>
>> /branches/codenameA/{projectA,projectB,projectC}
>> /branches/codenameB   (actually a branch of projectA)
>> /branches/developers/joe/frobnicator-experiment (also a branch of projectA)
>>
>> Clearly there's no simple regex that's going to capture this, so I'm
>> reduced to listing every branch of projectA, which is tedious and
>> error-prone.  However, what *would* work fabulously well for me is
>> "marker file" detection.  Every copy of projectA has a certain file at
>> its root.  Let's call it "markerFile.txt".  What I'd really love is a
>> way to say:
>
> This is quite close to the implementation I've got.  The SVN exporter
> runs in two stages:
>
> In the first stage, the script treats any non-blacklisted file as a
> marker file, but only looks for trunk branches.  It looks all through
> the history, traces back through the copyfroms, and tries to find the
> original directory associated with the file.  Usually it decides that
> the only branch without a copyfrom is /trunk.  Searching just for trunks
> with this weak heuristic makes it much easier to hand-verify the result.

I'm not sure I understand.  If I have /trunk/projectA and
/trunk/projectB, do I have to blacklist /trunk/projectB to extract
only projectA's history?  Assuming it has always lived there, will your
code detect /trunk/projectA as the "trunk"?  Would it be possible to
specify /trunk/projectA directly instead of blacklisting everything
else?

> In the second stage, the script looks through the history again, tracing
> the copies of known branches in a slightly less clever way than
> described in my previous e-mail.  There's no need for marker files this
> time round, as we just assume any `svn cp /trunk
> /directory/not/within/a/branch` is a new branch.  In my experiments this
> has been a pretty solid way of detecting branches without too much human
> input - I might be missing something (or have mis-explained something),
> but I'd be interested to hear examples of where this would go wrong.

That sounds pretty good, but it should probably also be transitive,
i.e. `svn cp /any/known/branch/root /some/new/path` is also a new
branch.  Sometimes we'll spin off hotfix branches from release
branches, for example.
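
To make the transitive rule concrete, here is a minimal Python sketch.
It is not your exporter's code, just an illustration under my own
assumptions: the copy history is already available as an ordered list
of (src, dst) pairs, and the names detect_branches and is_within are
made up for this example.

```python
def is_within(path, root):
    """True if path is root itself or lives somewhere underneath it."""
    return path == root or path.startswith(root.rstrip("/") + "/")


def detect_branches(copies, trunk_roots):
    """Find branch roots by following copies transitively.

    copies      -- iterable of (src, dst) pairs in revision order
                   (assumed input format, not a real svn API)
    trunk_roots -- directories already known to be trunks

    Returns (branches, parent), where parent maps each detected
    branch root to the branch root it was copied from.
    """
    branches = set(trunk_roots)
    parent = {}
    for src, dst in copies:
        # A copy creates a new branch only when its source is the root
        # of a branch we already know about (trunk or any earlier
        # branch) and the destination is not inside an existing branch.
        if src in branches and not any(is_within(dst, b) for b in branches):
            branches.add(dst)
            parent[dst] = src
    return branches, parent


if __name__ == "__main__":
    history = [
        ("/trunk/projectA", "/branches/codenameB"),
        # Hotfix-style copy taken from a branch rather than from trunk.
        ("/branches/codenameB",
         "/branches/developers/joe/frobnicator-experiment"),
        # Copy of a subdirectory: source is not a branch root, so skipped.
        ("/trunk/projectA/docs", "/branches/docs-copy"),
    ]
    found, parent = detect_branches(history, ["/trunk/projectA"])
    for root in sorted(found):
        print(root, "<-", parent.get(root, "(trunk)"))
```

Because every detected branch is added to the known set as soon as it
is seen, a later copy from that branch is picked up without any extra
configuration, which is the behavior I'd want for hotfix branches.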

I'll have to give your code a try and see how it works.

Cheers,
-n8

-- 
http://n8gray.org