Re: Approaches to SVN to Git conversion (was: Re: [RFC] "Remote helper for Subversion" project)

Hi everyone,

Soon I'm going to be undertaking a migration of a subproject from a
very messy multiproject SVN repo to git, so this is a topic that's
quite near to my heart at the moment.  More inline...

On Mon, Mar 5, 2012 at 7:27 AM, Stephen Bash <bash@xxxxxxxxxxx> wrote:
>
> ----- Original Message -----
>> From: "Andrew Sayers" <andrew-git@xxxxxxxxxxxxxxx>
>> Sent: Sunday, March 4, 2012 8:36:41 AM
>> Subject: Re: [RFC] "Remote helper for Subversion" project
>>

[snip]

>> Personally, I think SVN export will always need a strong manual
>> component to get the best results, so I've put quite a bit of work
>> into designing a good SVN history format.  Like git-fast-import, it's
>> an ASCII format designed both for human and machine consumption...
>
> First, I'm very impressed that you managed to get a language like this up and working.  It could prove very useful going forward.  On the flip side, from my experiments over the last year I've actually been leaning toward a solution that is more implicit than explicit.  Taking git-svn as a model, I've been trying to define a mapping system (in Perl):
>
>  my %branch_spec = ( '/trunk/projname' => 'master',
>                      '/branches/*/projname' => '/refs/heads/*' );
>  my %tag_spec = ( '/tags/*/projname' => '/refs/tags/*' );

Specifying and detecting branches is the major difficulty in my
upcoming conversion.  We've got toplevel trunk/branches/tags
directories, but underneath "branches" it's a free-for-all:

/branches/codenameA/{projectA,projectB,projectC}
/branches/codenameB   (actually a branch of projectA)
/branches/developers/joe/frobnicator-experiment (also a branch of projectA)

Clearly there's no simple regex that's going to capture this, so I'm
reduced to listing every branch of projectA, which is tedious and
error-prone.  However, what *would* work fabulously well for me is
"marker file" detection.  Every copy of projectA has a certain file at
its root.  Let's call it "markerFile.txt".  What I'd really love is a
way to say:

my %branch_markers = ('/branches/**/markerFile.txt' => '/refs/heads/**');

I'm using ** to signify that this may match multiple path components
(sorry, I don't know perl glob syntax).  A branch point is any
revision that creates a new file that matches the marker pattern.
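
To make the idea concrete, here is a rough sketch of the matching I
have in mind, written in Python rather than Perl purely for
illustration.  Everything in it is hypothetical: the function names,
and especially the "revisions" input, which I'm assuming is a list of
(revision, paths that come into existence) pairs with directory copies
already expanded into the files they contain.  It handles a single
"**" per pattern and nothing more.

```python
import re

def marker_to_regex(pattern):
    """Compile a marker pattern in which '**' spans one or more path components."""
    parts = pattern.split("**")
    if len(parts) != 2:
        raise ValueError("this sketch expects exactly one '**' in the pattern")
    # '(.+)' may contain slashes, so it can cover several directory levels.
    return re.compile("^" + re.escape(parts[0]) + "(.+)" + re.escape(parts[1]) + "$")

def branch_ref(added_path, marker_pattern, ref_template):
    """Return the ref for a newly added path, or None if it is not a marker file."""
    match = marker_to_regex(marker_pattern).match(added_path)
    if match is None:
        return None
    # Substitute whatever '**' matched into the ref template.
    return ref_template.replace("**", match.group(1))

def find_branch_points(revisions, marker_pattern, ref_template):
    """Yield (revision, branch_root, ref) for every revision that adds a marker file."""
    for rev, added_paths in revisions:
        for path in added_paths:
            ref = branch_ref(path, marker_pattern, ref_template)
            if ref is not None:
                # The branch root is the directory holding the marker file.
                yield rev, path.rsplit("/", 1)[0], ref
```

With the pattern above, a revision that adds
/branches/developers/joe/frobnicator-experiment/markerFile.txt would be
reported as a branch point rooted at
/branches/developers/joe/frobnicator-experiment.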

Ideally one could use logical connectives like AND and OR to specify a
set of patterns that could account for marker files changing over the
history of the project, but for my purposes that wouldn't be necessary
-- we've got a well-defined marker that's always present.

For bonus points I'd like to be able to speed things up by excluding
known-bad markers.  Say projectB has a file "badMarker.txt" at its
root and I don't want to import projectB into my new repo.  Maybe I
could specify:

my %branch_spec = (
        '/branches/**/markerFile.txt' => '/refs/heads/**',
        '/branches/**/badMarker.txt' => '!');

I'm assuming that it would be helpful for the script to have this
information (e.g. it could stop recursive searches when badMarker.txt
is found), but maybe that's not the case.
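
Here is the kind of pruned search I'm imagining, again as a Python
sketch rather than real converter code.  The nested-dict tree is just a
hypothetical stand-in for the directory listing of one SVN revision
(directories are dicts, files are None), and find_branch_roots is a
name I made up.

```python
def find_branch_roots(tree, good="markerFile.txt", bad="badMarker.txt", prefix=""):
    """Return directories that contain `good`, skipping subtrees that contain `bad`."""
    if bad in tree:
        return []                # excluded project: do not descend any further
    if good in tree:
        return [prefix or "/"]   # branch root found: no need to look deeper
    roots = []
    for name, child in sorted(tree.items()):
        if isinstance(child, dict):  # only directories can hold branch roots
            roots.extend(find_branch_roots(child, good, bad, prefix + "/" + name))
    return roots

# Hypothetical layout modelled on the directories described earlier.
example_tree = {
    "branches": {
        "codenameA": {
            "projectA": {"markerFile.txt": None},
            # The nested marker is never reached because projectB is pruned.
            "projectB": {"badMarker.txt": None, "vendor": {"markerFile.txt": None}},
            "projectC": {},
        },
        "codenameB": {"markerFile.txt": None},
        "developers": {"joe": {"frobnicator-experiment": {"markerFile.txt": None}}},
    }
}

for root in find_branch_roots(example_tree):
    print(root)
```

Whether the pruning buys much in practice presumably depends on how
expensive it is to list a directory at a given revision.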

I'd welcome any comments or (especially!) code to try out.  ;^)

Cheers,
-Nathan

-- 
http://n8gray.org