Hi everyone,

Soon I'll be migrating a subproject from a very messy multi-project SVN repo to git, so this topic is quite near to my heart at the moment. More inline...

On Mon, Mar 5, 2012 at 7:27 AM, Stephen Bash <bash@xxxxxxxxxxx> wrote:
>
> ----- Original Message -----
>> From: "Andrew Sayers" <andrew-git@xxxxxxxxxxxxxxx>
>> Sent: Sunday, March 4, 2012 8:36:41 AM
>> Subject: Re: [RFC] "Remote helper for Subversion" project
>>
>> [snip]
>>
>> Personally, I think SVN export will always need a strong manual
>> component to get the best results, so I've put quite a bit of work
>> into designing a good SVN history format.  Like git-fast-import, it's
>> an ASCII format designed both for human and machine consumption...
>
> First, I'm very impressed that you managed to get a language like this
> up and working. It could prove very useful going forward. On the flip
> side, from my experiments over the last year I've actually been leaning
> toward a solution that is more implicit than explicit. Taking git-svn as
> a model, I've been trying to define a mapping system (in Perl):
>
> my %branch_spec = { '/trunk/projname' => 'master',
>                     '/branches/*/projname' => '/refs/heads/*' };
> my %tag_spec = { '/tags/*/projname' => '/refs/tags/*' };

Specifying and detecting branches is a major problem in my upcoming conversion. We've got top-level trunk/branches/tags directories, but underneath "branches" it's a free-for-all:

  /branches/codenameA/{projectA,projectB,projectC}
  /branches/codenameB                              (actually a branch of projectA)
  /branches/developers/joe/frobnicator-experiment  (also a branch of projectA)

Clearly there's no simple regex that's going to capture this, so I'm reduced to listing every branch of projectA by hand, which is tedious and error-prone.

However, what *would* work fabulously well for me is "marker file" detection. Every copy of projectA has a certain file at its root. Let's call it "markerFile.txt".
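To make the marker idea concrete, here's a rough Perl sketch of how detection could work on the layout above. Everything in it is hypothetical: the sub names are made up, the glob dialect ('**' matches one or more whole path components, '*' stays within one component) is my own assumption rather than anything git-svn supports, and it assumes the caller has already expanded each revision's directory copies into the full list of files they create. That last step matters, because a directory copy shows up in "svn log -v" as a single added path, so the marker file itself wouldn't be listed.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a glob such as '/branches/**/markerFile.txt'
# into an anchored regex. '**' captures one or more path components,
# '*' captures part of a single component; everything else is literal.
sub glob_to_regex {
    my ($glob) = @_;
    my $re = join '', map {
          $_ eq '**' ? '(.+)'
        : $_ eq '*'  ? '([^/]+)'
        :              quotemeta($_)
    } split /(\*\*|\*)/, $glob;
    return qr/\A$re\z/;
}

# Hypothetical helper: fill each wildcard in a ref template such as
# 'refs/heads/**' with the next captured path fragment.
sub expand_ref {
    my ($template, @captures) = @_;
    $template =~ s/(\*\*|\*)/shift @captures/ge;
    return $template;
}

# Given the files added in one revision, return { branch_root_dir => ref }.
# Rules are [glob => target] pairs, checked in order; the first match wins.
# A target of '!' excludes the matched file's directory and everything below it.
sub detect_branches {
    my ($rules, @added) = @_;
    my (@excluded, %found);
    PATH: for my $path (@added) {
        for my $rule (@$rules) {
            my ($glob, $target) = @$rule;
            my $re = glob_to_regex($glob);
            my @caps = $path =~ $re or next;    # rule doesn't match this path
            (my $dir = $path) =~ s{/[^/]+\z}{}; # strip the marker's file name
            if ($target eq '!') { push @excluded, $dir }
            else                { $found{$dir} = expand_ref($target, @caps) }
            next PATH;
        }
    }
    # Prune branch roots at or below any excluded directory.
    for my $dir (keys %found) {
        delete $found{$dir}
            if grep { $dir eq $_ || index($dir, "$_/") == 0 } @excluded;
    }
    return \%found;
}

my @rules = (
    [ '/branches/**/markerFile.txt' => 'refs/heads/**' ],
    [ '/branches/**/badMarker.txt'  => '!' ],
);
my $branches = detect_branches(\@rules,
    '/branches/codenameA/projectA/markerFile.txt',
    '/branches/codenameA/projectB/badMarker.txt',
    '/branches/codenameB/markerFile.txt',
    '/branches/developers/joe/frobnicator-experiment/markerFile.txt',
);
print "$_ -> $branches->{$_}\n" for sort keys %$branches;
```

Run as-is, it should report three branch roots (codenameA/projectA, codenameB and developers/joe/frobnicator-experiment) mapped under refs/heads/. projectB never appears simply because it has no markerFile.txt; the '!' rule only bites when a marker sits somewhere under an excluded directory. The rules are an array of pairs rather than a hash because Perl hashes are unordered, and an ordered list keeps "first matching rule wins" well defined.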
What I'd really love is a way to say:

  my %branch_markers = ('/branches/**/markerFile.txt' => 'refs/heads/**');

I'm using ** to signify that this may match multiple path components (sorry, I don't know Perl glob syntax). A branch point is any revision that creates a new file matching the marker pattern. Ideally one could use logical connectives like AND and OR to combine a set of patterns that accounts for marker files changing over the history of the project, but for my purposes that isn't necessary: we've got a well-defined marker that's always present.

For bonus points, I'd like to be able to speed things up by excluding known-bad markers. Say projectB has a file "badMarker.txt" at its root and I don't want to import projectB into my new repo. Maybe I could specify:

  my %branch_markers = ('/branches/**/markerFile.txt' => 'refs/heads/**',
                        '/branches/**/badMarker.txt'  => '!');

I'm assuming it would help the script to have this information (e.g. it could stop a recursive search as soon as it finds badMarker.txt), but maybe that's not the case.

I'd welcome any comments or (especially!) code to try out. ;^)

Cheers,
-Nathan

-- 
http://n8gray.org