Re: [spf:guess] Re: Approaches to SVN to Git conversion (was: Re: [RFC] "Remote helper for Subversion" project)

On 3/6/12 12:35 PM, Stephen Bash wrote:
The problem of specifying and detecting branches is a major problem in
my upcoming conversion.  We've got toplevel trunk/branches/tags
directories but underneath "branches" it's a free-for-all:

/branches/codenameA/{projectA,projectB,projectC}
/branches/codenameB   (actually a branch of projectA)
/branches/developers/joe/frobnicator-experiment (also a branch of
projectA)

Clearly there's no simple regex that's going to capture this, so I'm
reduced to listing every branch of projectA, which is tedious and
error-prone.  However, what *would* work fabulously well for me is
"marker file" detection.  Every copy of projectA has a certain file at
its root.  Let's call it "markerFile.txt".  What I'd really love is a
way to say:

my %branch_markers = ('/branches/**/markerFile.txt' =>
                       '/refs/heads/**');

Ooo...  I like it.  I hadn't hit on this idea yet, but it certainly is a very helpful heuristic.  I doubt I'd have any sort of demo code for you in the near future, but it's definitely an idea to roll into the mix.
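
To make the marker-file idea concrete, here is a minimal sketch in Python. It is not code from any existing importer: the function names, and the reading of "**" as "one or more path segments", are assumptions made for illustration, and the leading slash of the ref template is dropped so the result looks like a git ref.

```python
import re

def marker_regex(pattern):
    """Compile a marker pattern in which a single '**' spans one or more path segments."""
    prefix, sep, suffix = pattern.partition("**")
    if not sep or "**" in suffix:
        raise ValueError("expected exactly one '**' in %r" % pattern)
    return re.compile(re.escape(prefix) + "(.+)" + re.escape(suffix) + "$")

def detect_branches(paths, markers):
    """Map each detected branch root to the ref it should be exported as."""
    branches = {}
    for pattern, ref_template in markers.items():
        regex = marker_regex(pattern)
        for path in paths:
            m = regex.match(path)
            if m:
                root = path[:m.end(1)]  # everything up to the marker file
                ref = ref_template.replace("**", m.group(1)).lstrip("/")
                branches[root] = ref
    return branches
```

Fed the paths of one revision's tree, this would report /branches/codenameB and /branches/developers/joe/frobnicator-experiment as branch roots without either of them being listed by hand.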

What I did for the Perl Perforce conversion was to make this a multi-step process: first, the heuristic goes through and detects branches and merge parents; then you do the actual export. If the heuristic gets it wrong, you can manually override the branch detection for a particular revision, which invalidates all of the _automatic_ decisions made for later revisions the next time you run it.

Even with all of the information in Postgres, much of the hard work pushed into the Postgres engine, and Postgres tuned for OLAP, this was the slowest part of the operation for a 30,000-odd revision Perforce repository.

The manual input is extremely useful for bespoke conversions; there will always be warts in the history and no heuristic is perfect (even if you can supply your own set of expressions, a way to override it for just one revision is handy).
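
The override-and-invalidate behaviour can be sketched roughly as follows. This is an illustration in Python, not how git-p4raw stores its state: the Decision class, the function names, and the toy "stay on the previous revision's branch" heuristic are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Decision:
    branch: str
    manual: bool = False

def run_heuristic(revisions, decisions, guess):
    """Fill in an automatic decision for every revision that lacks one."""
    for rev in sorted(revisions):
        if rev not in decisions:
            decisions[rev] = Decision(guess(rev, decisions))

def override(decisions, rev, branch):
    """Record a manual decision and drop the automatic ones made after it."""
    decisions[rev] = Decision(branch, manual=True)
    for later in [r for r in decisions if r > rev]:
        if not decisions[later].manual:
            del decisions[later]

def inherit_previous(rev, decisions):
    """Toy heuristic (an assumption): stay on the previous revision's branch."""
    previous = decisions.get(rev - 1)
    return previous.branch if previous else "trunk"
```

After an override at revision 3, the next run_heuristic() call recomputes revisions 4 onwards from the corrected state and leaves any other manual decisions alone.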

Just to review, the steps in git-p4raw are:

* load metadata (git-p4raw load ; git-p4raw check)
* load blobs (git-p4raw export-blobs)
* find project roots (git-p4raw find-branches)

Project root decisions can be overridden. In git-p4raw this was done through a DB insert, but all it consisted of was inserting (revision, branch) tuples into the appropriate table, so a front-end would be trivial. As you suggest, a custom heuristic is also an option, but the most flexible solution is simply being able to override the decisions made for a particular revision.
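
To show how small such a front-end could be, here is a sketch using Python's sqlite3 module as a stand-in for Postgres. The table name branch_override, its columns, and the function names are hypothetical; the real git-p4raw schema is not reproduced here.

```python
import sqlite3

def open_overrides(path=":memory:"):
    """Open (or create) a database holding manual project-root overrides."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS branch_override ("
        " revision INTEGER NOT NULL,"
        " branch TEXT NOT NULL,"
        " PRIMARY KEY (revision, branch))"
    )
    return db

def add_override(db, revision, branch):
    """Insert one (revision, branch) tuple; repeating the same tuple is a no-op."""
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT OR IGNORE INTO branch_override (revision, branch) VALUES (?, ?)",
            (revision, branch),
        )

def list_overrides(db):
    """Return all overrides as (revision, branch) tuples, oldest first."""
    query = "SELECT revision, branch FROM branch_override ORDER BY revision, branch"
    return db.execute(query).fetchall()
```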

* detect project merges (also done by git-p4raw find-branches)

Detecting merge parents used a heuristic based on the per-file integration records and a computation based on an internal diff-tree, which produced a list of files that would have needed resolving. This one I actually used enough to bother implementing a front-end for:

  git-p4raw graft REV PARENT PARENT

Where 'PARENT' could be another project root (revision/branch location), or it could be a git commit ID (for the inevitable occasion where you need to manually graft on some history). This interface allows you to do several things:

  1. mark a merge which was not recorded correctly in history
  2. un-mark a merge which was detected/recorded incorrectly
  3. skip bad sections of history, for instance squash merging merges which happened over several commits (SVN and Perforce, of course, support insane piecemeal merging prohibited by git)
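
The effect of a graft on the parent list can be sketched like this. It is a Python illustration under assumptions, not the git-p4raw implementation: a project root is modelled as a (revision, branch) tuple, a git commit ID as a 40-character hex string, and a graft simply replaces whatever the heuristic detected.

```python
import re

COMMIT_ID = re.compile(r"[0-9a-f]{40}")

def is_commit_id(parent):
    """True if the parent is a git commit ID rather than a project root."""
    return isinstance(parent, str) and COMMIT_ID.fullmatch(parent) is not None

def effective_parents(rev, detected, grafts):
    """Parents to export for rev: a manual graft wins over the heuristic."""
    if rev in grafts:
        return list(grafts[rev])
    return list(detected.get(rev, []))
```

Listing an extra parent marks a missed merge, listing only the mainline parent un-marks a wrongly detected one, and pointing a parent further back (or at an existing git commit) skips a bad stretch of history.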

* the actual fast-import exporter.

  git-p4raw export-commits 1..5000
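
For reference, here is roughly what writing one commit to a git fast-import stream looks like, as a Python sketch. The function name and arguments are assumptions. File contents are written inline to keep the example self-contained, whereas an exporter that has already loaded the blobs would refer to them by mark or object ID instead.

```python
def write_commit(out, ref, mark, committer, when, message, parents, files):
    """Write one commit in fast-import format to the binary stream `out`.

    parents is a list of marks of earlier commits (the first becomes 'from',
    the rest 'merge'); files maps path -> bytes. Paths needing quoting
    are not handled in this sketch.
    """
    msg = message.encode("utf-8")
    out.write(b"commit %s\n" % ref.encode())
    out.write(b"mark :%d\n" % mark)
    out.write(b"committer %s %s\n" % (committer.encode(), when.encode()))
    out.write(b"data %d\n%s\n" % (len(msg), msg))
    for i, parent in enumerate(parents):
        out.write(b"%s :%d\n" % (b"from" if i == 0 else b"merge", parent))
    for path, content in sorted(files.items()):
        out.write(b"M 100644 inline %s\n" % path.encode())
        out.write(b"data %d\n%s\n" % (len(content), content))
    out.write(b"\n")
```

The output would be piped into "git fast-import"; the `when` argument uses the raw date format, e.g. "1331000000 +0000".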

There was also an important reverse operation:

  git-p4raw unexport-commits 2500

This moved all of the exported refs backwards and deleted the ones which didn't exist at revision 2500.
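
The rewind can be sketched as a pure function over a per-ref update history. This is a Python illustration, not the actual implementation: the history structure (ref name mapped to a list of (revision, commit) pairs) is an assumption, and branches deleted before the target revision are not modelled.

```python
def rewind(history, target_rev):
    """Return the refs as they stood at target_rev.

    history maps ref name -> list of (revision, commit) updates.
    Refs whose first update is later than target_rev are left out,
    i.e. they would be deleted.
    """
    refs = {}
    for ref, updates in history.items():
        earlier = [(rev, commit) for rev, commit in updates if rev <= target_rev]
        if earlier:
            refs[ref] = max(earlier)[1]  # commit of the latest update kept
    return refs
```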

Once the data has been mined, the actual exporting can proceed very fast. E.g., on my laptop I could easily top 300 commits per second, which makes for a nice export/examine/rewind/adjust cycle.

For more information,

  git clone git://github.com/samv/git-p4raw
  cd git-p4raw
  perldoc git-p4raw

The "Game plan." section of the POD is particularly relevant. Remember that SVN is very similar to Perforce in virtually all of its design details, so this tool, its database schema, and its implementation are all very relevant to the design of the new svn-fe importer.

Sam