On 3/6/12 12:35 PM, Stephen Bash wrote:
> The problem of specifying and detecting branches is a major issue in
> my upcoming conversion. We've got toplevel trunk/branches/tags
> directories but underneath "branches" it's a free-for-all:
> /branches/codenameA/{projectA,projectB,projectC}
> /branches/codenameB (actually a branch of projectA)
> /branches/developers/joe/frobnicator-experiment (also a branch of
> projectA)
> Clearly there's no simple regex that's going to capture this, so I'm
> reduced to listing every branch of projectA, which is tedious and
> error-prone. However, what *would* work fabulously well for me is
> "marker file" detection. Every copy of projectA has a certain file at
> its root. Let's call it "markerFile.txt". What I'd really love is a
> way to say:
> my %branch_markers = ('/branches/**/markerFile.txt' =>
>                       'refs/heads/**');
Ooo... I like it. I hadn't hit on this idea yet, but it is certainly a very helpful heuristic. I doubt I'll have any demo code for you in the near future, but it's definitely an idea to roll into the mix.
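To illustrate the marker-file idea, here is a minimal Python sketch (not part of git-p4raw) of what '/branches/**/markerFile.txt' => 'refs/heads/**' could mean: every directory under /branches/ that directly contains the marker becomes a branch root, and the part of the path matched by "**" becomes the ref name. It assumes you already have a flat list of repository paths for a revision; the function name `find_branch_roots` is hypothetical.

```python
import posixpath
from typing import Dict, Iterable

# Marker name taken from the example in the quoted message.
MARKER = "markerFile.txt"


def find_branch_roots(paths: Iterable[str],
                      marker: str = MARKER,
                      prefix: str = "/branches/") -> Dict[str, str]:
    """Map each directory under `prefix` that directly contains `marker`
    to a git ref, mimicking '/branches/**/markerFile.txt' => 'refs/heads/**'."""
    roots = {}
    for path in paths:
        if not path.startswith(prefix):
            continue  # trunk/, tags/ etc. would be handled by other rules
        directory, name = posixpath.split(path)
        if name != marker:
            continue
        relative = directory[len(prefix):]  # the part matched by '**'
        if relative:  # a marker sitting directly in /branches is not a branch
            roots[directory] = "refs/heads/" + relative
    return roots


if __name__ == "__main__":
    paths = [
        "/branches/codenameA/projectA/markerFile.txt",
        "/branches/codenameA/projectB/README",
        "/branches/codenameB/markerFile.txt",
        "/branches/developers/joe/frobnicator-experiment/markerFile.txt",
        "/trunk/projectA/markerFile.txt",
    ]
    for root, ref in sorted(find_branch_roots(paths).items()):
        print(root, "->", ref)
```

This sketch does not de-duplicate nested markers: if a copy of projectA ever contained a second markerFile.txt in a subdirectory, that subdirectory would also be reported as a root.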
What I did for the Perl Perforce conversion was make this a multi-step
process: first, the heuristic goes through and detects branches and
merge parents; then you do the actual export. If, however, the
heuristic gets it wrong, you can manually override the branch
detection for a particular revision, which invalidates all of the
_automatic_ decisions made for later revisions the next time you run it.
Even with all of the information in Postgres, much of the hard work
pushed into the Postgres engine, and Postgres tuned for OLAP, this was
the slowest part of the operation, and that was for a Perforce
repository of some 30,000 revisions.
The manual input is extremely useful for bespoke conversions; there will
always be warts in the history and no heuristic is perfect (even if you
can supply your own set of expressions, a way to override it for just
one revision is handy).
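The override rule described above (a manual decision for one revision invalidates the automatic decisions for all later revisions) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: one branch decision per revision, decisions held in a dict rather than in Postgres, and hypothetical names `Decision`, `run_heuristic` and `override`. Manual decisions for later revisions are kept; only automatic ones are recomputed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class Decision:
    branch: str   # branch (project root) chosen for the revision
    manual: bool  # True if a human set it, False if the heuristic did


Decisions = Dict[int, Decision]
Guess = Callable[[int, Decisions], str]


def run_heuristic(decisions: Decisions, revisions: Iterable[int], guess: Guess) -> None:
    """Fill in automatic decisions, in revision order, for revisions that have
    none yet. Existing decisions (manual or automatic) are left untouched."""
    for rev in sorted(revisions):
        if rev not in decisions:
            decisions[rev] = Decision(guess(rev, decisions), manual=False)


def override(decisions: Decisions, revision: int, branch: str) -> None:
    """Record a manual decision and drop every later automatic decision,
    so the next run_heuristic() call recomputes them."""
    decisions[revision] = Decision(branch, manual=True)
    stale = [rev for rev, d in decisions.items() if rev > revision and not d.manual]
    for rev in stale:
        del decisions[rev]


if __name__ == "__main__":
    def inherit_previous(rev: int, decisions: Decisions) -> str:
        # Toy heuristic: stay on whatever branch the previous revision used.
        prev = decisions.get(rev - 1)
        return prev.branch if prev else "projectA"

    decisions: Decisions = {}
    run_heuristic(decisions, range(1, 6), inherit_previous)
    override(decisions, 3, "codenameB")  # the heuristic got r3 wrong
    run_heuristic(decisions, range(1, 6), inherit_previous)
    for rev in sorted(decisions):
        print(rev, decisions[rev])
```

Because the toy heuristic depends on earlier decisions, re-running it after the override carries "codenameB" forward to revisions 4 and 5, which is exactly why the later automatic decisions have to be thrown away.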
To recap, the steps in git-p4raw are:
* load metadata (git-p4raw load ; git-p4raw check)
* load blobs (git-p4raw export-blobs)
* find project roots (git-p4raw find-branches)
Project root decisions can be overridden. In git-p4raw this was done
through a DB insert, but all that involved was inserting (revision,
branch) tuples into the appropriate table, so a front-end would be
trivial. As you suggest, a custom heuristic is also an option, but the
most flexible solution is simply being able to override the decisions
made for a particular revision.
* detect project merges (also done by git-p4raw find-branches)
Detecting merge parents used a heuristic based on the per-file
integration records and on an internal diff-tree computation that
produced a list of files that would have needed resolving. This is one
I actually used enough to bother implementing a front-end for:
git-p4raw graft REV PARENT PARENT
Here 'PARENT' can be another project root (revision/branch location)
or a git commit ID (for the inevitable occasion where you need to
manually graft on some history). This interface allows you to do
several things:
1. mark a merge which was not recorded correctly in history
2. un-mark a merge which was detected/recorded incorrectly
3. skip bad sections of history, for instance by squashing merges
   which happened over several commits (SVN and Perforce, of course,
   support the insane piecemeal merging that git prohibits)
* run the actual fast-import exporter:
git-p4raw export-commits 1..5000
There was also an important reverse operation:
git-p4raw unexport-commits 2500
This moved all of the exported refs back to revision 2500 and deleted
the ones that didn't exist at that revision.
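The merge-parent step and the graft override can be illustrated together. The following Python sketch is a guess at the general shape, not git-p4raw's actual logic: it assumes the integration records arrive as a file-to-source-branch dict and that the diff-tree step has already produced, per candidate branch, the set of files that would have needed resolving. A candidate is accepted only if every such file was integrated from it (a complete rather than piecemeal merge), and any recorded graft replaces the heuristic's answer outright. All names here are hypothetical.

```python
from typing import Dict, List, Set


def detect_merge_parents(first_parent: str,
                         integrations: Dict[str, str],
                         needs_resolving: Dict[str, Set[str]]) -> List[str]:
    """Hypothetical merge-parent heuristic for one revision.

    integrations:    file -> branch it was integrated from in this revision
                     (stand-in for the per-file integration records).
    needs_resolving: candidate branch -> files that would have needed
                     resolving when merging it (stand-in for the diff-tree step).
    """
    parents = [first_parent]
    for branch in sorted(set(integrations.values()) - {first_parent}):
        integrated = {f for f, src in integrations.items() if src == branch}
        # Accept only if nothing that needed resolving was left out.
        if needs_resolving.get(branch, set()) <= integrated:
            parents.append(branch)
    return parents


def effective_parents(revision: int,
                      detected: List[str],
                      grafts: Dict[int, List[str]]) -> List[str]:
    """A recorded graft (cf. `git-p4raw graft REV PARENT PARENT`) wins over
    the heuristic; parents may be branch names or git commit IDs."""
    return grafts.get(revision, detected)


if __name__ == "__main__":
    records = {"projectA/foo.c": "codenameB", "projectA/bar.c": "codenameB"}
    detected = detect_merge_parents("projectA", records,
                                    {"codenameB": {"projectA/foo.c"}})
    print(detected)  # ['projectA', 'codenameB']
    grafts = {1234: ["projectA"]}  # un-mark the merge at revision 1234
    print(effective_parents(1234, detected, grafts))  # ['projectA']
```

Keeping the heuristic and the override separate mirrors the workflow described above: the heuristic can be re-run freely, while grafts record the human corrections that must survive each re-run.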
Once the data has been mined, the actual exporting can proceed very
fast. E.g., on my laptop I could easily top 300 commits per second,
which makes for a nice export/examine/rewind/adjust cycle.
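The "rewind" half of that cycle (what `unexport-commits` does) amounts to moving each exported ref back to its last commit at or before a revision, and deleting refs that did not exist yet. A minimal Python sketch, assuming the exporter keeps a hypothetical per-ref history of (revision, commit ID) pairs sorted by revision; it only builds the command lines that `git update-ref --stdin` accepts and does not run git itself:

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

# Hypothetical bookkeeping: ref -> [(revision, commit_id), ...], sorted by revision.
History = Dict[str, List[Tuple[int, str]]]


def rewind_commands(history: History, revision: int) -> List[str]:
    """Build `git update-ref --stdin` input that moves every exported ref back
    to its last commit at or before `revision`, and deletes refs that had no
    commit yet at that revision."""
    commands = []
    for ref in sorted(history):
        entries = history[ref]
        # Number of entries exported at or before `revision`.
        i = bisect_right([rev for rev, _ in entries], revision)
        if i == 0:
            commands.append(f"delete {ref}")
        else:
            commands.append(f"update {ref} {entries[i - 1][1]}")
    return commands


if __name__ == "__main__":
    history = {
        "refs/heads/projectA": [(10, "a" * 40), (2600, "b" * 40)],
        "refs/heads/codenameB": [(3000, "c" * 40)],
    }
    print("\n".join(rewind_commands(history, 2500)))
```

The output could be fed to git with something like `subprocess.run(["git", "update-ref", "--stdin"], input="\n".join(cmds) + "\n", text=True, check=True)`. A real tool would also need to trim its own bookkeeping past the rewind point so that the next export starts from there; that part is left out of this sketch.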
For more information:
git clone git://github.com/samv/git-p4raw
cd git-p4raw
perldoc git-p4raw
The "Game plan." section of the POD is particularly relevant. Remember
that SVN is very similar to Perforce in virtually all of its design
details, so this tool, its database schema, and its implementation are
all very relevant to the design of the new svn-fe importer.
Sam