Re: Approaches to SVN to Git conversion

I've now added a bit of documentation and uploaded my code to github:
https://github.com/andrew-sayers/Proof-of-concept-History-Converter

I haven't attached it here because the code isn't at a stage where it
would be useful to review line-by-line.  Comments are welcome if you
really want to though :)

svn-branch-export.pl makes heavy use of SVN::Dump.  You may want to get
the latest version from github if speed is important to you:
https://github.com/book/SVN-Dump/ - many thanks to Philippe Bruhat for
accepting my performance patch so quickly.

Here are some particular gripes I have with the code I've uploaded:

git-branch-import.pl gets the revision number by parsing out the
"git-svn-id" in commit messages - as I mentioned earlier, I started off
thinking this script would be closely related to git-svn somehow.  In
hindsight it would be better to read revision numbers from the marks
file exported by git-fast-import.
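Here is a minimal sketch of the marks-file approach. It is in Python rather than the Perl of the original scripts. It assumes, hypothetically, that the exporter gives git fast-import mark `:N` to the commit made from SVN revision N, and that blobs use marks that can't collide with those numbers. The function names are illustrative and are not taken from git-branch-import.pl. The only fixed input is the file written by `--export-marks`, which has one `:<mark> <object name>` pair per line.

```python
import re
from typing import Dict, Iterable, Optional

# git fast-import --export-marks writes one ":<mark> <object name>" per line.
# Object names are 40 hex digits (SHA-1) or 64 (SHA-256 repositories).
MARK_LINE = re.compile(r"^:(\d+) ([0-9a-f]{40}|[0-9a-f]{64})$")


def parse_marks(lines: Iterable[str]) -> Dict[int, str]:
    """Return {mark number: object name} from marks-file lines."""
    marks: Dict[int, str] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        match = MARK_LINE.match(line)
        if match is None:
            raise ValueError(f"unexpected marks line: {line!r}")
        marks[int(match.group(1))] = match.group(2)
    return marks


def read_marks(path: str) -> Dict[int, str]:
    """Read a marks file from disk."""
    with open(path, encoding="ascii") as fh:
        return parse_marks(fh)


def revision_for_commit(marks: Dict[int, str], commit: str) -> Optional[int]:
    """SVN revision for a commit, assuming mark :N was used for revision N."""
    for mark, name in marks.items():
        if name == commit:
            return mark
    return None
```

Compared with parsing `git-svn-id` out of commit messages, this only trusts data that fast-import itself wrote. Nothing depends on the wording of the log message.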

Branch History Format has some git-specific stuff in the setup section.
I didn't think about this in too much detail while writing it, but
DVCS-neutrality would be better served by turning these into
command-line options.

As mentioned before, branch detection in svn-branch-export.pl is rather
muddled, as my understanding of the problem evolved significantly while
writing it.

svn-branch-export.pl half-heartedly uses a configure/make/make install
analogy to describe its behaviour - I'm increasingly sure this is
gimmicky and awful, rather than a neat explanatory trick.

svn-branch-export.pl exposes a lot of config values (e.g. "log_style")
that just bulk up the implementation and create space for bugs to creep
in without adding much actual value.  They should be removed.

On 06/03/12 19:29, Nathan Gray wrote:
<snip>
> 
> The problem of specifying and detecting branches is a major problem in
> my upcoming conversion.  We've got toplevel trunk/branches/tags
> directories but underneath "branches" it's a free-for-all:
> 
> /branches/codenameA/{projectA,projectB,projectC}
> /branches/codenameB   (actually a branch of projectA)
> /branches/developers/joe/frobnicator-experiment (also a branch of projectA)
> 
> Clearly there's no simple regex that's going to capture this, so I'm
> reduced to listing every branch of projectA, which is tedious and
> error-prone.  However, what *would* work fabulously well for me is
> "marker file" detection.  Every copy of projectA has a certain file at
> its root.  Let's call it "markerFile.txt".  What I'd really love is a
> way to say:

This is quite close to the implementation I've got.  The SVN exporter
runs in two stages:

In the first stage, the script treats any non-blacklisted file as a
marker file, but only looks for trunk branches.  It scans the whole
history, traces back through the copyfroms, and tries to find the
original directory associated with the file.  Usually it decides that
the only branch without a copyfrom is /trunk.  Searching just for trunks
with this weak heuristic makes it much easier to hand-verify the result.
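Here is a rough sketch of the first stage's copyfrom tracing. It assumes the dump has already been boiled down to a list of copies, each with a destination, a source path and a source revision. The `Copy` record and the `trace_origin`/`marker_origin` names are hypothetical and are not taken from svn-branch-export.pl. Starting from the directory that holds a marker file, the code repeatedly follows the newest copy that covers that path, until it reaches a path that was never copied:

```python
import posixpath
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Copy:
    """One `svn cp` from the dump (hypothetical simplified record)."""
    rev: int      # revision that performed the copy
    dest: str     # Node-path of the copied directory
    src: str      # Node-copyfrom-path
    src_rev: int  # Node-copyfrom-rev

    def __post_init__(self):
        # SVN copies always come from an earlier revision; checking this
        # also guarantees that trace_origin below terminates.
        if self.src_rev >= self.rev:
            raise ValueError(f"copy in r{self.rev} from r{self.src_rev}")


def _within(path: str, root: str) -> bool:
    """True if `path` is `root` or lies underneath it."""
    return path == root or path.startswith(root + "/")


def trace_origin(path: str, rev: int, copies: Iterable[Copy]) -> str:
    """Follow copyfroms backwards from path@rev to a path with no copyfrom."""
    copies = list(copies)
    while True:
        covering = [c for c in copies if c.rev <= rev and _within(path, c.dest)]
        if not covering:
            return path
        # Prefer the newest copy, then the deepest one, that covers this path.
        c = max(covering, key=lambda c: (c.rev, len(c.dest)))
        path, rev = c.src + path[len(c.dest):], c.src_rev


def marker_origin(marker_path: str, rev: int, copies: Iterable[Copy]) -> str:
    """Original directory behind the directory holding a marker file."""
    return trace_origin(posixpath.dirname(marker_path), rev, copies)
```

Applied to the "tronk" example further down this mail, `marker_origin("/trunk/markerFile.txt", 3, [Copy(2, "/trunk", "/tronk", 1)])` leads back to `/tronk`. That is why a human check of the proposed trunks is still worthwhile.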

In the second stage, the script looks through the history again, tracing
the copies of known branches in a slightly less clever way than
described in my previous e-mail.  There's no need for marker files this
time round, as we just assume any `svn cp /trunk
/directory/not/within/a/branch` is a new branch.  In my experiments this
has been a pretty solid way of detecting branches without too much human
input - I might be missing something (or have mis-explained something),
but I'd be interested to hear examples of where this would go wrong.
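The second-stage rule can be sketched like this. Again it assumes a hypothetical list of `Copy` records rather than the real SVN::Dump stream. A copy counts as a new branch when its source is already a known branch root and its destination is not inside any known branch. Deletions and copies of branch subdirectories are deliberately ignored here, so treat this as an illustration of the rule, not of svn-branch-export.pl itself.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Copy:
    """One `svn cp` from the dump (hypothetical simplified record)."""
    rev: int
    dest: str
    src: str
    src_rev: int


def _within(path: str, root: str) -> bool:
    """True if `path` is `root` or lies underneath it."""
    return path == root or path.startswith(root + "/")


def detect_branches(
    copies: Iterable[Copy], trunks: Iterable[str]
) -> Dict[str, Optional[Tuple[str, int]]]:
    """Map each branch root to the (source branch, revision) it came from.

    Trunks found in the first stage map to None.
    """
    branches: Dict[str, Optional[Tuple[str, int]]] = {t: None for t in trunks}
    for c in sorted(copies, key=lambda c: c.rev):
        copies_a_branch = c.src in branches
        lands_outside = not any(_within(c.dest, b) for b in branches)
        if copies_a_branch and lands_outside:
            branches[c.dest] = (c.src, c.src_rev)
    return branches
```

Suppose projectA's trunk is /trunk/projectA (an assumption about Nathan's layout). Then copies to /branches/codenameA/projectA, /branches/codenameB and /branches/developers/joe/frobnicator-experiment are all picked up without listing them by hand. A copy that lands inside an existing branch is not treated as a new branch.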
Having said that, here's a dodgy example I'd like to pre-emptively defend:

	svn mkdir tronk
	svn ci -m "Created trunk" # r1
	svn cp tronk trunk
	svn ci -m "D'oh" # r2
	svn rm tronk
	touch trunk/markerFile.txt
	svn add trunk/markerFile.txt
	svn ci -m "Double d'oh!" # r3

You could argue that the correct branch history description for the
above would be:

	In r3, create branch "trunk"

In other words, ignore everything that happened before the marker file
was created.  However, I would argue the following representation is
more correct:

	In r1, create branch "tronk"
	In r2, create branch "trunk" from "tronk" r1
	In r3, delete branch "tronk"

The branch history format supports the "delete branch" command (remove
the branch entirely) as well as the more common "deactivate branch"
(keep the branch but don't accept any new commits) specifically to deal
with this sort of weirdness.  Creating a branch then deleting it keeps
the r1 revision log intact as part of the "trunk" branch, without
leaving any useless branches lying around.
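To make the delete/deactivate distinction concrete, here is a toy replay of the three-line history above. The event tuples and the `Branch` record are hypothetical stand-ins, not the actual Branch History Format syntax. The point is that a deleted branch drops out of the final branch list, but it can still be reached through its child's parent link, so trunk's ancestry still includes r1:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Branch:
    name: str
    created_rev: int
    parent: Optional[Tuple[str, int]] = None  # (source branch, source revision)
    active: bool = True    # cleared by "deactivate": kept, but frozen
    deleted: bool = False  # set by "delete": hidden from the final result


def replay(events) -> Dict[str, Branch]:
    """Apply hypothetical (command, rev, name[, parent]) events in order."""
    branches: Dict[str, Branch] = {}
    for command, rev, name, *rest in events:
        if command == "create":
            branches[name] = Branch(name, rev, rest[0] if rest else None)
        elif command == "deactivate":
            branches[name].active = False
        elif command == "delete":
            branches[name].deleted = True
        else:
            raise ValueError(f"unknown command {command!r} in r{rev}")
    return branches


def visible(branches: Dict[str, Branch]) -> List[str]:
    """Branches that survive into the converted repository."""
    return sorted(n for n, b in branches.items() if not b.deleted)


def accepts_commits(branches: Dict[str, Branch], name: str) -> bool:
    b = branches[name]
    return b.active and not b.deleted


def lineage(branches: Dict[str, Branch], name: str) -> List[Tuple[str, int]]:
    """(branch, creation revision) pairs from `name` back to its root,
    following parent links through deleted branches too."""
    chain = []
    branch: Optional[Branch] = branches[name]
    while branch is not None:
        chain.append((branch.name, branch.created_rev))
        branch = branches[branch.parent[0]] if branch.parent else None
    return chain
```

Replaying `("create", 1, "tronk")`, `("create", 2, "trunk", ("tronk", 1))` and `("delete", 3, "tronk")` leaves only "trunk" visible. Its lineage is still trunk@r2 followed by tronk@r1.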

	- Andrew