Re: [RFC] "Remote helper for Subversion" project

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi guys,

I made a few little git contributions a couple of years back, before
being swallowed up by my old job.  I've become quite interested in
svn->git translation since starting a new job, but I'm delurking earlier
than planned so please bear with me as all I have right now is an
interesting proof of concept and the possibility of some free time some
day.  If things go well, I hope to help out with the
branching-and-merging part of the problem, so I'll describe what I've
done and how it might affect SoC projects.  Apologies in advance for
hijacking the thread :)

I mentioned proof-of-concept code for the work I've done - I'd like to
get confirmation from my new employer before putting it all online, so
hopefully I can make it available in a few days.

While researching the problem, I found Stephen Bash's original
proposal[1] and snerp-vortex[2] quite inspiring, but wasn't able to find
any details on SoC-related work in the branching-and-merging department
- hopefully the following isn't just a retread of ideas developed since
then.  I've concentrated on importing from SVN so far, but have kept an
eye on update and half an eye on bi-direction in the hopes of being
useful there some day.

It seems to me the "svn export" and "git import" steps make most sense
as two unrelated projects.  Snerp-vortex and Stephen's scripts both cut
the history import problem at that point, as do svn-fe/git-fast-import
with code import.  Exporting SVN history is a messy and sometimes
project-specific job, so allowing a project to concentrate on that part
makes it possible for SVN experts to use all their skills without having
to learn git plumbing before they make their first commit (much respect
to Stephen for managing that feat BTW).


I've written a proof-of-concept history converter that can be split into
three parts: a format for describing SVN history; a large, often messy
Perl program that writes files in that format; and a small Perl program
that reads the format and translates it into git.  With hindsight, Perl
is the right language for the SVN exporter, but the git importer would
have been better written in C.


Personally, I think SVN export will always need a strong manual
component to get the best results, so I've put quite a bit of work into
designing a good SVN history format.  Like git-fast-import, it's an
ASCII format designed both for human and machine consumption:

In r1, create branch "trunk"
In r10, create branch "branches/foo" as "foo" from "trunk" r9
In r12, create tag "tags/version1" as "version1" from "trunk" r11
In r12, deactivate "tags/version1"
# blank lines and lines beginning with a '#' are ignored
In r15, merge "branches/foo" r14 into "trunk"
In r15, delete "branches/foo"

The above has been designed as an abstract representation of SVN
history, with as little git-specific content as possible.  This turns
out to make the problem a bit easier to think about, a bit easier to
implement, and potentially useful to other DVCSs some day.  I wanted to
enable svn-merge2git[3]-style extraction of merge info from logs, but
found that even a message as clear as "merge r123 from trunk" could mean
"cherry-pick revision 123", "merge everything up to revision 123",
"merge everything up to the last revision that touched trunk before
123", "merge everything up to 132", or any number of other things.  As
such, the format is designed to allow copious comments and ease of
bulk-editing with regexps and a powerful editor.

The format feels relatively complete to me, and in any case not a good
candidate for an SoC project because it's all about experience and
nothing to do with raw talent.


Once the format is defined, git import is fairly straightforward.
Proof-of-concept code to follow, but it's really just a wrapper around
git-commit-tree, git-mktag etc.  I wrote this in Perl thinking it would
relate somehow to git-svn, but eventually realised it didn't and that a
few hundred calls to (plumbing) processes per second isn't so good for
performance.  The only interesting part of the problem is how to tackle
SVN tags.  I went for an ambitious approach, making normal tags where
possible and downgrading them to lightweight tags when necessary.  This
does involve managing something that is effectively a branch in
refs/tags/, but what else is an SVN tag but a branch in the wrong namespace?

I'm afraid I don't know enough about SoC to say whether rewriting git
import in C would make a good project - on the one hand, it's a smallish
bit of work that would make a student learn a lot of good git code, on
the other hand it would mostly just involve fixing up a bit of ugly Perl
with little chance to show any creativity.


SVN export is much more complicated, and has taken most of my time.
Although there's no way to tell automatically which directories are
branches, I realised you can detect trunks automatically about 90% of
the time by looking for where files/directories are first created, and
can detect non-trunk branches at least 99% of the time once you know the
trunks.  Trunk detection is normally just a convenience, but can be a
lifesaver when importing from a sufficiently messy repository.  There's
a lot you can do with merge detection, but between svn:mergeinfo,
svnmerge.py[4], svk[5] and log messages I realise I've only scratched
the surface.  In many ways, this is a classic Perl problem - take a
bunch of messy textual input, string it all together and make some neat
textual output.  The code I've got right now seems fine for anything up
to about 20,000 commits, but eats way too much memory for huge repos.
Depending on how much time I get, I might have tackled that problem by
the time I put this code online.

Assuming I have enough time to work on SVN export, I would be loathed to
put it on anyone else in the near future.  I could see endless
optimisations being spun off once the code is mature, but right now it's
an idiosyncratic collection of untested mostly-working bits that need
serious attention before I'd be happy warping a young mind on it.


I hope this information dump explains where I am and how I can help.  As
I say, I can't yet promise how much (if any) time I'll get to work on
this in future, but I hope to help with the history translation process
if circumstances allow.  More importantly to this thread, I hope this
gives some ideas about SoC projects and places (not) to put attention.

	- Andrew Sayers

[1] http://comments.gmane.org/gmane.comp.version-control.git/158940
[2] https://github.com/rcaputo/snerp-vortex
[3] http://repo.or.cz/w/svn-merge2git.git
[4] http://www.orcaware.com/svn/wiki/Svnmerge.py
[5] http://search.cpan.org/dist/SVK/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]