[RFC 0/2] svn-fetch|push - an alternate approach

This is sort of an RFC/ANN. There has been a whole bunch of traffic of
late on git-svn, vcs-svn, the remote helper, etc. I would like to
present an alternate solution that I've been playing with for a couple
of weeks.

I work in a team that uses a mixture of git-svn and svn proper. Whilst
it works remarkably well and I largely don't have to deal with svn on a
day to day basis, git-svn is still quite limiting compared to git
proper. Namely, merges created by me or by anyone else either can't be
pushed into svn, or lose information when they are. Switching wholesale
to git is problematic because of the disruption it would cause. There
are also numerous non-core users who want a really simple UI to update
and occasionally check in a single file or folder. TortoiseSVN fits
their needs perfectly, and even with a proper git-svn bridge in place I
have no plans to drop svn support.

Possible Solutions
------------------

In general there are a couple of ways of attacking this:

1. Have git-svn push extraneous git context into subversion and pull it
   back down on the other side.
2. Create a dummy svn server, similar to git-cvsserver.
3. Use both a git server and svn server and have the git server push to
   the svn server.
4. Same as #3 but have the svn server push to the git server.

#1 is fraught with issues because git is so exacting with respect to
recreating commits: the recreated commits would need to have the same
hash for git's distributed nature to work.

#2 is possible but a lot of work. SVN clients expect a substantial
amount of metadata, which introduces considerable complexity in deciding
where to store it and what to generate on the fly. Not all of it can be
generated on the fly, as SVN presents clients with a general-purpose
key/value store (svn properties), which many use and expect to work.

#3 and #4 have divergence issues in that you have two servers through
which commits can be pushed. If conflicting commits are pushed into both
servers at the same time, there has to be some method for conflict
resolution. Requiring an admin to go in and fix things manually is
unpalatable. The easiest way around this is to designate one of the two
servers as the master and require that commits pushed to the other
update the master before succeeding. In this mode #3 has the SVN server
as master, and #4 has the git server.

There is an existing commercial system that came up a couple of weeks
ago which transfers commits asynchronously. I am unsure how, if at all,
it avoids conflicts.

I've been working on a solution using approach #3. The code follows
(hopefully). It is currently in an internal beta with 3 or 4 users and
works very well.

In my setup, the SVN server is the master node. The git server has a
pre-receive hook which pushes commits onto the SVN server. The hook
checks for any SVN errors (SVN hooks, file conflicts, etc.) and that no
SVN commits are intermingled with the pushed git commits (either before
or interspersed). If any of these checks fail, the pre-receive hook
fails, which fails the git push. The git user then pulls or rebases and
tries to push again.
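The intermingling check can be sketched as follows. This is a minimal Python illustration, not the patch's code: it assumes the hook can obtain the svn revisions that touched the branch folder since the revision recorded for the branch (for example from an svn log request), plus the revisions its own commits produced during the push. `find_intermingled`, `check_push` and `PushRejected` are hypothetical names.

```python
from typing import Iterable, List


class PushRejected(Exception):
    """Raised to fail the pre-receive hook, which in turn fails the git push."""


def find_intermingled(folder_revs: Iterable[int], pushed_revs: Iterable[int]) -> List[int]:
    """Return svn revisions that touched the branch folder but were not
    created by this push, i.e. svn commits made before or between ours."""
    ours = set(pushed_revs)
    return sorted(rev for rev in folder_revs if rev not in ours)


def check_push(base_rev: int, folder_revs: Iterable[int], pushed_revs: Iterable[int]) -> None:
    """base_rev: svn revision recorded for the branch before the push (assumed).
    folder_revs: revisions that touched the branch folder (assumed input).
    pushed_revs: revisions produced by the commits in this push."""
    newer = [rev for rev in folder_revs if rev > base_rev]
    foreign = find_intermingled(newer, pushed_revs)
    if foreign:
        raise PushRejected(
            "svn commits %s are intermingled with the push; pull or rebase "
            "and try again" % ", ".join("r%d" % rev for rev in foreign))
```

Comparing against the revisions the push itself created catches both cases the text mentions: a foreign commit that landed before the push and one that slipped in between two pushed commits.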

Kerberos Auth (Off-Topic)
-------------------------

I am also using a Kerberos auth HTTP frontend
(http://github.com/jmckaskill/krb-httpd). This checks authentication
against Active Directory and uses the remote user for the --user
argument of git svn-push. Most clients are Windows users, so I have
built a replacement libcurl-4.dll which enables Kerberos Negotiate (as
an aside, the msysgit build should enable this by default). I've then
added the following to users' global git config:

credential.http://domain.name.username=dummy
credential.http://domain.name.helper=!echo password=dummy

With this in place, git uses the user's domain login and never asks for
a password. There are also some equivalent tweaks to get domain logins
working in browsers for cgit.

How it Works
------------

The attached patch adds two commands to git: svn-fetch and svn-push. The
names are temporary. My plan is to refactor these into git-remote-svn so
that git push/pull/fetch work as expected. svn-fetch fetches svn commits
for all branches and creates git commits, branches, tags, etc. as it
goes along. svn-push takes the remote ref name and the from and to sha1s
(or a list of these on stdin, for use as a pre-receive hook) and pushes
the changes to svn, creating/deleting branches/tags as necessary and
failing if there are any intermingled svn commits in the folders being
pushed to.
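When svn-push runs as a pre-receive hook, git writes one `<old-sha1> <new-sha1> <ref-name>` line per updated ref to its stdin, using an all-zero id for a ref that is being created or deleted. Note that this order differs from the ref/from/to order of svn-push's own arguments. A minimal sketch of reading that input (the helper names are hypothetical):

```python
from typing import Iterable, List, NamedTuple

ZERO_SHA1 = "0" * 40  # git's id for "this ref does not exist"


class RefUpdate(NamedTuple):
    old: str
    new: str
    ref: str

    @property
    def is_create(self) -> bool:
        return self.old == ZERO_SHA1

    @property
    def is_delete(self) -> bool:
        return self.new == ZERO_SHA1


def parse_ref_updates(lines: Iterable[str]) -> List[RefUpdate]:
    """Parse the '<old> <new> <ref>' lines git writes to a pre-receive hook."""
    updates = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        old, new, ref = line.split(" ", 2)
        updates.append(RefUpdate(old, new, ref))
    return updates
```

In a hook this would be called as `parse_ref_updates(sys.stdin)`; a creation would map to an svn copy and a deletion to an svn delete of the branch/tag folder.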

Metadata is tracked by creating an extra git commit for each svn commit
in each branch/tag. These commits are stored in refs/svn/heads|tags, and
refs/svn/latest points at the latest fetched commit. They look like the
following:

tree <svn tree>
parent <underlying git commit>
parent <previous svn commit for this svn folder>
author <svn author + time>
committer <svn author + time>
revision <svn revision>
path <svn path relative to svn.url>
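To make the layout concrete, here is a hypothetical Python helper that builds such a metadata commit and computes its id the way git hashes any object (SHA-1 over `<type> <size>\0<body>`). The identity string format and the empty commit message are assumptions; the real code may write these differently.

```python
import hashlib
from typing import Optional


def svn_metadata_commit(svn_tree: str, git_commit: str, prev_svn_commit: Optional[str],
                        ident: str, revision: int, path: str, message: str = "") -> bytes:
    """Build the raw body of a metadata commit in the layout shown above.

    ident is a git identity such as 'jdoe <jdoe@example.com> 1300000000 +0000'
    (illustrative); prev_svn_commit is None for the first commit of a folder.
    """
    lines = ["tree " + svn_tree, "parent " + git_commit]
    if prev_svn_commit:
        lines.append("parent " + prev_svn_commit)
    lines += ["author " + ident, "committer " + ident,
              "revision %d" % revision, "path " + path]
    # Headers end with a blank line, followed by the (here empty) message.
    return ("\n".join(lines) + "\n\n" + message).encode("utf-8")


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 object id as git computes it: '<type> <size>\\0' + body."""
    header = ("%s %d\0" % (kind, len(body))).encode("ascii")
    return hashlib.sha1(header + body).hexdigest()
```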

Using commits to track the SVN metadata has proven really handy. The
code gets to use all the built-in locking primitives, packing, etc.
git push --mirror nicely mirrors all of the metadata as well as the
git commits. As an anecdote, I had a bug the other day where my
check_for_svn_commits wasn't working and so had missed an svn commit.
After fixing the bug, I needed to rerun the fetch to grab the missing
commit. This was simply a matter of running git log --pretty=raw
refs/svn/heads/trunk to get the svn/git commit sha1s, updating the refs,
and rerunning git svn-fetch.
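The lookup step in that recovery amounts to reading the `revision` header and the first parent out of the raw log. A rough sketch, assuming the raw output shows the extra `revision` header as stored in the metadata commit; `parse_raw_log` is a hypothetical name:

```python
from typing import Dict, Optional, Tuple


def parse_raw_log(text: str) -> Dict[int, Tuple[str, Optional[str]]]:
    """Map svn revision -> (metadata commit, underlying git commit) from
    `git log --pretty=raw refs/svn/heads/<branch>` output."""
    result = {}
    current = None
    for line in text.splitlines():
        if line.startswith("commit "):
            current = {"commit": line.split()[1], "parents": []}
        elif current is None or line.startswith("    "):
            continue  # message lines are indented by four spaces
        elif line.startswith("parent "):
            current["parents"].append(line.split()[1])
        elif line.startswith("revision "):
            # The first parent is the underlying git commit (see layout above).
            git_commit = current["parents"][0] if current["parents"] else None
            result[int(line.split()[1])] = (current["commit"], git_commit)
    return result
```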

As a twist, the code does not use the svn library but instead talks the
svn protocol directly. I actually found it much easier to go this route
than to bend everything to how the svn library understands things. It
also has the advantage of not depending on libsvn. A number of
distributions currently distribute the svn-specific parts of git
separately to avoid this dependency.
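For readers unfamiliar with the svn:// (ra_svn) protocol: every message is built from four item types (numbers, length-prefixed strings, bare words and parenthesised lists), each followed by whitespace. A minimal encoder sketch; the `Word` class is an illustrative device for telling words from strings and is not taken from the patch:

```python
class Word(str):
    """A bare ra_svn word such as a command name ('get-latest-rev')."""


def encode(item) -> bytes:
    """Encode one ra_svn item; every item is terminated by a space."""
    if isinstance(item, Word):  # check before str: Word is a str subclass
        return item.encode("ascii") + b" "
    if isinstance(item, int):
        return b"%d " % item
    if isinstance(item, str):
        item = item.encode("utf-8")
    if isinstance(item, bytes):
        return b"%d:" % len(item) + item + b" "  # length-prefixed string
    if isinstance(item, (list, tuple)):
        return b"( " + b"".join(encode(x) for x in item) + b") "
    raise TypeError("cannot encode %r" % (item,))
```

For example, `encode([Word("get-latest-rev"), []])` produces the request a client sends to ask for the youngest revision. Having no library layer in between makes it easy to write requests ahead of reading replies, which the pipelining described below relies on.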

Trees
-----

For each commit I track the svn and git trees separately. The svn tree
tracks exactly what SVN returns, byte for byte. This is required so that
fetched SVN diffs can be applied correctly. I can then have slight
tweaks between the two trees. Currently there are three differences
between them.

If svn.eol is set, then eol conversion is done between the two trees
(controlled by info/attributes). This way all of my imported git trees
have unix line endings, whilst the svn trees have a mixture. In my case
they should be windows line endings, but some third-party libraries were
imported with unix line endings. Files updated in git are renormalized
to the eol given in svn.eol.
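A sketch of the conversion in both directions. It assumes svn.eol takes the values `crlf` or `lf` (the text does not list the accepted values) and that only files marked as text in info/attributes are passed in:

```python
def svn_to_git(data: bytes) -> bytes:
    """Normalise a text file fetched from svn to unix line endings."""
    return data.replace(b"\r\n", b"\n")


def git_to_svn(data: bytes, eol: str) -> bytes:
    """Renormalise a text file pushed from git to the svn.eol style."""
    lf = svn_to_git(data)  # start from LF so existing CRLFs are never doubled
    if eol == "crlf":
        return lf.replace(b"\n", b"\r\n")
    if eol == "lf":
        return lf
    raise ValueError("unsupported svn.eol value: %r" % eol)
```

Because the svn tree keeps SVN's exact bytes, a file imported with unix endings stays that way on the svn side until a git user actually changes it.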

The second tweak is that changes to .git* pushed from the git side are
not pushed through to svn. This way git users can manage .gitattributes
and .gitignore without pushing these through to svn.
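The filter could look like the hypothetical helper below. It assumes the rule applies to the file name only (so `docs/git.txt` still goes to svn); a side effect is that the `.gitempty` marker files described next also stay out of svn.

```python
import posixpath
from typing import Iterable, List


def is_git_only(path: str) -> bool:
    """True for .gitignore, .gitattributes, src/.gitignore and similar."""
    return posixpath.basename(path).startswith(".git")


def paths_for_svn(changed: Iterable[str]) -> List[str]:
    """Drop git-only files from the paths a push would send to svn."""
    return [p for p in changed if not is_git_only(p)]
```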

The third change is that svn-fetch creates a .gitempty file in any
empty directory on the svn side. This forces git to create the directory
and gives the git user a clean way of removing the directory.
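Finding the directories that need a `.gitempty` marker only requires knowing which directories have at least one direct child. A sketch over a flat listing of svn paths (the function name and input shape are assumptions):

```python
import posixpath
from typing import Iterable, List


def gitempty_files(dirs: Iterable[str], files: Iterable[str]) -> List[str]:
    """Return a '<dir>/.gitempty' path for every directory with no children."""
    dirs = list(dirs)
    has_child = set()
    for path in list(files) + dirs:
        has_child.add(posixpath.dirname(path))
    return sorted(posixpath.join(d, ".gitempty") for d in dirs if d not in has_child)
```

A directory that only contains an empty subdirectory needs no marker of its own: git creates it when it creates the subdirectory's `.gitempty`.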

Branches
--------

Branch names in SVN are a bit more lenient than ref names in git. For
example, SVN allows spaces whilst git does not, so svn-fetch converts
all disallowed characters to '_'. The svn commit stores the original
branch name so that it can be pushed back to. svn-fetch does not
currently handle conflicts where two different SVN branches collide
with a single git name.
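A simplified sketch of the renaming for a single branch folder name. It covers the main rules from git-check-ref-format (control characters, space, `~ ^ : ? * [ \`, `..`, `@{`, a leading `.`, and a trailing `.` or `.lock`) but is not the patch's implementation, and like svn-fetch it does not detect collisions:

```python
import re

# Characters git refuses in ref names: ASCII control chars and space
# (0x00-0x20), DEL, and ~ ^ : ? * [ \
_BAD_CHARS = re.compile(r"[\x00-\x20\x7f~^:?*\[\\]")


def svn_branch_to_ref_name(name: str) -> str:
    """Map one svn branch folder name to a git-safe ref component."""
    out = _BAD_CHARS.sub("_", name)
    out = out.replace("..", "__").replace("@{", "_{")
    if out.startswith("."):
        out = "_" + out[1:]
    if out.endswith(".lock"):
        out = out[:-5] + "_lock"
    if out.endswith("."):
        out = out[:-1] + "_"
    return out or "_"
```

Since the original name is kept in the svn metadata commit, the mapping only has to be one way.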

When updating a branch, svn-push tries to find a path from the previous
commit to the target. When another branch has been merged into the svn
branch, the pushed commits look very similar to the svn equivalent:
namely just the merge commit, with a large diff. If the pushed commits
branch and merge back together, svn-push simply tries to find any path
that gets it from the previous commit to the target. For new branches it
creates an svn copy from the newest svn commit that is an ancestor of
the target. In the case of a forced push, it does an svn replace with
the newest ancestor in svn. The code for this is in find_copy_source and
the if (!has_parent) check in do_push.
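The two graph searches can be illustrated with a small Python sketch. `parents` maps a commit id to its parent ids, and `svn_rev_of` maps commits already in svn to their revisions. Both helpers are hypothetical stand-ins for the path search in do_push and for find_copy_source, not ports of them.

```python
from collections import deque
from typing import Dict, List, Optional


def find_push_path(parents: Dict[str, List[str]], prev: str, target: str) -> Optional[List[str]]:
    """Return one chain of commits from prev (exclusive) to target, oldest
    first, or None if prev is not an ancestor (the forced-push case)."""
    came_from = {target: None}  # commit -> the child it was reached from
    queue = deque([target])
    while queue:
        commit = queue.popleft()
        if commit == prev:
            path, child = [], came_from[commit]
            while child is not None:
                path.append(child)
                child = came_from[child]
            return path
        for parent in parents.get(commit, []):
            if parent not in came_from:
                came_from[parent] = commit
                queue.append(parent)
    return None


def newest_svn_ancestor(parents: Dict[str, List[str]], svn_rev_of: Dict[str, int],
                        target: str) -> Optional[str]:
    """Among target and its ancestors, return the commit with the highest
    svn revision: the natural copy source for a new svn branch."""
    best, seen, queue = None, {target}, deque([target])
    while queue:
        commit = queue.popleft()
        rev = svn_rev_of.get(commit)
        if rev is not None and (best is None or rev > svn_rev_of[best]):
            best = commit
        for parent in parents.get(commit, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return best
```

When history forks and merges back, the breadth-first walk simply returns whichever chain it reaches first, matching the "any path" behaviour described above.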

Tags
----

Tags are pushed to SVN as SVN tags (i.e. folder copies). Both annotated
and simple tags can be used. For annotated tags, the tag message is used
as the commit message. For simple tags, a hard-coded commit message of
"Create <tag folder relative to svn.url>" is used. svn-fetch creates
annotated tags for tags created/updated in SVN. If a tag is updated in
SVN, svn-fetch will create a git commit for the change and a new
annotated tag that overwrites the existing one. Standard SVN practice is
not to commit to tag folders, but it does occasionally happen. This is
thus treated the same way as the occasional need in git to overwrite a
tag.
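The commit-message rule for pushed tags, as a tiny hypothetical helper (`tags_dir` stands for the svn.tags folder relative to svn.url):

```python
import posixpath
from typing import Optional


def tag_commit_message(tags_dir: str, tag_name: str,
                       annotation: Optional[str] = None) -> str:
    """Annotated tags reuse their message; simple tags get a fixed one."""
    if annotation:
        return annotation
    return "Create " + posixpath.join(tags_dir, tag_name)
```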

SVN Auth
--------

The current authentication is temporary. I'm using a git-svn style
authors file hard-coded to <git-dir>/svn-authors, and svn-push requires
the user in it to be of the form user:pass. As it pushes commits, it
switches to each commit's user by killing the connection and reopening
it. There are also a couple of operations which require an svn user but
don't have a git commit to look the email up from. For these, both
svn-fetch and svn-push require a --user argument. These operations are:
all of svn-fetch, and get-latest-rev, log, and deleting branches/tags
for svn-push.
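A sketch of the lookup from a commit's email to svn credentials, assuming the authors file keeps git-svn's `login = Full Name <email>` line format with `user:pass` as the login. The case-insensitive email match is an extra assumption:

```python
import re
from typing import Dict, Tuple

# Same shape as a git-svn authors line:  login = Full Name <email>
_AUTHOR_LINE = re.compile(r"^(.+?)\s*=\s*(.+?)\s*<(.*)>\s*$")


def load_svn_authors(text: str) -> Dict[str, Tuple[str, str]]:
    """Map a lower-cased email to (svn user, password)."""
    users = {}
    for line in text.splitlines():
        match = _AUTHOR_LINE.match(line.strip())
        if not match:
            continue  # blank or malformed line
        login, _name, email = match.groups()
        user, sep, password = login.partition(":")
        if not sep:
            raise ValueError("authors entry must be user:pass, got %r" % login)
        users[email.strip().lower()] = (user, password)
    return users
```

svn-push would then look up each commit's author email and reconnect whenever the resulting user differs from the current connection's.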

Pipelining
----------

svn-fetch has a -c option which lets you increase the number of
connections used to download commits. The svn protocol is annoying in
that it doesn't let you pipeline requests. svnserve will let you send
some content out of order, but once you overflow its rx buffer it will
kill the connection. So instead svn-fetch opens n+1 connections, sends
the requests ahead of time and then cycles back round to process the
replies. When initially importing one of my work projects, setting this
to 15 increased the fetch speed by a factor of 10.
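The scheduling idea can be modelled without any networking. In the sketch below, `send` and `recv` are hypothetical callbacks standing in for the svn connections. Each connection has at most one request in flight, and replies are consumed in the order the requests were sent. The extra (+1) connection mentioned above is not modelled.

```python
from collections import deque

_DONE = object()


def fetch_pipelined(revisions, connections, send, recv):
    """Return [(rev, reply), ...] in request order.

    Keeps one request outstanding on each connection, so no server ever
    has more than one unread request per connection."""
    todo = iter(revisions)
    idle = deque(connections)
    in_flight = deque()  # (connection, rev) in the order requests were sent
    results = []
    while True:
        # Send ahead: give every idle connection its next request.
        while idle:
            rev = next(todo, _DONE)
            if rev is _DONE:
                break
            conn = idle.popleft()
            send(conn, rev)
            in_flight.append((conn, rev))
        if not in_flight:
            return results
        # Cycle back round: read the oldest reply, then reuse its connection.
        conn, rev = in_flight.popleft()
        results.append((rev, recv(conn)))
        idle.append(conn)
```

Reading replies strictly in send order keeps commits arriving in revision order, which is what an importer needs, while the other connections' requests are already being served.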

Tests
-----

I've also added some tests for svn-fetch and svn-push. These currently
require svn 1.7 or newer: svn 1.6 doesn't understand branch replacement
and I haven't gotten around to disabling those tests.

Config
------

svn-fetch and svn-push use a number of config items:

- svn.trunk - path under svn.url of the trunk branch

- svn.branches - path under svn.url of the branches folder, branches are
    then folders inside this folder

- svn.tags - path under svn.url of the tags folder, tags are then
    folders inside this folder

- svn.user - default svn user for fetch, and for push operations
    without a git commit

- svn.remote - remote name to use for fetched tags/branches

- svn.trunkref - name to use for the git trunk ref (defaults to master)

- svn.eol - eol to convert to for files pushed to svn

Most of these are temporary.
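To show how the trunk/branches/tags settings fit together, here is a hypothetical helper that classifies an svn folder path using those keys (passed as a plain dict rather than read from git config):

```python
import posixpath
from typing import Dict, Optional, Tuple


def classify_svn_path(path: str, cfg: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Map an svn folder (relative to svn.url) to ('branch' | 'tag', git name).

    Returns None for folders outside trunk, branches and tags.
    """
    path = path.strip("/")
    trunk = cfg.get("svn.trunk")
    if trunk is not None and path == trunk.strip("/"):
        # svn.trunkref defaults to master, as described above.
        return ("branch", cfg.get("svn.trunkref", "master"))
    for key, kind in (("svn.branches", "branch"), ("svn.tags", "tag")):
        parent = cfg.get(key)
        # Branches and tags are direct subfolders of their parent folder.
        if parent is not None and path and posixpath.dirname(path) == parent.strip("/"):
            return (kind, posixpath.basename(path))
    return None
```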

TODO
----

There is a whole bunch more work to be done. My big ticket items are:

- svn over HTTP support. I've had an initial look into this and it looks
    fairly straightforward: svn over HTTP is largely the same svn
    commands converted to XML. Is there any recommendation on which XML
    library to use, or should I write my own limited version?

- fixing up auth to use credentials

- refactor svn-fetch and svn-push into git-remote-svn

- adding a config item for the authors file and, like git-svn, using
    svn-user@<repo UUID> if none is provided

- documentation - for the moment the documentation is this email

- svn:externals - none of the repos I have to deal with use this, and
    I'm not sure yet how to deal with it

- style cleanup

Code
----

The code is also available at https://github.com/jmckaskill/git, which I
will keep up to date as I flesh out the todo list.


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

