GSOC Proposal draft: git-remote-svn

Hi everybody!

Here is my draft of the proposal for the GSoC project. RFC!
Please comment and tell me what you think and if I understood it all right!

I spent a lot of lines writing about the current situation. This is 
mostly because, as a newbie, I spent a lot of time examining what we already 
have, and finally wrote it down.

The draft is inlined below. I hope it's not too long to read. I will put it on 
a github wiki later, once I figure out how this works ;)

Florian


==Remote helper for Subversion==

==Introduction==
{ for non-insiders }
Git [1] is a powerful distributed version control system (DVCS). "Distributed" 
means that everybody works on a full-featured repository. To collaborate with 
other users' repositories, git can fetch and pull from remote repositories 
using several transports (http://, ssh://, git://, ...). Git has a very 
powerful and useful concept of branches. They are lightweight pointers to 
commits (heads).

Subversion (svn) [2] was created as a successor of CVS. Both follow a strict 
client-server design, where the repository lives exclusively on the central 
server and every client only checks out a copy of a single revision at a time. 
SVN doesn't truly have a concept of branches: SVN branches are copies of a 
directory (and so are tags).

==What we want (the general goal)==
short: 
git clone svn://<url>
git push
git fetch

A full-featured bi-directional remote helper for svn that allows git to use an 
svn repository as a remote, mostly like a remote git repo.
Remote helpers are separate programs invoked by git to communicate with 
foreign repositories. Git drives them by exchanging a command and data stream 
via the helper's stdin and stdout.

The remote helper interface [3] supports commands that deliver a 
git-fast-import stream from the remote repo.
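
To make the command stream concrete, here is a rough sketch of the helper side of that conversation, written in Python for brevity (the real helper is meant to be written in C). It only answers the `capabilities` and `list` commands described in [3]; the function name `serve` and the hard-coded ref list are my own placeholders, and a real helper would read from stdin and write to stdout instead of in-memory buffers.

```python
import io


def serve(commands, out, refs):
    """Answer helper commands read from `commands`, writing replies to `out`."""
    for raw in commands:
        command = raw.rstrip("\n")
        if command == "capabilities":
            # One capability per line, terminated by a blank line.
            out.write("import\n\n")
        elif command == "list":
            # "?" means the helper does not know the ref's value yet.
            for name in refs:
                out.write("? " + name + "\n")
            out.write("\n")
        elif command == "":
            continue  # blank lines end a command batch; nothing to do here
        else:
            raise ValueError("unsupported command: " + command)
    out.flush()


# Simulate git asking for capabilities and then for the ref list.
reply = io.StringIO()
serve(io.StringIO("capabilities\nlist\n"), reply, ["refs/heads/master"])
print(reply.getvalue(), end="")
```

The `import` command itself is left out here; that is where the helper would start writing a fast-import stream for the requested ref.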

git-fast-import [4] is a format that serializes a git repository as a text 
stream. It is used by the tools git-fast-import and git-fast-export.

The remote helper has to convert the foreign protocol and data (svn) to the 
git-fast-import format.
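
As an illustration of the target format, the following sketch emits a fast-import stream for one commit that adds a set of files, roughly what one converted svn revision could look like. It is Python for brevity, not the planned C code; `fast_import_commit`, its parameters and the example committer are made up for this sketch. It assumes plain files only (mode 100644) and does no quoting of unusual path names.

```python
def fast_import_commit(ref, committer, email, timestamp, message, files):
    """Return a fast-import stream (bytes): one blob per file, then one commit."""
    chunks = []
    marks = {}
    for number, path in enumerate(sorted(files), start=1):
        content = files[path]
        # "data <n>" announces exactly n bytes of raw content.
        chunks.append(b"blob\nmark :%d\ndata %d\n%s\n"
                      % (number, len(content), content))
        marks[path] = number
    log = message.encode("utf-8")
    chunks.append(b"commit %s\n" % ref.encode("utf-8"))
    chunks.append(b"mark :%d\n" % (len(files) + 1))
    chunks.append(b"committer %s <%s> %d +0000\n"
                  % (committer.encode("utf-8"), email.encode("utf-8"), timestamp))
    chunks.append(b"data %d\n%s\n" % (len(log), log))
    for path in sorted(files):
        # Mode 100644 = regular file; the blob is referenced by its mark.
        chunks.append(b"M 100644 :%d %s\n" % (marks[path], path.encode("utf-8")))
    chunks.append(b"\n")
    return b"".join(chunks)


stream = fast_import_commit(
    "refs/heads/master", "flo", "flo@example.com", 0,
    "r1: initial import", {"README": b"hello\n"})
print(stream.decode("utf-8"), end="")
```

Piping such a stream into `git fast-import` inside a repository would create the commit on the named ref.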

==What are the challenges?==
To summarize: The way git tracks the state of the working tree and svn's way 
are different in several aspects. This makes a direct mapping impossible.
There are lots of discussions about these issues on the git mailing list [5].

Some aspects: (I'm sure this is incomplete)
- svn commit and file metadata, and its symlink and permission representation, 
have to be mapped to git.

- svn history can only be extracted from the server (we have svnrdump for 
that)

- svn commits are only possible after updating the working copy first, i.e. 
fetching and merging new revisions on the server. This is like implicitly 
rebasing your local work on the remote head before pushing to an svn 
repository.
In git there is of course no such restriction.

- and the most challenging: mapping subversion branches to git branches. 
In svn a branch is created by copying a directory with 'svn copy'. svn doesn't 
have a concept of branches by itself. 

Branches, like tags, exist only due to the convention of having branches/, 
trunk/, and tags/ directories in a repository. But this is not mandatory and 
therefore there are many different layouts. It follows that in svn it is also 
possible to commit across branches. This means that a single commit can change 
files on more than one branch (accidentally or deliberately).
To convert svn branches to git we have to detect branch semantics by examining 
the svn tree's structure and its metadata (copied nodes carry 'copyfrom' 
information). Previous efforts show that this will not be possible fully 
automatically without configuration and interaction with the user.
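
To show what the mapping problem looks like in the simplest case, here is a sketch that assumes the conventional trunk/, branches/, tags/ layout and nothing else. It maps each changed svn path to a git ref and groups the paths of one commit by ref, so a commit that ends up with more than one ref is a cross-branch commit. The function names and the choice of `refs/heads/master` for trunk are my assumptions; real repositories need the configurable mapping discussed in the next section.

```python
def ref_for_path(path):
    """Map an svn path to (git ref, path inside the branch), or None."""
    parts = path.strip("/").split("/")
    if parts[0] == "trunk":
        return "refs/heads/master", "/".join(parts[1:])
    if parts[0] in ("branches", "tags") and len(parts) >= 2:
        prefix = "refs/heads/" if parts[0] == "branches" else "refs/tags/"
        return prefix + parts[1], "/".join(parts[2:])
    return None  # path is outside the assumed layout


def split_by_ref(paths):
    """Group the paths changed by one svn commit by the git ref they map to."""
    groups = {}
    for path in paths:
        mapped = ref_for_path(path)
        if mapped is None:
            # Without a mapping rule there is nobody to ask, so give up.
            raise ValueError("no branch mapping for " + path)
        ref, inner = mapped
        groups.setdefault(ref, []).append(inner)
    return groups


changed = ["trunk/main.c", "branches/fix/main.c"]
groups = split_by_ref(changed)
print(sorted(groups))
print("cross-branch commit:", len(groups) > 1)
```

Failing on an unmapped path mirrors the idea that the helper cannot ask the user and has to stop when the mapping is unclear.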

This brings us to:
==What we have: (existing work)==
Andrew Sayers is currently developing a language to describe svn to git branch 
mapping [6]. I plan to use the language as a configuration for the remote 
helper that specifies unclear aspects.

"esr" developed a tool to manipulate and export subversion repositories [7] 
that should be able to detect branches, but its sources are not available 
yet.

In git's tree there is git-svn, a huge Perl script used to convert svn to git. 
It detects branches, but with problems. It also supports some kind of pushing 
commits to svn using a separate command. Its problem: it is unmaintainable, 
and bugs are hard to locate and to fix.

There are several other one-way conversion tools, e.g. svn-fast-export, 
svn2git.py.

In git's source tree we have vcs-svn/, a set of functions to convert svn 
dumps to git-fast-import streams. Those are used by svn-fe to one-way import 
svn history to git. svn-fe doesn't do branch mapping yet.

We have Ramkumar Ramachandra's svnrdump [8] which now lives in the svn source 
tree. It can create dump files [9] from remote svn servers and load dump files 
into an svn server.
It practically provides read-write access to svn using a text format.
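
To give an idea of that text format, the sketch below walks the records of a dump as described in [9] (header lines, a blank line, then an optional body whose size is given by Content-length) and reports nodes that were created by a copy, which is the raw material for branch detection. It is a simplified reading of the format in Python: it ignores property and file contents, and the tiny hand-written `SAMPLE` dump and the function names are only for illustration.

```python
def iter_records(dump):
    """Yield (headers, body) for each record in a dump given as bytes."""
    pos, size = 0, len(dump)
    while pos < size:
        if dump[pos:pos + 1] == b"\n":
            pos += 1  # blank line between records
            continue
        headers = {}
        while True:
            end = dump.find(b"\n", pos)
            if end == -1:
                end = size
            line = dump[pos:end]
            pos = end + 1
            if not line:
                break  # a blank line ends the header block
            key, _, value = line.partition(b": ")
            headers[key.decode("utf-8")] = value.decode("utf-8")
        # The body (properties and/or text) is skipped by its byte length.
        length = int(headers.get("Content-length", "0"))
        body = dump[pos:pos + length]
        pos += length
        yield headers, body


def branch_copies(dump):
    """Yield (revision, path, copyfrom_path, copyfrom_rev) for copied nodes."""
    revision = None
    for headers, _ in iter_records(dump):
        if "Revision-number" in headers:
            revision = int(headers["Revision-number"])
        elif "Node-copyfrom-path" in headers:
            yield (revision, headers["Node-path"],
                   headers["Node-copyfrom-path"],
                   int(headers["Node-copyfrom-rev"]))


SAMPLE = (
    b"SVN-fs-dump-format-version: 2\n\n"
    b"UUID: 00000000-0000-0000-0000-000000000000\n\n"
    b"Revision-number: 2\n"
    b"Prop-content-length: 10\n"
    b"Content-length: 10\n\n"
    b"PROPS-END\n\n"
    b"Node-path: branches/fix\n"
    b"Node-kind: dir\n"
    b"Node-action: add\n"
    b"Node-copyfrom-rev: 1\n"
    b"Node-copyfrom-path: trunk\n\n\n"
)
copies = list(branch_copies(SAMPLE))
print(copies)
```

Here revision 2 creates branches/fix as a copy of trunk at revision 1, which under the usual layout would be read as "branch fix created from trunk".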

There is a prototype remote helper from Dmitry Ivankov [10]: a bash script 
providing one-way fetching from svn via svnrdump and svn-fe.

{ did I miss something important? }

==Project outline==
Please look at the drawing on:
http://filestore.mg34.vc-graz.ac.at/flo/drawing.svg

1. Write a new bi-directional remote helper in C. 
  - It uses vcs-svn utilities to convert svn dumps to git-fast-import and 
vice-versa.
  - It calls svnrdump as a backend to communicate with svn.
  - It reads a configuration file containing branch mappings according to [6]. 
These mappings have to be pre-generated using tools developed along with the 
language. The remote helper has no way of asking the user what to do. It will 
fail if a mapping is unclear.
  - Because generating the branch mapping configuration already requires that 
you have a dump of the svn repo, the helper should probably be able to read 
from a file in place of svnrdump too.
  - Using the config, the helper translates svn branches/tags to git 
branches/tags and converts other metadata as applicable. It probably has to 
store some information about the mapping in a file in .git to allow a 
reconstruction on subsequent invocations. I think this is especially important 
when pushing to branches (does the branch already exist in svn, and where, or 
is it new?).
  - It communicates with git via the fast-import format. The remote helper 
interface has (or will have) commands for that.

2. Extend the remote helper interface as necessary to read and write 
fast-import streams to remote helpers.

3. Add output capabilities to vcs-svn. Currently the code in vcs-svn can only 
convert svn to git. To push to svn we also need conversion and mapping from 
git to svn. The actual mapping code for branches should also be placed here 
{??} and called by the remote helper.

{ Hmm.. so it looks like that's a lot? What do you think? }

==Timeline==
{ Still to come! }

==About me==
{ I sent an introduction to the list already, so I'll not copy it here. But it 
will be in the application on the GSoC site. }

[1] http://git-scm.com/
[2] http://subversion.tigris.org/
[3] git sources git/Documentation/git-remote-helpers.html
[4] git sources git/Documentation/git-fast-import.html
[5] http://thread.gmane.org/gmane.comp.version-control.git/192106
[6] https://github.com/andrew-sayers/SVN-Branching-Language
[7] http://esr.ibiblio.org/?p=4071
[8] http://svnbook.red-bean.com/en/1.7/svn.ref.svnrdump.html
[9] svn sources subversion/notes/dump-load-format.txt
[10] https://github.com/divanorama/git/blob/remote-svn-alpha/contrib/svn-fe/git-remote-svn-alpha