Re: GSOC Proposal draft: git-remote-svn

Hi Florian,

Florian Achleitner wrote:

> Here is my draft of the proposal for the GSoC project. RFC!
> Please comment and tell me what you think and if I understood it all right!

I like the rough idea.  I also agree with Ram that the scope seems too
wide for one summer and think it would be useful to narrow the scope a
little.

Some tasks I can think of:

 - getting Dmitry's importer into contrib/ and making sure it works
   reliably.  This might require some fixes to svnrdump, svn-fe,
   and the transport-helper.  Some known problems that I suspect may
   be still unresolved:

   - files marked with both svn:special (symlink) and svn:executable

   - dealing with after-the-fact edits to the svn repository.  For
     example, revprops including svn:log can be and often are changed
     after the fact.

   - what happens when the connection to the Subversion server is
     interrupted?  The Subversion dump format does not have an
     "end of commit" marker so currently we can get confused and
     seem to succeed.

   - svn-fe does not correctly handle revs that change a text file to
     a symlink or vice versa without changing its text.

 - UI for importing only some revisions (e.g., "all revisions after
   r1000").  Dmitry has a patch for the svn-fe plumbing to handle
   this but I don't think the corresponding change for the remote
   helper has been written.

   - this would probably also require changes to svnrdump.  What
     happens when r2000 involves copying a file from a version before
     r1000?  If imports do not start at r0, a normal dump of r1000
     onward is not self-contained.

 - UI for storing the mapping between Subversion revision numbers and
   git commit names in the git object db somewhere.  Currently we
   store it in a marks file.  There is a script floating around to
   convert that marks file into a set of commit notes and Dmitry also
   has a patch for svn-fe to make it write commit notes directly.
   What happens when the notes and marks file go out of sync --- which
   is authoritative?

   This also implies that repeated fetches would not have to start
   importing again at r1.

 - Storing empty directories and path-specific properties like
   svn:ignore that we don't currently handle.

 - Splitting history into branches.

   Somehow svn-fe has to communicate "svn cp" source and target
   information to the branch mapper so we can trace history to before
   the birth of the paths we are following.  That is, the full history
   of branches/1.7.x/ includes the early history of trunk/ if the
   1.7.x branch was originally created as a copy of the trunk.

   This might be able to use a mechanism similar to the one for
   storing empty directories and path properties.

 - UI for importing only a subset of paths (e.g., "just the trunk").

   - this would probably also require changes to svnrdump.  What
     happens when r2000 involves copying a file from a branch we
     have chosen not to import?

 - Mapping authorship information from Subversion (which usually
   amounts to a remote username) to something more idiomatic in git
   (usually a human's name and email address) in a way that makes
   round trips possible.

 - Sharing an imported repository with other users of the remote
   helper.

   - this might involve changes to the remote helper machinery to
     allow new clones to use some fetch/push ref specification
     different from refs/heads/*:refs/remotes/origin/*, or it might
     involve some change to core git to automatically push notes
     corresponding to some refs in some situations.

 - Importing <rev, path> pairs that have multiple parents.  In the
   Subversion model, path nodes have only one (copyfrom) parent,
   but repositories can use the svn:mergeinfo property to indicate
   that changes made in certain revs to another path have been
   incorporated.  Under what circumstances is that enough
   justification to add a second parent on the git side?

   - Because svn:mergeinfo is a normal path property, the branch
     mapper could have enough information to take care of this with
     the help of the previously mentioned facility for storing path
     properties.
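
To make the svn:mergeinfo point concrete, here is a rough sketch of
reading the property value into per-path revision ranges.  The value
format (one "/path:revlist" line per merge source, with "N" or "N-M"
items and a trailing "*" for non-inheritable ranges) is Subversion's;
the function names and the "fully merged" test are made up for
illustration and are only one possible policy, not a proposal for
what the branch mapper should actually do.

```python
def parse_mergeinfo(value):
    """Parse an svn:mergeinfo value into {source_path: [(first, last, inheritable)]}."""
    result = {}
    for line in value.splitlines():
        line = line.strip()
        if not line:
            continue
        # Paths may themselves contain ':', so split on the last one.
        path, _, revlist = line.rpartition(":")
        ranges = []
        for item in revlist.split(","):
            item = item.strip()
            if not item:
                continue
            # A trailing '*' marks a non-inheritable range.
            inheritable = not item.endswith("*")
            item = item.rstrip("*")
            first, sep, last = item.partition("-")
            ranges.append((int(first), int(last) if sep else int(first), inheritable))
        result[path] = ranges
    return result


def fully_merged(mergeinfo, source_path, first, last):
    """Hypothetical policy: is every rev in first..last of source_path recorded as merged?"""
    merged = set()
    for lo, hi, _ in mergeinfo.get(source_path, []):
        merged.update(range(lo, hi + 1))
    return all(rev in merged for rev in range(first, last + 1))
```

Whether "every rev of the source branch is listed" is the right
trigger for adding a second parent is exactly the open question
above; the sketch only shows that the raw data is easy to get at
once path properties are stored somewhere.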

All of the above is just for reasonable fetch support.
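
For the revision-to-commit mapping in particular, a minimal sketch of
what resuming from a marks file could look like.  The ":<n> <object
name>" line format is what fast-import writes; treating the mark
number as the Subversion revision number is an assumption about how
the importer assigns marks, and the function names and the "notes"
dictionary are hypothetical.

```python
def read_marks(lines):
    """Map mark number -> commit name from lines of the form ':<n> <hex object name>'."""
    marks = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        mark, commit = line.split()
        if not mark.startswith(":"):
            raise ValueError("malformed marks line: %r" % line)
        marks[int(mark[1:])] = commit
    return marks


def next_revision(marks):
    """First revision still to import, assuming mark numbers are svn revision numbers."""
    return max(marks) + 1 if marks else 0


def out_of_sync(marks, notes):
    """Revisions on which the marks file and a notes-derived mapping disagree."""
    revs = set(marks) | set(notes)
    return sorted(rev for rev in revs if marks.get(rev) != notes.get(rev))
```

read_marks() accepts any iterable of lines, so an open file object
works.  out_of_sync() only reports the disagreement; it does not
answer which side is authoritative.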

For push support, one early problem to solve would be that pushing
a commit so that the git commit id from re-importing it is the same
requires permission to set the svn:date property.  Is our target
audience one that already has that permission?  Is that permission
something reasonable for a committer to ask for from the repository
admin in order to use the remote helper?
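
To illustrate why the date matters: a git commit name is the SHA-1 of
the commit object, and that object includes the author and committer
timestamps.  The sketch below computes the name by hand (the helper
name is made up).  If the server stamps the new rev with its own
clock and the importer takes its dates from svn:date, the re-imported
commit differs from the pushed one in at least that field, so its
name differs too.

```python
import hashlib


def commit_id(tree, parents, author, committer, message):
    """Compute the SHA-1 name git gives a commit object with these fields.

    author and committer are full ident strings of the form
    'Name <email> <unix time> <timezone>'.
    """
    lines = ["tree " + tree]
    lines += ["parent " + p for p in parents]
    lines += ["author " + author, "committer " + committer, "", message]
    body = "\n".join(lines).encode("utf-8")
    # Object header: type, a space, the body length in bytes, then a NUL.
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()
```

Calling this twice with idents that differ only in the timestamp
gives two different names, which is the round-trip problem in
miniature.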

Because of the above:

> 1. Write a new bi-directional remote helper in C. 

The word "new" makes me worried that you'd be throwing away whatever
work already exists. :)

[...]
> { Hmm.. so it looks like thats a lot? what do you think? }

I agree --- what you've described is more than one summer's worth
of work.  Are there any aspects you're particularly interested in
focusing on?  For example,

 (1) If we focus on repositories without any branching structure at
     all and where the user has full ability to write whatever she
     pleases to the repository, I think developing a bidirectional
     remote helper is feasible during the summer.  Round-trip
     support (i.e., commit ids staying the same with a push followed
     by a fetch) is feasible with such a quick plan if we're willing
     to store some git-specific junk in the repo.

 (2) Regarding a tool that sits between svn-fe and the remote helper
     and implements the "follow parent" rule for tracing the full
     history of a single (linear) branch: I think developing that
     _and_ getting it merged could fit in the summer.

 (3) Regarding storing and sharing Subversion's path-specific
     and revision-specific properties: I think implementing a
     mechanism for that and getting it merged could fit in one
     summer.

 (4) Regarding getting git weirdness like distinct author and
     committer names, lack of rename information cooked at commit
     time, and timezones in author and committer dates handled during
     pushes to Subversion in a non-invasive way that is user-friendly
     for the pusher and likely to be acceptable on the receiving side
     for normal projects: that could certainly fill a summer.

 (5) Subversion weirdness like revs that change the entire repository
     at once in a many-branch repo, non-standard file modes, and
     noticing and acting appropriately for svn:log messages that were
     changed after the fact could fill another summer.
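
As a rough illustration of the "follow parent" rule from (2), here is
a sketch that walks a path back through "svn cp" records.  The
in-memory list of (rev, dest, src, src_rev) tuples is a hypothetical
stand-in for whatever svn-fe would hand to the branch mapper, and the
sketch ignores deletions and replaced paths entirely.

```python
def follow_parent(copies, path, rev):
    """Trace path@rev back through copies.

    copies is a list of (rev, dest, src, src_rev) tuples, one per copy.
    Returns [(path, first_rev, last_rev), ...], newest segment first.
    first_rev is None for the oldest segment, meaning "back to wherever
    the path was first created".
    """
    segments = []
    while True:
        best = None
        for copy_rev, dest, src, src_rev in copies:
            if copy_rev > rev:
                continue  # copy happened after the point we are looking at
            if src_rev >= copy_rev:
                raise ValueError("copy source must be older than the copy")
            # The copy is relevant if it created path or a directory above it.
            if path == dest or path.startswith(dest + "/"):
                if best is None or copy_rev > best[0]:
                    best = (copy_rev, dest, src, src_rev)
        if best is None:
            segments.append((path, None, rev))
            return segments
        copy_rev, dest, src, src_rev = best
        segments.append((path, copy_rev, rev))
        # Continue from the corresponding path under the copy source.
        path = src + path[len(dest):]
        rev = src_rev
```

With a single record saying branches/1.7.x was copied from trunk@99
in r100, tracing branches/1.7.x at r150 yields the branch segment
r100..r150 followed by trunk up to r99.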

So ideally I would like 5 students working on the remote helper
project. ;-)

Hope that helps,
Jonathan