Re: GSOC Proposal draft: git-remote-svn

On Tuesday 10 April 2012 12:17:07 Jonathan Nieder wrote:
> Hi,
> 
> Florian Achleitner wrote:
> > Thanks for your inputs. I've now submitted a slightly updated version of
> > my proposal to google. Additionally it's on github [1].
> > 
> > Summary of diffs:
> > I'll concentrate on the fetching from svn, writing a remote helper
> > without branch detection (like svn-fe) first, and then creating the
> > branch mapper.
> Thanks for the update.
> 
> If I understand correctly, the remote helper from the first half would
> do essentially the same thing as Dmitry's remote-svn-alpha script.
> Since in shell script form it is very simple, I don't think it should
> take more than a couple of days to write such a thing in C.

If the remote-svn-alpha script really is all that needs to be done, you're
right: it just pipes the dump through svn-fe. I had assumed svn-fe could only
do the initial import of an svn repo, and that fetching new revisions later
would differ somehow from importing the whole history. Is that not the case?

> Via
> > Timeline
> > 
> > GSoC timeline and summer holidays
> > Summer holidays in Austria start on the 9th of July. So until the mid-term
> > evaluations my git project will have to co-exist with my regular
> > university work and projects. But holidays extend until the beginning
> > of October, so there’s some time left to catch up after the official
> > end of GSoC.
> 
> Another possibility that some people in similar situations have
> followed is to start early.  That works a little better since it means
> that by the time midterm evaluations come around we can have a
> reasonable idea of whether a change in strategy is needed for the
> project to finish on time.
> 
> > I plan to split the project in two parts:
> > 
> > Writing the remote helper using existing functions in vcs-svn to
> > import svn history without detecting branches, like svn-fe does.
> > Milestone: 9th of July, GSoC mid-term
> > 
> > Writing a branch mapper for the remote helper that reads the config
> > language (SBL) and imports branches trying to deal as well as possible
> > with all the little pitfalls that will occur. Milestone: 20th of
> > August, GSoC end
> 
> Could you flesh out this timeline more?  Ideally it would be nice to
> have a definite plan here, even to the point of listing what patches
> would need to be written, so during the summer all that would need to
> happen is to execute and deal with bugs as they come.

Listing patches and planning every detail in the submitted proposal would
have required me to know what I would do, and how, before last Friday! As
I'm not yet an expert on this topic, I don't see how I could have known all
the details a priori.
Of course the project's documentation will evolve outside the GSoC project
proposal, which can no longer be changed.

> 
> Given the goal described here of an import with support for
> automatically detecting branches, here are some rough steps I imagine
> would be involved:
> 
>  . baseline: remote helper in C
> 
>  . option to import starting with a particular numbered revision.
>    This would be good practice for seeing how options passed to
>    "git clone -c" can be read from the config file.
> 
>  . option or URL schema to import a single project from a large
>    Subversion repository that houses several projects.  This would
>    already be useful in practice since importing the entire Apache
>    Software Foundation repository takes a while which is a waste
>    when one only wants the history of the Subversion project.
> 
>    How should the importer handle Subversion copy commands that
>    refer to other projects in this case?
> 
>  . automatically detecting trunk when importing a project with the
>    standard layout.  The trunk usually is not branched from elsewhere
>    so this does not require copyfrom info.  Some design questions
>    come up here: should the remote helper import the entire project
>    tree, too?  (I think "yes", since copy commands that copy from
>    other branches are very common and that would ensure the relevant
>    info is available to git.)  What should the mapping of git commit
>    names to Subversion revision numbers that is stored in notes say
>    in this case?
> 
>  . detecting trunk and branches and exposing them as different remote
>    branches.  This is a small step that just involves understanding
>    how remote helpers expose branches.
> 
>  . storing path properties and copyfrom information in the commits
>    produced by the vcs-svn/ library.  How should these be stored?
>    For example, there could be a parallel directory structure
>    in the tree:
> 
> 	foo/
> 		bar.c
> 	baz/
> 		qux.c
> 	.properties/
> 		foo.properties
> 		foo/
> 			bar.c.properties
> 		baz/
> 			qux.c.properties
> 
>    with properties for <path> stored at .properties/<path>.properties.
>    This strawman scheme doesn't work if the repository being imported
>    has any paths ending with ".properties", though.  Ideas?
> 
>  . tracing history past branch creation events, using the now-saved
>    copyfrom information.
> 
>  . tracing second-parent history using svn:mergeinfo properties.
> 
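
To make the strawman layout above concrete, here is a minimal sketch in
Python (for brevity only; the importer itself is meant to be C). The
function names are hypothetical and not part of vcs-svn/. It maps a path to
.properties/<path>.properties and reports where one path's properties file
would have to be a directory for another path, which is the failure mode
for names ending in ".properties".

```python
import posixpath

PROPS_DIR = ".properties"  # directory name taken from the strawman layout
SUFFIX = ".properties"


def props_path(path):
    """Map a tree path to the place its properties would be stored."""
    return posixpath.join(PROPS_DIR, path + SUFFIX)


def find_collisions(paths):
    """Return properties paths that would need to be both a file and a directory."""
    files = {props_path(p) for p in paths}
    dirs = set()
    for f in files:
        # Collect every directory that must exist below PROPS_DIR.
        parent = posixpath.dirname(f)
        while parent and parent != PROPS_DIR:
            dirs.add(parent)
            parent = posixpath.dirname(parent)
    return sorted(files & dirs)
```

With the paths from the example tree there is no clash, but a directory
named foo.properties next to foo produces one: the properties of foo and
the directory holding the properties of foo.properties/* both want the name
.properties/foo.properties.
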
> In other words, in the above list the strategy is:
> 
>  1. First convert the remote helper to C so it doesn't have to be
>     translated again later.
> 
>  2. Teach the remote helper to import a single project from a
>     repository that houses multiple projects (i.e., path limiting).
> 
>  3. Teach the remote helper to split an imported project that uses
>     the standard layout into branches (an application of the code
>     from (2)).  This complicates the scheme for mapping between
>     Subversion revision numbers and git commit ids.
> 
>  4. Teach the SVN dumpfile to fast-import stream converter not to
>     lose the information that is needed in order to get parenthood
>     information.
> 
>  5. Use the information from step (4) to get parenthood right for a
>     project split into branches.
> 
>  6. Getting the second parent right (i.e., merges).  I mentioned
>     this for fun but I don't expect there to be time for it.
> 
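
Step (3) boils down to classifying each path by its position in the
layout. A rough sketch in Python, assuming the conventional Subversion
layout of trunk/, branches/<name>/ and tags/<name>/ (tags are not mentioned
above and are included as an assumption); the function name and the shape
of the return value are hypothetical:

```python
def split_standard_layout(path):
    """Classify a repository path under a trunk/branches/tags layout.

    Returns (kind, name, rest), where rest is the path inside the branch,
    or None if the path lies outside the standard layout.
    """
    parts = path.strip("/").split("/")
    if parts[0] == "trunk":
        return ("branch", "trunk", "/".join(parts[1:]))
    if parts[0] in ("branches", "tags") and len(parts) >= 2:
        kind = "branch" if parts[0] == "branches" else "tag"
        return (kind, parts[1], "/".join(parts[2:]))
    return None
```

Paths that return None (for example a top-level README) are the ones for
which the "import the entire project tree, too?" question matters.
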
> Does that seem right, or does it need tweaks?  How long would each
> step take?  Can the steps be subdivided into smaller steps?
> 
> Another question is: what is the design for this?  With the existing
> remote-svn-alpha script, there are a few different components with
> well defined interfaces:
> 
> 	commands like "git fetch"
> 
> 	  | (1)
> 
> 	transport-helper --- (2) --- git fast-import
> 
> 	  | (2, 3)                        |
> 
> 	remote-svn-alpha                  | (3)
> 
> 	  |             ''..              |
> 	  | 
> 	  | (2)             ''(2)..       |
> 	  | 
> 	  |                        ''..   |
> 
> 	svnrdump --------- (3) -------- svn-fe
> 
>  (1) communicates using function calls and shared data
>  (2) launches
>  (3) communicates over pipe
> 
> Once remote-svn-alpha is rewritten in C, the same structure is still
> present, though it might be less obvious because some of the (2)
> and (3) can change into (1).
> 
> Where does the functionality you are adding fit into this picture?
> Are there any new components being added, and if so what do they take
> as input and output?

I planned to implement a remote helper that uses the existing interface
specification to communicate over pipes with git's transport-helper.
Instead of invoking svn-fe as a subprocess, I want to call vcs-svn/ functions
directly from the remote helper, and place new functions in that directory
(if that is the right place for them?).
To communicate with svn, the remote helper launches svnrdump as a subprocess.
Additionally, the remote helper will read a configuration file containing
additional information about branch mapping; this should be closely related
to Andrew's SBL.
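
As a rough illustration of the helper side of that pipe protocol, here is a
minimal sketch in Python (the real helper is meant to be C). It answers the
"capabilities", "list" and "import" commands of git's remote-helper
interface. The serve() function and its emit_stream callback are
hypothetical; in the planned helper the callback would be the place where
svnrdump is launched and the vcs-svn/ functions produce the fast-import
stream.

```python
import io


def serve(commands, out, refs, emit_stream):
    """Answer remote-helper commands from `commands`, writing replies to `out`."""
    batch = []
    for line in commands:
        cmd = line.rstrip("\n")
        if cmd == "capabilities":
            # One capability per line, terminated by a blank line.
            out.write("import\n\n")
        elif cmd == "list":
            # "?" means the helper does not know the ref's value yet.
            for ref in refs:
                out.write("? " + ref + "\n")
            out.write("\n")
        elif cmd.startswith("import "):
            # import commands arrive in a batch that ends with a blank line.
            batch.append(cmd[len("import "):])
        elif cmd == "":
            if not batch:
                break  # blank line outside a batch: end of the session
            emit_stream(batch, out)  # hypothetical hook: svnrdump + vcs-svn/
            batch = []
        else:
            raise ValueError("unsupported command: " + cmd)


if __name__ == "__main__":
    session = io.StringIO("capabilities\nlist\n\n")
    reply = io.StringIO()
    serve(session, reply, ["refs/heads/master"], lambda batch, out: None)
    print(reply.getvalue(), end="")
```

Keeping stream generation behind a callback mirrors the split in the
diagram above: the command loop only talks to transport-helper, while
everything svn-specific stays on the other side.
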

> 
> Hope that helps,
> Jonathan
> 
> > [1] https://github.com/flyingflo/git/wiki/

Florian