Re: GSOC Proposal draft: git-remote-svn

Hi,

Florian Achleitner wrote:

> Thanks for your inputs. I've now submitted a slightly updated version of my 
> proposal to google. Additionally it's on github [1].
>
> Summary of diffs:
> I'll concentrate on the fetching from svn, writing a remote helper without 
> branch detection (like svn-fe) first, and then creating the branch mapper.

Thanks for the update.

If I understand correctly, the remote helper from the first half would
do essentially the same thing as Dmitry's remote-svn-alpha script.
Since in shell script form it is very simple, I don't think it should
take more than a couple of days to write such a thing in C.
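
For a sense of scale, here is a rough sketch of the command side of such
a helper, written in Python only to keep it short. It assumes the helper
answers just the "capabilities" and "list" commands of the remote helper
protocol and advertises only the "import" capability. The function name
respond() and the branch names are made up for illustration; this is not
how remote-svn-alpha is actually written.

```python
def respond(command, branches):
    """Return the reply lines for one remote-helper command.

    `branches` is a hypothetical list of branch names the helper knows about.
    Replies to both commands end with a blank line.
    """
    if command == "capabilities":
        # Advertise only the "import" capability in this sketch.
        return ["import", ""]
    if command == "list":
        # "?" means the helper cannot tell git the ref's value up front.
        return ["? refs/heads/%s" % name for name in branches] + [""]
    raise ValueError("unsupported command: %r" % command)


def format_reply(lines):
    """Join reply lines into the text written to the helper's stdout."""
    return "".join(line + "\n" for line in lines)
```

An "import" command would then be answered with a fast-import stream,
which is the part the rest of this message is about.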

> Timeline
>
> GSoC timeline and summer holidays
> Summer holidays in Austria begin on the 9th of July. So until the mid-term
> evaluations my git project will have to co-exist with my regular
> university work and projects. But holidays extend until the beginning
> of October, so there’s some time left to catch up after the official
> end of GSoC.

Another possibility that some people in similar situations have
followed is to start early.  That works a little better since it means
that by the time midterm evaluations come around we can have a
reasonable idea of whether a change in strategy is needed for the
project to finish on time.

> I plan to split the project in two parts:
>
> Writing the remote helper using existing functions in vcs-svn to
> import svn history without detecting branches, like svn-fe does.
> Milestone: 9th of July, GSoC mid-term
>
> Writing a branch mapper for the remote helper that reads the config
> language (SBL) and imports branches trying to deal as well as possible
> with all the little pitfalls that will occur. Milestone: 20th of
> August, GSoC end

Could you flesh out this timeline more?  Ideally it would be nice to
have a definite plan here, even to the point of listing what patches
would need to be written, so during the summer all that would need to
happen is to execute and deal with bugs as they come.

Given the goal described here of an import with support for
automatically detecting branches, here are some rough steps I imagine
would be involved:

 . baseline: remote helper in C

 . option to import starting with a particular numbered revision.
   This would be good practice for seeing how options passed to
   "git clone -c" can be read from the config file.

 . option or URL scheme to import a single project from a large
   Subversion repository that houses several projects.  This would
   already be useful in practice, since importing the entire Apache
   Software Foundation repository takes a while, which is a waste
   when one only wants the history of the Subversion project.

   How should the importer handle Subversion copy commands that
   refer to other projects in this case?

 . automatically detecting trunk when importing a project with the
   standard layout.  The trunk usually is not branched from elsewhere
   so this does not require copyfrom info.  Some design questions
   come up here: should the remote helper import the entire project
   tree, too?  (I think "yes", since copy commands that copy from
   other branches are very common and that would ensure the relevant
   info is available to git.)  What should the mapping of git commit
   names to Subversion revision numbers that is stored in notes say
   in this case?

 . detecting trunk and branches and exposing them as different remote
   branches.  This is a small step that just involves understanding
   how remote helpers expose branches.

 . storing path properties and copyfrom information in the commits
   produced by the vcs-svn/ library.  How should these be stored?
   For example, there could be a parallel directory structure
   in the tree:

	foo/
		bar.c
	baz/
		qux.c
	.properties/
		foo.properties
		foo/
			bar.c.properties
		baz/
			qux.c.properties

   with properties for <path> stored at .properties/<path>.properties.
   This strawman scheme doesn't work if the repository being imported
   has any paths ending with ".properties", though.  Ideas?

 . tracing history past branch creation events, using the now-saved
   copyfrom information.

 . tracing second-parent history using svn:mergeinfo properties.
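
To make the problem with the strawman properties layout concrete, here is
a small sketch in Python. It assumes the layout described in the list
above and shows one way it can break: a directory whose name ends with
".properties" forces a properties file to double as a directory. The
names props_path() and find_conflicts() are made up for this example.

```python
import posixpath

PROPS_DIR = ".properties"     # top-level directory from the strawman
PROPS_SUFFIX = ".properties"  # suffix appended to each stored path


def props_path(path):
    """Return where the properties for <path> would be stored."""
    return posixpath.join(PROPS_DIR, path.strip("/") + PROPS_SUFFIX)


def find_conflicts(paths):
    """Return sorted (path, other) pairs where the properties file of
    `path` would also have to be a directory holding `other`'s properties."""
    stored = {props_path(p): p for p in paths}
    conflicts = []
    for target, owner in stored.items():
        for other_target, other_owner in stored.items():
            if other_target.startswith(target + "/"):
                conflicts.append((owner, other_owner))
    return sorted(conflicts)
```

For example, a repository containing both "foo" and "foo.properties/x"
needs .properties/foo.properties to be a file and a directory at once.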

In other words, in the above list the strategy is:

 1. First convert the remote helper to C so it doesn't have to be
    translated again later.

 2. Teach the remote helper to import a single project from a
    repository that houses multiple projects (i.e., path limiting).

 3. Teach the remote helper to split an imported project that uses
    the standard layout into branches (an application of the code
    from (2)).  This complicates the scheme for mapping between
    Subversion revision numbers and git commit ids.

 4. Teach the SVN dumpfile to fast-import stream converter not to
    lose the information that is needed in order to get parenthood
    information.

 5. Use the information from step (4) to get parenthood right for a
    project split into branches.

 6. Get the second parent right (i.e., merges).  I mentioned
    this for fun but I don't expect there to be time for it.

Does that seem right, or does it need tweaks?  How long would each
step take?  Can the steps be subdivided into smaller steps?
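
As a rough illustration of steps (2) and (3), here is a sketch in Python
of how a repository path might be limited to one project and then split
into a branch and a path within that branch. It assumes the standard
trunk/branches/tags layout. The function name and the ref names it
returns are assumptions for this example, not an existing interface.

```python
def split_standard_layout(path, project=""):
    """Map a repository path to (ref, path_in_branch).

    Returns None when the path lies outside `project` (path limiting)
    or does not sit on trunk, a branch, or a tag.
    """
    parts = [p for p in path.split("/") if p]
    prefix = [p for p in project.split("/") if p]
    if parts[:len(prefix)] != prefix:
        return None  # belongs to some other project in the repository
    rest = parts[len(prefix):]
    if rest[:1] == ["trunk"]:
        return ("refs/heads/trunk", "/".join(rest[1:]))
    if len(rest) >= 2 and rest[0] == "branches":
        return ("refs/heads/" + rest[1], "/".join(rest[2:]))
    if len(rest) >= 2 and rest[0] == "tags":
        return ("refs/tags/" + rest[1], "/".join(rest[2:]))
    return None
```

A copy whose source maps to None here is exactly the cross-project case
asked about earlier, so the sketch leaves that decision to the caller.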

Another question is: what is the design for this?  With the existing
remote-svn-alpha script, there are a few different components with
well defined interfaces:

	commands like "git fetch"
	  |
	  | (1)
	  |
	transport-helper --- (2) --- git fast-import
	  |                               |
	  | (2, 3)                        |
	  |                               |
	remote-svn-alpha                  | (3)
	  |             ''..              |
	  | (2)             ''(2)..       |
	  |                        ''..   |
	svnrdump --------- (3) -------- svn-fe

 (1) communicates using function calls and shared data
 (2) launches
 (3) communicates over pipe

Once remote-svn-alpha is rewritten in C, the same structure is still
present, though it might be less obvious because some of the (2)
and (3) links can change into (1).
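
The "launches" and "communicates over pipe" relationships in the diagram
can be sketched as follows, in Python for brevity. The command lines
returned by helper_commands() are illustrative and leave out whatever
options a real import would need. STAND_IN uses the Python interpreter so
the sketch can be tried without Subversion installed. All names here are
made up for the example.

```python
import subprocess
import sys


def helper_commands(url):
    """Commands the helper launches; the last one's stdout is the
    fast-import stream handed on to "git fast-import"."""
    return [["svnrdump", "dump", url], ["svn-fe"]]


# Stand-in pipeline: a producer and a filter, both plain Python.
STAND_IN = [
    [sys.executable, "-c", "print('r1')"],
    [sys.executable, "-c",
     "import sys; sys.stdout.write(sys.stdin.read().upper())"],
]


def run_pipeline(commands):
    """Launch each command with its stdout piped into the next one's
    stdin. Returns (output of the last command, list of exit codes)."""
    procs = []
    prev_stdout = None
    for argv in commands:
        proc = subprocess.Popen(argv, stdin=prev_stdout,
                                stdout=subprocess.PIPE)
        if prev_stdout is not None:
            # Drop the parent's copy so only the child holds the read end.
            prev_stdout.close()
        prev_stdout = proc.stdout
        procs.append(proc)
    output = procs[-1].communicate()[0]
    for proc in procs[:-1]:
        proc.wait()
    return output, [proc.returncode for proc in procs]
```

Calling run_pipeline(STAND_IN) exercises the same plumbing that
run_pipeline(helper_commands(url)) would use with the real tools.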

Where does the functionality you are adding fit into this picture?
Are there any new components being added, and if so what do they take
as input and output?

Hope that helps,
Jonathan

> [1] https://github.com/flyingflo/git/wiki/