Re: GSOC Proposal draft: git-remote-svn

On Thu, Apr 12, 2012 at 1:20 AM, Florian Achleitner
<florian.achleitner2.6.31@xxxxxxxxx> wrote:
> On Tuesday 10 April 2012 12:17:07 Jonathan Nieder wrote:
>> Hi,
>>
>> Florian Achleitner wrote:
>> > Thanks for your inputs. I've now submitted a slightly updated version of
>> > my proposal to google. Additionally it's on github [1].
>> >
>> > Summary of diffs:
>> > I'll concentrate on the fetching from svn, writing a remote helper
>> > without branch detection (like svn-fe) first, and then creating the
>> > branch mapper.
>> Thanks for the update.
>>
>> If I understand correctly, the remote helper from the first half would
>> do essentially the same thing as Dmitry's remote-svn-alpha script.
>> Since in shell script form it is very simple, I don't think it should
>> take more than a couple of days to write such a thing in C.
>
> If the remote-svn-alpha script is really all that needs to be done, you're
> right. It just pipes through svn-fe. I thought svn-fe could only import an svn
> repo initially, and there would be some difference between importing the whole
> history and fetching new revisions later, (?).
I've already forgotten the exact details, but svnrdump --incremental
from r0 to rX and then from rX+1 to rY gives the same result (modulo a
small dump header) as a single dump from r0 to rY. svn-fe is able to
continue like this too (some bits of this may not be merged yet; sadly
I've forgotten that as well).
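
To make the range arithmetic above concrete, here is a minimal sketch
of how a helper might pick the next range to request. The function
name and its arguments are made up for illustration, and the exact
svnrdump option spelling should be checked against the installed
version.

```python
def next_dump_command(url, last_imported, head):
    """Build the svnrdump argv for the next incremental fetch.

    last_imported is the highest revision already imported, or None
    for a first import. Returns None when there is nothing new.
    (Hypothetical helper, not part of git or Subversion.)
    """
    if last_imported is not None and last_imported >= head:
        return None
    # Start at r0 on the first run, otherwise right after the last
    # imported revision, so the two dumps concatenate cleanly.
    start = 0 if last_imported is None else last_imported + 1
    return ["svnrdump", "dump", url, "-r", f"{start}:{head}", "--incremental"]
```

A first import would call it with last_imported=None; every later
fetch passes the last revision recorded by the previous run.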

A side note: svnrdump can't do the same trick when the starting point
is rZ with Z>1 (that's a shallow clone), because --incremental may
produce delta references to some rX with X<Z.
So svnrdump rZ..rY is OK, but it's impossible to continue it with
svnrdump --incremental rY+1..rW. It probably is not too hard to fix
from inside svnrdump (disable deltas against revisions older than a
given threshold). Alternatively, if the helper becomes very smart about
partial history import, it could be done from outside svnrdump, by
issuing a new svnrdump request for the needed data and somehow gluing
it together.
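
For the "from outside svnrdump" variant, the helper would first need to
know which delta bases fall before the shallow starting point. A small
sketch under the assumption that the base revision numbers have already
been extracted from the incremental dump (how to extract them is not
shown, and the function name is hypothetical):

```python
def bases_outside_range(first_imported, delta_bases):
    """Return the delta base revisions that were never imported.

    first_imported is rZ, the first revision of the shallow import.
    delta_bases is an iterable of base revision numbers referenced by
    a later incremental dump (assumed input, see the text above).
    """
    # Anything older than rZ is missing locally and would need a
    # separate request before the delta can be applied.
    return sorted({base for base in delta_bases if base < first_imported})
```

An empty result means the incremental dump can be applied as is.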

>
>> Via
>> > Timeline
>> >
>> > GSoC timeline and summer holidays
>> > Summer holidays in Austria start on the 9th of July. So until the mid-term
>> > evaluations my git project will have to co-exist with my regular
>> > university work and projects. But holidays extend until the beginning
>> > of October, so there’s some time left to catch up after the official
>> > end of GSoC.
>>
>> Another possibility that some people in similar situations have
>> followed is to start early.  That works a little better since it means
>> that by the time midterm evaluations come around we can have a
>> reasonable idea of whether a change in strategy is needed for the
>> project to be finished on time.
>>
>> > I plan to split the project in two parts:
>> >
>> > Writing the remote helper using existing functions in vcs-svn to
>> > import svn history without detecting branches, like svn-fe does.
>> > Milestone: 9th of July, GSoC mid-term
>> >
>> > Writing a branch mapper for the remote helper that reads the config
>> > language (SBL) and imports branches trying to deal as well as possible
>> > with all the little pitfalls that will occur. Milestone: 20th of
>> > August, GSoC end
>>
>> Could you flesh out this timeline more?  Ideally it would be nice to
>> have a definite plan here, even to the point of listing what patches
>> would need to be written, so during the summer all that would need to
>> happen is to execute and deal with bugs as they come.
>
> Listing patches and planning all details in the submitted proposal would
> require me to know what I do and how I will do it all before last Friday! As
> I'm not yet an expert on this topic, I don't know how I could have known all
> details a priori.
> Of course the project's documentation will evolve outside the GSoC project
> proposal, which cannot be changed anymore.
>
>>
>> Given the goal described here of an import with support for
>> automatically detecting branches, here are some rough steps I imagine
>> would be involved:
>>
>>  . baseline: remote helper in C
>>
>>  . option to import starting with a particular numbered revision.
>>    This would be good practice for seeing how options passed to
>>    "git clone -c" can be read from the config file.
>>
>>  . option or URL scheme to import a single project from a large
>>    Subversion repository that houses several projects.  This would
>>    already be useful in practice since importing the entire Apache
>>    Software Foundation repository takes a while which is a waste
>>    when one only wants the history of the Subversion project.
>>
>>    How should the importer handle Subversion copy commands that
>>    refer to other projects in this case?
>>
>>  . automatically detecting trunk when importing a project with the
>>    standard layout.  The trunk usually is not branched from elsewhere
>>    so this does not require copyfrom info.  Some design questions
>>    come up here: should the remote helper import the entire project
>>    tree, too?  (I think "yes", since copy commands that copy from
>>    other branches are very common and that would ensure the relevant
>>    info is available to git.)  What should the mapping of git commit
>>    names to Subversion revision numbers that is stored in notes say
>>    in this case?
>>
>>  . detecting trunk and branches and exposing them as different remote
>>    branches.  This is a small step that just involves understanding
>>    how remote helpers expose branches.
>>
>>  . storing path properties and copyfrom information in the commits
>>    produced by the vcs-svn/ library.  How should these be stored?
>>    For example, there could be a parallel directory structure
>>    in the tree:
>>
>>       foo/
>>               bar.c
>>       baz/
>>               qux.c
>>       .properties/
>>               foo.properties
>>               foo/
>>                       bar.c.properties
>>               baz/
>>                       qux.c.properties
>>
>>    with properties for <path> stored at .properties/<path>.properties.
>>    This strawman scheme doesn't work if the repository being imported
>>    has any paths ending with ".properties", though.  Ideas?
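
The strawman mapping quoted above, and the way it breaks, can be shown
in a few lines. This is only a sketch of the scheme as described; both
function names are invented for the example.

```python
import posixpath


def properties_path(path):
    """Map <path> to .properties/<path>.properties (the strawman scheme)."""
    return posixpath.join(".properties", path + ".properties")


def conflicts(paths):
    """List (path, other) pairs whose mapped names cannot coexist.

    A conflict means the properties file for `path` would have to be
    a directory at the same time, because `other` maps to somewhere
    below it.
    """
    mapped = {properties_path(p): p for p in paths}
    found = []
    for name, path in mapped.items():
        for other_name, other in mapped.items():
            if other_name.startswith(name + "/"):
                found.append((path, other))
    return sorted(found)
```

With paths "foo" and "foo.properties/x", the properties of "foo" want
to live in the file .properties/foo.properties, while the second path
needs a directory of the same name.
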
>>
>>  . tracing history past branch creation events, using the now-saved
>>    copyfrom information.
>>
>>  . tracing second-parent history using svn:mergeinfo properties.
>>
>> In other words, in the above list the strategy is:
>>
>>  1. First convert the remote helper to C so it doesn't have to be
>>     translated again later.
>>
>>  2. Teach the remote helper to import a single project from a
>>     repository that houses multiple projects (i.e., path limiting).
>>
>>  3. Teach the remote helper to split an imported project that uses
>>     the standard layout into branches (an application of the code
>>     from (2)).  This complicates the scheme for mapping between
>>     Subversion revision numbers and git commit ids.
>>
>>  4. Teach the SVN dumpfile to fast-import stream converter not to
>>     lose the information that is needed in order to get parenthood
>>     information.
>>
>>  5. Use the information from step (4) to get parenthood right for a
>>     project split into branches.
>>
>>  6. Getting the second parent right (i.e., merges).  I mentioned
>>     this for fun but I don't expect there to be time for it.
>>
>> Does that seem right, or does it need tweaks?  How long would each
>> step take?  Can the steps be subdivided into smaller steps?
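
Steps (2) and (3) above boil down to a path classification. A minimal
sketch, assuming the standard trunk/branches/tags layout below an
optional project prefix; the function name and the returned branch
names are illustrative only and say nothing about how refs would
finally be named:

```python
def split_path(path, project=""):
    """Split a repository path into (branch, path inside the branch).

    Returns None for paths outside `project` or outside the standard
    layout. (Hypothetical helper for illustration.)
    """
    parts = [p for p in path.split("/") if p]
    prefix = [p for p in project.split("/") if p]
    if parts[:len(prefix)] != prefix:
        return None  # path limiting: not part of the requested project
    rest = parts[len(prefix):]
    if rest[:1] == ["trunk"]:
        return "trunk", "/".join(rest[1:])
    if len(rest) >= 2 and rest[0] in ("branches", "tags"):
        # The directory right below branches/ or tags/ names the branch.
        return f"{rest[0]}/{rest[1]}", "/".join(rest[2:])
    return None
```

For example, split_path("subversion/branches/1.7.x/a/b.c", "subversion")
yields the branch "branches/1.7.x" and the inner path "a/b.c".
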
>>
>> Another question is: what is the design for this?  With the existing
>> remote-svn-alpha script, there are a few different components with
>> well defined interfaces:
>>
>>       commands like "git fetch"
>>
>>         | (1)
>>
>>       transport-helper --- (2) --- git fast-import
>>
>>         | (2, 3)                        |
>>
>>       remote-svn-alpha                  | (3)
>>
>>         |             ''..              |
>>         |
>>         | (2)             ''(2)..       |
>>         |
>>         |                        ''..   |
>>
>>       svnrdump --------- (3) -------- svn-fe
>>
>>  (1) communicates using function calls and shared data
>>  (2) launches
>>  (3) communicates over pipe
>>
>> Once remote-svn-alpha is rewritten in C, the same structure is still
>> present, though it might be less obvious because some of the (2)
>> and (3) can change into (1).
>>
>> Where does the functionality you are adding fit into this picture?
>> Are there any new components being added, and if so what do they take
>> as input and output?
>
> I planned to implement a remote-helper using the existing interface
> specification to communicate over pipes with git's transport-helper.
> Instead of invoking svn-fe as a subprocess, I want to call vcs-svn/ functions
> directly from the remote-helper and place new functions in this directory (?).
> To communicate with svn, the remote-helper launches svnrdump as a subprocess.
> Additionally the remote-helper will read a configuration file containing
> additional information about branch-mapping, this should be closely related to
> Andrew's SBL.
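
The pipe conversation with transport-helper described above is
line-based. Here is a rough sketch of the command loop for an
import-only helper, written as a pure function so it can be tried
without git. The ref name is a placeholder, and the step that actually
runs svnrdump and emits the fast-import stream is left out.

```python
def respond(commands):
    """Answer remote-helper commands; return (responses, requested_refs).

    `commands` is an iterable of lines as git would send them.
    Simplified sketch: only "capabilities", "list" and "import" are
    handled, and the import itself is not performed.
    """
    responses = []
    requested = []
    batch = []
    for line in commands:
        line = line.rstrip("\n")
        if line == "capabilities":
            responses += ["import", ""]  # capability list ends with a blank line
        elif line == "list":
            # "?" means the helper cannot give a value for the ref.
            responses += ["? refs/heads/master", ""]
        elif line.startswith("import "):
            batch.append(line[len("import "):])
        elif line == "":
            if not batch:
                break  # blank line outside a batch: git is done
            requested += batch  # blank line closes the import batch
            batch = []
    return responses, requested
```

In a real helper the responses would be written to stdout as they are
produced, and the closed import batch would trigger the svnrdump run.
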
>
>>
>> Hope that helps,
>> Jonathan
>>
>> > [1] https://github.com/flyingflo/git/wiki/
>
> Florian