On Tuesday 10 April 2012 12:17:07 Jonathan Nieder wrote:
> Hi,
>
> Florian Achleitner wrote:
> > Thanks for your inputs. I've now submitted a slightly updated version of
> > my proposal to google. Additionally it's on github [1].
> >
> > Summary of diffs:
> > I'll concentrate on the fetching from svn, writing a remote helper
> > without branch detection (like svn-fe) first, and then creating the
> > branch mapper.
>
> Thanks for the update.
>
> If I understand correctly, the remote helper from the first half would
> do essentially the same thing as Dmitry's remote-svn-alpha script.
> Since in shell script form it is very simple, I don't think it should
> take more than a couple of days to write such a thing in C.

If the remote-svn-alpha script is really all that needs to be done, you're
right. It just pipes through svn-fe.
I thought svn-fe could only import an svn repo initially, and that there
would be some difference between importing the whole history and fetching
new revisions later (?).

> Via
> > Timeline
> >
> > GSoC timeline and summer holidays
> > Summer holidays in Austria at 9th of July. So until the mid-term
> > evaluations my git project will have to co-exist with my regular
> > university work and projects. But holidays extend until the beginning
> > of October, so there’s some time left to catch up after the official
> > end of GSoC.
>
> Another possibility that some people in similar situations have
> followed is to start early. That works a little better since it means
> that by the time midterm evaluations come around we can have a
> reasonable idea of whether a change in strategy is needed for the
> project to be finished on time.
>
> > I plan to split the project in two parts:
> >
> > Writing the remote helper using existing functions in vcs-svn to
> > import svn history without detecting branches, like svn-fe does.
> > Milestone: 9th of July, GSoC mid-term
> >
> > Writing a branch mapper for the remote helper that reads the config
> > language (SBL) and imports branches trying to deal as good as possible
> > with all the little pitfalls that will occur. Milestone: 20th of
> > August, GSoC end
>
> Could you flesh out this timeline more? Ideally it would be nice to
> have a definite plan here, even to the point of listing what patches
> would need to be written, so during the summer all that would need to
> happen is to execute and deal with bugs as they come.

Listing patches and planning all details in the submitted proposal would
require me to know what I do and how I will do it all before last Friday!
As I'm not yet an expert on this topic, I don't know how I could have known
all details a priori.
Of course the project's documentation will evolve outside the GSoC project
proposal, which cannot be changed anymore.

> Given the goal described here of an import with support for
> automatically detecting branches, here are some rough steps I imagine
> would be involved:
>
>  . baseline: remote helper in C
>
>  . option to import starting with a particular numbered revision.
>    This would be good practice for seeing how options passed to
>    "git clone -c" can be read from the config file.
>
>  . option or URL schema to import a single project from a large
>    Subversion repository that houses several projects. This would
>    already be useful in practice since importing the entire Apache
>    Software Foundation repository takes a while which is a waste
>    when one only wants the history of the Subversion project.
>
>    How should the importer handle Subversion copy commands that
>    refer to other projects in this case?
>
>  . automatically detecting trunk when importing a project with the
>    standard layout. The trunk usually is not branched from elsewhere
>    so this does not require copyfrom info.
>    Some design questions
>    come up here: should the remote helper import the entire project
>    tree, too? (I think "yes", since copy commands that copy from
>    other branches are very common and that would ensure the relevant
>    info is available to git.) What should the mapping of git commit
>    names to Subversion revision numbers that is stored in notes say
>    in this case?
>
>  . detecting trunk and branches and exposing them as different remote
>    branches. This is a small step that just involves understanding
>    how remote helpers expose branches.
>
>  . storing path properties and copyfrom information in the commits
>    produced by the vcs-svn/ library. How should these be stored?
>    For example, there could be a parallel directory structure
>    in the tree:
>
>       foo/
>               bar.c
>       baz/
>               qux.c
>       .properties/
>               foo.properties
>               foo/
>                       bar.c.properties
>               baz/
>                       qux.c.properties
>
>    with properties for <path> stored at .properties/<path>.properties.
>    This strawman scheme doesn't work if the repository being imported
>    has any paths ending with ".properties", though. Ideas?
>
>  . tracing history past branch creation events, using the now-saved
>    copyfrom information.
>
>  . tracing second-parent history using svn:mergeinfo properties.
>
> In other words, in the above list the strategy is:
>
>  1. First convert the remote helper to C so it doesn't have to be
>     translated again later.
>
>  2. Teach the remote helper to import a single project from a
>     repository that houses multiple projects (i.e., path limiting).
>
>  3. Teach the remote helper to split an imported project that uses
>     the standard layout into branches (an application of the code
>     from (2)). This complicates the scheme for mapping between
>     Subversion revision numbers and git commit ids.
>
>  4. Teach the SVN dumpfile to fast-import stream converter not to
>     lose the information that is needed in order to get parenthood
>     information.
>
>  5.
>     Use the information from step (4) to get parenthood right for a
>     project split into branches.
>
>  6. Getting the second parent right (i.e., merges). I mentioned
>     this for fun but I don't expect there to be time for it.
>
> Does that seem right, or does it need tweaks? How long would each
> step take? Can the steps be subdivided into smaller steps?
>
> Another question is: what is the design for this? With the existing
> remote-svn-alpha script, there are a few different components with
> well defined interfaces:
>
>       commands like "git fetch"
>
>             | (1)
>
>       transport-helper --- (2) --- git fast-import
>
>             | (2, 3)                     |
>
>       remote-svn-alpha                   | (3)
>
>             |       ''..                 |
>             | (2)       ''(2)..          |
>             |                  ''..      |
>
>       svnrdump --------- (3) -------- svn-fe
>
> (1) communicates using function calls and shared data
> (2) launches
> (3) communicates over pipe
>
> Once remote-svn-alpha is rewritten in C, the same structure is still
> present, though it might be less obvious because some of the (2)
> and (3) can change into (1).
>
> Where does the functionality you are adding fit into this picture?
> Are there any new components being added, and if so what do they take
> as input and output?

I planned to implement a remote-helper using the existing interface
specification to communicate over pipes with git's transport-helper.
Instead of invoking svn-fe as a subprocess, I want to call vcs-svn/ functions
directly from the remote-helper and place new functions in this directory (?).
To communicate with svn, the remote-helper launches svnrdump as a subprocess.
Additionally, the remote-helper will read a configuration file containing
additional information about branch-mapping; this should be closely related
to Andrew's SBL.

> Hope that helps,
> Jonathan
>
> > [1] https://github.com/flyingflo/git/wiki/

Florian
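
For reference, a minimal sketch of the strawman layout quoted above, where
properties for <path> are stored at .properties/<path>.properties. It is an
illustration only, not code from vcs-svn/: the helper names props_path and
find_conflicts are hypothetical, and the file-versus-directory clash it
detects is just one reading of why paths ending in ".properties" would break
the scheme.

```python
import posixpath

# Names taken from the strawman layout in the quoted mail.
PROPS_DIR = ".properties"
PROPS_SUFFIX = ".properties"


def props_path(path):
    """Return where the strawman scheme would store properties for `path`."""
    return posixpath.join(PROPS_DIR, path + PROPS_SUFFIX)


def find_conflicts(paths):
    """Return property files that would also have to be directories.

    Assumption: a conflict arises when the property file of one path is
    a parent directory of the property file of another path.
    """
    files = {props_path(p) for p in paths}
    conflicts = set()
    for f in files:
        parent = posixpath.dirname(f)
        # Walk up the tree, stopping at the top-level .properties directory.
        while parent and parent != PROPS_DIR:
            if parent in files:
                conflicts.add(parent)
            parent = posixpath.dirname(parent)
    return sorted(conflicts)


print(props_path("foo/bar.c"))
# → .properties/foo/bar.c.properties

# "foo" stores its properties in the file .properties/foo.properties, while
# "foo.properties/x" needs that same name to be a directory.
print(find_conflicts(["foo", "foo.properties/x"]))
# → ['.properties/foo.properties']
```

The layout from the example tree (foo, foo/bar.c, baz/qux.c) produces no
conflicts under this check; the clash only appears once a directory name in
the imported repository itself ends with ".properties".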