On Thu, Apr 12, 2012 at 1:20 AM, Florian Achleitner
<florian.achleitner2.6.31@xxxxxxxxx> wrote:
> On Tuesday 10 April 2012 12:17:07 Jonathan Nieder wrote:
>> Hi,
>>
>> Florian Achleitner wrote:
>> > Thanks for your inputs. I've now submitted a slightly updated version of
>> > my proposal to google. Additionally it's on github [1].
>> >
>> > Summary of diffs:
>> > I'll concentrate on the fetching from svn, writing a remote helper
>> > without branch detection (like svn-fe) first, and then creating the
>> > branch mapper.
>> Thanks for the update.
>>
>> If I understand correctly, the remote helper from the first half would
>> do essentially the same thing as Dmitry's remote-svn-alpha script.
>> Since in shell script form it is very simple, I don't think it should
>> take more than a couple of days to write such a thing in C.
>
> If the remote-svn-alpha script is really all that needs to be done, you're
> right. It just pipes through svn-fe. I thought svn-fe could only import an svn
> repo initially, and there would be some difference between importing the whole
> history and fetching new revisions later, (?).

I've already forgotten the exact details, but svnrdump --incremental
from r0 to rX and then from rX+1 to rY is the same (modulo a small dump
header) as from r0 to rY. And svn-fe is able to continue like this too
(maybe some bits of this are not merged; sadly I've forgotten this too).

A side note is that svnrdump can't do the same trick for a starting
point rZ, Z>1 (that's a shallow clone), as --incremental may produce
delta references to rX, X<Z. So svnrdump rZ..rY is ok, but it's
impossible to continue this with svnrdump --incremental rY+1..rX.
Though it is probably not too hard to fix from inside svnrdump (disable
deltas against revisions older than a given "old" threshold), or, if the
helper becomes very smart about partial history import, it may be done
from outside svnrdump, by issuing a new svnrdump request for the needed
data and somehow gluing it together.
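To make the range arithmetic above concrete, here is a minimal sketch in
Python. It is not taken from remote-svn-alpha or any existing helper: the
function name next_dump_args and the repository URL are hypothetical, and
the only real interface assumed is "svnrdump dump" with its -r and
--incremental options. It only builds the command line and does not run
anything.

```python
def next_dump_args(url, last_imported, head):
    """Return an svnrdump argv covering the revisions not yet imported.

    last_imported is None when nothing has been imported yet; otherwise
    it is the highest revision already fed to the importer. Returns None
    when there is nothing new to fetch.
    """
    # Start at r0 for a fresh import, else right after the last revision.
    start = 0 if last_imported is None else last_imported + 1
    if start > head:
        return None
    return ["svnrdump", "dump", url, "-r", f"{start}:{head}", "--incremental"]


# Hypothetical repository URL, for illustration only.
URL = "http://svn.example.org/repo"

initial = next_dump_args(URL, None, 100)   # covers r0..r100
followup = next_dump_args(URL, 100, 250)   # covers r101..r250
print(initial)
print(followup)
```

If the claim above holds, feeding the output of the two commands to the
importer in order should give the same result as a single r0..r250 dump,
modulo the dump header. The sketch deliberately always starts from r0; a
start point Z>1 would run into the delta-reference problem described in
the side note.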
>
>> Via
>> > Timeline
>> >
>> > GSoC timeline and summer holidays
>> > Summer holidays in Austria start on the 9th of July. So until the
>> > mid-term evaluations my git project will have to co-exist with my
>> > regular university work and projects. But holidays extend until the
>> > beginning of October, so there's some time left to catch up after the
>> > official end of GSoC.
>>
>> Another possibility that some people in similar situations have
>> followed is to start early. That works a little better since it means
>> that by the time midterm evaluations come around we can have a
>> reasonable idea of whether a change in strategy is needed for the
>> project to be finished on time.
>>
>> > I plan to split the project in two parts:
>> >
>> > Writing the remote helper using existing functions in vcs-svn to
>> > import svn history without detecting branches, like svn-fe does.
>> > Milestone: 9th of July, GSoC mid-term
>> >
>> > Writing a branch mapper for the remote helper that reads the config
>> > language (SBL) and imports branches trying to deal as well as possible
>> > with all the little pitfalls that will occur. Milestone: 20th of
>> > August, GSoC end
>>
>> Could you flesh out this timeline more? Ideally it would be nice to
>> have a definite plan here, even to the point of listing what patches
>> would need to be written, so during the summer all that would need to
>> happen is to execute and deal with bugs as they come.
>
> Listing patches and planning all details in the submitted proposal would
> require me to know what I do and how I will do it all before last Friday! As
> I'm not yet an expert on this topic, I don't know how I could have known all
> details a priori.
> Of course the project's documentation will evolve outside the GSoC project
> proposal, which cannot be changed anymore.
>
>>
>> Given the goal described here of an import with support for
>> automatically detecting branches, here are some rough steps I imagine
>> would be involved:
>>
>>  . baseline: remote helper in C
>>
>>  . option to import starting with a particular numbered revision.
>>    This would be good practice for seeing how options passed to
>>    "git clone -c" can be read from the config file.
>>
>>  . option or URL schema to import a single project from a large
>>    Subversion repository that houses several projects. This would
>>    already be useful in practice since importing the entire Apache
>>    Software Foundation repository takes a while which is a waste
>>    when one only wants the history of the Subversion project.
>>
>>    How should the importer handle Subversion copy commands that
>>    refer to other projects in this case?
>>
>>  . automatically detecting trunk when importing a project with the
>>    standard layout. The trunk usually is not branched from elsewhere
>>    so this does not require copyfrom info. Some design questions
>>    come up here: should the remote helper import the entire project
>>    tree, too? (I think "yes", since copy commands that copy from
>>    other branches are very common and that would ensure the relevant
>>    info is available to git.) What should the mapping of git commit
>>    names to Subversion revision numbers that is stored in notes say
>>    in this case?
>>
>>  . detecting trunk and branches and exposing them as different remote
>>    branches. This is a small step that just involves understanding
>>    how remote helpers expose branches.
>>
>>  . storing path properties and copyfrom information in the commits
>>    produced by the vcs-svn/ library. How should these be stored?
>>    For example, there could be a parallel directory structure
>>    in the tree:
>>
>>      foo/
>>        bar.c
>>      baz/
>>        qux.c
>>      .properties/
>>        foo.properties
>>        foo/
>>          bar.c.properties
>>        baz/
>>          qux.c.properties
>>
>>    with properties for <path> stored at .properties/<path>.properties.
>>    This strawman scheme doesn't work if the repository being imported
>>    has any paths ending with ".properties", though. Ideas?
>>
>>  . tracing history past branch creation events, using the now-saved
>>    copyfrom information.
>>
>>  . tracing second-parent history using svn:mergeinfo properties.
>>
>> In other words, in the above list the strategy is:
>>
>>  1. First convert the remote helper to C so it doesn't have to be
>>     translated again later.
>>
>>  2. Teach the remote helper to import a single project from a
>>     repository that houses multiple projects (i.e., path limiting).
>>
>>  3. Teach the remote helper to split an imported project that uses
>>     the standard layout into branches (an application of the code
>>     from (2)). This complicates the scheme for mapping between
>>     Subversion revision numbers and git commit ids.
>>
>>  4. Teach the SVN dumpfile to fast-import stream converter not to
>>     lose the information that is needed in order to get parenthood
>>     information.
>>
>>  5. Use the information from step (4) to get parenthood right for a
>>     project split into branches.
>>
>>  6. Getting the second parent right (i.e., merges). I mentioned
>>     this for fun but I don't expect there to be time for it.
>>
>> Does that seem right, or does it need tweaks? How long would each
>> step take? Can the steps be subdivided into smaller steps?
>>
>> Another question is: what is the design for this? With the existing
>> remote-svn-alpha script, there are a few different components with
>> well defined interfaces:
>>
>>      commands like "git fetch"
>>          |  (1)
>>      transport-helper --- (2) --- git fast-import
>>          |  (2, 3)                     |
>>      remote-svn-alpha                  | (3)
>>          |         ''..                |
>>          | (2)         ''(2)..         |
>>          |                    ''..     |
>>      svnrdump --------- (3) -------- svn-fe
>>
>>  (1) communicates using function calls and shared data
>>  (2) launches
>>  (3) communicates over pipe
>>
>> Once remote-svn-alpha is rewritten in C, the same structure is still
>> present, though it might be less obvious because some of the (2)
>> and (3) can change into (1).
>>
>> Where does the functionality you are adding fit into this picture?
>> Are there any new components being added, and if so what do they take
>> as input and output?
>
> I planned to implement a remote-helper using the existing interface
> specification to communicate over pipes with git's transport-helper.
> Instead of invoking svn-fe as a subprocess, I want to call vcs-svn/ functions
> directly from the remote-helper and place new functions in this directory (?).
> To communicate with svn, the remote-helper launches svnrdump as a subprocess.
> Additionally the remote-helper will read a configuration file containing
> additional information about branch-mapping, this should be closely related to
> Andrew's SBL.
>
>>
>> Hope that helps,
>> Jonathan
>>
>> > [1] https://github.com/flyingflo/git/wiki/
>
> Florian