Hi Florian,

Florian Achleitner wrote:

> Here is my draft of the proposal for the GSoC project. RFC!
> Please comment and tell me what you think and if I understood it all right!

I like the rough idea. I also agree with Ram that the scope seems too
wide for one summer and think it would be useful to narrow the scope a
little.

Some tasks I can think of:

 - getting Dmitry's importer into contrib/ and making sure it works
   reliably. This might require some fixes to svnrdump, svn-fe, and
   the transport-helper. Some known problems that I suspect may be
   still unresolved:

    - files marked with both svn:special (symlink) and svn:executable

    - dealing with after-the-fact edits to the svn repository. For
      example, revprops including svn:log can be and often are
      changed after the fact.

    - what happens when the connection to the Subversion server is
      interrupted? The Subversion dump format does not have an "end
      of commit" marker so currently we can get confused and seem to
      succeed.

    - svn-fe does not correctly handle revs that change a text file
      to a symlink or vice versa without changing its text.

 - UI for importing only some revisions (e.g., "all revisions after
   r1000"). Dmitry has a patch for the svn-fe plumbing to handle this
   but I don't think the corresponding change for the remote helper
   has been written.

    - this would probably also require changes to svnrdump. What
      happens when r2000 involves copying a file from a version
      before r1000? If imports do not start at r0, normal dumps of
      r1000: are not self-contained.

 - UI for storing the mapping between Subversion revision numbers and
   git commit names in the git object db somewhere. Currently we
   store it in a marks file. There is a script floating around to
   convert that marks file into a set of commit notes and Dmitry also
   has a patch for svn-fe to make it write commit notes directly.
   What happens when the notes and marks file go out of sync ---
   which is authoritative?
   This also implies that repeated fetches would not have to start
   importing again at r1.

 - Storing empty directories and path-specific properties like
   svn:ignore that we don't currently handle.

 - Splitting history into branches. Somehow svn-fe has to communicate
   "svn cp" source and target information to the branch mapper so we
   can trace history to before the birth of the paths we are
   following. That is, the full history of branches/1.7.x/ includes
   the early history of trunk/ if the 1.7.x branch was originally
   created as a copy of the trunk. This might be able to use a
   mechanism similar to the storage of empty directories and path
   properties.

 - UI for importing only a subset of paths (e.g., "just the trunk").

    - this would probably also require changes to svnrdump. What
      happens when r2000 involves copying a file from a branch we
      have chosen not to import?

 - Mapping authorship information from Subversion (which usually
   amounts to a remote username) to something more idiomatic in git
   (usually a human's name and email address) in a way that makes
   round trips possible.

 - Sharing an imported repository with other users of the remote
   helper.

    - this might involve changes to the remote helper machinery to
      allow new clones to use some fetch/push ref specification
      different from refs/heads/*:refs/remotes/origin/*, or it might
      involve some change to core git to automatically push notes
      corresponding to some refs in some situations.

 - Importing <rev, path> pairs that have multiple parents. In the
   Subversion model, path nodes have only one (copyfrom) parent, but
   repositories can use the svn:mergeinfo property to indicate that
   changes made in certain revs to another path have been
   incorporated. Under what circumstances is that enough
   justification to add a second parent on the git side?

    - Because svn:mergeinfo is a normal path property, the branch
      mapper could have enough information to take care of this with
      the help of the previously mentioned facility for storing path
      properties.
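As a rough illustration of the "trace history to before the birth of
the paths we are following" idea in the branch-splitting item above,
here is a minimal Python sketch. The Copy record and the trace_history
function are hypothetical names made up for this example; they are not
part of svn-fe, svnrdump, or the remote helper. The sketch assumes the
"svn cp" source and target information is already available as a plain
list, and it only handles copies of the exact path being followed, not
copies of one of its parent directories.

```python
from typing import List, NamedTuple, Optional, Tuple


class Copy(NamedTuple):
    """Hypothetical record of one "svn cp": dst_path was created in
    revision rev as a copy of src_path@src_rev."""
    rev: int
    dst_path: str
    src_path: str
    src_rev: int


def trace_history(
    path: str, head_rev: int, copies: List[Copy]
) -> List[Tuple[str, Optional[int], int]]:
    """Return (path, first_rev, last_rev) segments, newest first.

    Walks backwards from path@head_rev and jumps to the copy source
    whenever the path being followed was born as a copy. first_rev is
    None when no copy record explains the birth of the path.
    """
    segments: List[Tuple[str, Optional[int], int]] = []
    current_path, current_rev = path, head_rev
    while True:
        # Copies that created current_path at or before current_rev.
        births = [
            c for c in copies
            if c.dst_path == current_path and c.rev <= current_rev
        ]
        if not births:
            segments.append((current_path, None, current_rev))
            return segments
        birth = max(births, key=lambda c: c.rev)
        if birth.src_rev >= birth.rev:
            # A copy source must predate the copy; this also
            # guarantees that the loop terminates.
            raise ValueError("copy source is not older than the copy")
        segments.append((current_path, birth.rev, current_rev))
        current_path, current_rev = birth.src_path, birth.src_rev


if __name__ == "__main__":
    # Made-up example: branches/1.7.x was copied from trunk@499 in r500.
    copies = [Copy(500, "branches/1.7.x", "trunk", 499)]
    for segment in trace_history("branches/1.7.x", 800, copies):
        print(segment)
```

With the made-up data above, the history of branches/1.7.x is the
branch itself from r500 onward followed by trunk up to r499, which is
the shape a branch mapper would need in order to continue the git
history past the branch point.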

All of the above is just for reasonable fetch support. For push
support, one early problem to solve would be that pushing a commit so
that the git commit id from re-importing it is the same requires
permission to set the svn:date property. Is our target audience one
that already has that permission? Is that permission something
reasonable for a committer to ask for from the repository admin in
order to use the remote helper?

Because of the above:

> 1. Write a new bi-directional remote helper in C.

The word "new" makes me worried that you'd be throwing away whatever
work already exists. :)

[...]
> { Hmm.. so it looks like thats a lot? what do you think? }

I agree --- what you've described is more than one summer's worth of
work. Are there any aspects you're particularly interested in focusing
on? For example,

 (1) If we focus on repositories without any branching structure at
     all and where the user has full ability to write whatever she
     pleases to the repository, I think developing a bidirectional
     remote helper is feasible during the summer. Round-trip support
     (i.e., commit ids staying the same with a push followed by a
     fetch) is feasible with such a quick plan if we're willing to
     store some git-specific junk in the repo.

 (2) Regarding a tool that sits between svn-fe and the remote helper
     and implements the "follow parent" rule for tracing the full
     history of a single (linear) branch: I think developing that
     _and_ getting it merged could fit in the summer.

 (3) Regarding storing and sharing Subversion's path-specific and
     revision-specific properties: I think implementing a mechanism
     for that and getting it merged could fit in one summer.

 (4) Regarding getting git weirdness like distinct author and
     committer names, lack of rename information cooked at commit
     time, and timezones in author and committer dates handled during
     pushes to Subversion in a non-invasive way that is user-friendly
     for the pusher and likely to be acceptable on the receiving side
     for normal projects: that could certainly fill a summer.

 (5) Subversion weirdness like revs that change the entire repository
     at once in a many-branch repo, non-standard file modes, and
     noticing and acting appropriately for svn:log messages that were
     changed after the fact could fill another summer.

So ideally I would like 5 students working on the remote helper
project. ;-)

Hope that helps,
Jonathan