(+cc: Eric who brought us git-svn)

Hi Dmitry,

Dmitry Ivankov wrote:

> This is the second iteration of my GSoC proposal

Great; let's iron this out.

> I would like to work on "Remote helper for Subversion and git-svn".
> My major motivation is to make git-svn repository easy to clone, and to make
> git-svn (fetch) faster on huge repositories.

So, my new first impression is that this goal might make things
hard[1].  I think replacing git-svn with an imperfect emulation would
not leave people happy.  Existing configurations need to continue to
work.

> Project Goals:
> + * Design and create fully functional prototype of new git-svn which is
> cloneable and quite fast.

*If* one does not have this goal ("new git-svn") then there is a
chance to move past some of git-svn's limitations[2].  All that said,
these tools could be used to speed up git-svn.

> By fully functional I mean that it'll be
> able to fetch, push, etc. but probably won't have automatic tags and
> branches discovery and like, but will allow it to be implemented on
> top. Oh, it just hit me that given a path (read trunk) to track and a
> svndump it looks trivial to discover all it's branches - just seek for
> copies.

As mentioned before, this sounds very ambitious.  Once we have a
timeline showing how this breaks down into small steps, it should
hopefully be clearer.

> + * Get all the needed core git changes merged.

The following is probably controversial.  It's my opinion only.

Since you can't control what other people do, I don't think it's right
to judge your project's success or failure based on whether it gets
merged.  Put another way, the product of your work that can be judged
is not whatever fraction gets accepted in git.git by the end of the
summer[3].

So I think the goal is whatever it is (a working and suitable "git
clone svn://foo" command, say), and getting feedback by pushing changes
upstream and responding to it is a part of how that happens.

At some point there will probably be a point of no return --- "if the
design of this patch is not right, I would have to rewrite everything
on top of a redesign of it".  I'd encourage getting input on such
patches _very_ early and working hard to get them merged at least to
"next" (i.e., to have a rough consensus that they are suitable modulo
small tweaks).  I would love it if the proposal included a timeline
pointing out some examples of this.

> Some of these exist already and
> only need help with polishing, reviewing and merging.

Do you mean support for parsing "svnadmin dump --deltas" output?  It
is already polished and reviewed; it's only sitting out-of-tree for
now because it makes the commandline usage awkward and it would be
nice to merge some improvements to that at the same time.

> + * Make the prototype as close to being merged as possible.

That's kind of vague, you know. :)

> Milestones for prototype functionality:
[list of features snipped]

Could you say something about how you would go about implementing
these?

Sorry for the ramble, and thanks for working on this.

Ciao,
Jonathan

[1] git-svn.perl is a work of art and a wonder to behold, and if your
aim is to make a compatible replacement for it, the first step will be
to understand its design deeply.  And the thing is, that much, while
valuable anyway, is pretty hard already.

You see, "git svn" has heuristics for

 - matching up git history to svn history by reading commit messages;
 - pushing mergy history as linear history by rebasing internally
   (dcommit);
 - finding the branches, merges, branch renames, and so on in an
   imperfectly structured history (find_parent etc);
 - what particular paths are relevant (--ignore-paths)

and maintains some of its own data in the repository:

 - a configuration scheme and wide variety of supported
   configurations;
 - a log for unhandled pieces of history;
 - a cache mapping svn revision numbers to git commits

and people rely a lot on an odd coincidence:

 - using "git svn clone" twice with the same configuration on the same
   repository will, at least most of the time, give the same commit
   names.

[2] Well, it mostly comes down to one limitation.  To give a quick
sketch:

If I clone a repository with "git svn", then I am in a way a
second-class citizen.  The history shown with "git log" is filled with
"git-svn-id:" lines that are not very interesting to me (the revision
number is still interesting, of course).  I cannot use "git push" to
push my work, and in fact I cannot push my work as a branch reflecting
the real development history at all --- I have to rebase it at the
same time as pushing.  Whenever I push, the commit names for my work
change, so other branches based on my work don't show up in "gitk" as
based on my work any more.

Wouldn't it be nicer to be able to do

    alice$ git clone svn::http://svn.apache.org/repos/asf/subversion
    alice$ cd subversion
    alice$ ... hack hack hack ...

    bob$ git clone 'alice:~/src/subversion'
    bob$ cd subversion
    bob$ ... hack hack hack ...; # make some changes on top of alice's work

    alice$ git fetch origin; # anything new upstream?
    alice$ git push origin; # push my changes upstream

    bob$ git remote add upstream svn::http://svn.apache.org/repos/asf/subversion
    bob$ git fetch upstream
    bob$ # push my changes on top of alice's (which were already pushed):
    bob$ git push upstream

That is the dream.  Because there is not a clearly appropriate
one-to-one mapping between possible svn histories and possible git
histories, there are going to have to be limitations[1], but that is
an ideal to strive for.

Sounds hard, maybe?  Yeah, it is, but getting at least fetch support
using the tools David and Ram made sounds easier to me than a fully
compatible replacement for git-svn.

[3] Meanwhile, just writing and publishing code is not enough, since
the code might have a fatal flaw that means no one will use it ("ivory
tower syndrome").

So what do I mean by the above?  As students work, I hope they will
keep the mailing list posted on their progress and find small pieces
to review and merge early.  In response they might get some questions
and suggestions for improvement; the response to these is just as
important as the code.

On one hand, this feedback is an important sanity check on the broad
features of your work and a means to get the details right for
inclusion in git (i.e., get it merged).  On the other hand, one should
not let interesting side tracks get in the way of finishing the actual
project; you have to be able to say "no, I will not be working on
that".

Out of these conversations emerge better code and documentation of the
design in the form of list archives.  See [4] for a better explanation
of this workflow.

[4] http://thread.gmane.org/gmane.comp.version-control.git/142623/focus=142877
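
P.S. To make the first heuristic in [1] a little more concrete: git-svn
ends each imported commit message with a line of the form
"git-svn-id: <url>@<revision> <repository-uuid>", and matching git
history back to svn history starts from reading that line.  Below is a
minimal Python sketch of that reading step only.  It is not how
git-svn.perl does it; the names SvnId and parse_git_svn_id and the
sample URL, revision, and UUID are made up for illustration.

```python
import re
from typing import NamedTuple, Optional


class SvnId(NamedTuple):
    """Fields of one git-svn-id trailer (hypothetical helper type)."""
    url: str
    revision: int
    uuid: str


# Matches "git-svn-id: <url>@<revision> <uuid>" on a line of its own.
# The greedy \S+ lets the URL itself contain "@" (e.g. user@host).
_GIT_SVN_ID = re.compile(
    r"^git-svn-id: (?P<url>\S+)@(?P<rev>\d+) (?P<uuid>[0-9a-fA-F-]{36})\s*$",
    re.MULTILINE,
)


def parse_git_svn_id(message: str) -> Optional[SvnId]:
    """Return the last git-svn-id trailer in a commit message, or None."""
    matches = list(_GIT_SVN_ID.finditer(message))
    if not matches:
        return None
    last = matches[-1]
    return SvnId(last["url"], int(last["rev"]), last["uuid"])


if __name__ == "__main__":
    # Made-up commit message; a real one would come from "git log".
    sample = (
        "Fix a typo in the README\n"
        "\n"
        "git-svn-id: http://svn.example.org/repos/project/trunk@1234 "
        "01234567-89ab-cdef-0123-456789abcdef\n"
    )
    print(parse_git_svn_id(sample))
```

The reverse direction (svn revision number to git commit) is what the
cache mentioned in [1] is for; this sketch does not attempt it.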