On Fri, Apr 8, 2011 at 11:21 AM, Jonathan Nieder <jrnieder@xxxxxxxxx> wrote:
> (+cc: Eric who brought us git-svn)
> Hi Dmitry,
>
> Dmitry Ivankov wrote:
>
>> This is the second iteration of my GSoC proposal
>
> Great; let's iron this out.
>
>> I would like to work on "Remote helper for Subversion and git-svn".
>> My major motivation is to make git-svn repository easy to clone, and to make
>> git-svn (fetch) faster on huge repositories.
>
> So, my new first impression is that this goal might make things hard[1].
>
> I think replacing git-svn with an imperfect emulation would not leave
> people happy.  Existing configurations need to continue to work.

I should have used different names for the current git-svn.perl and for
the thing that should track an svn repo in git somewhat better than
git-svn.perl does. Maybe call it git-svn-ng. It should definitely support
the common workflows, but I think it should not be too close to
git-svn.perl in configuration and behavior details. A git-svn.perl setup
is a personal one and I doubt anyone shares it, so the transition won't
be hard.

My focus is on the git-svn-ng core operations: fast fetch, push, and the
ability to clone at least the upstream svn state from another git repo.
I see the way to a complete replacement as follows:

1) introductory step, nothing new compared to git-svn.perl, but expected
   to be already faster on the allowed operations
     git clone svn::/svnroot
       (svnroot can be a path in a local or remote repo, this is already
        supported by the svn layers)
     ..hack..
     git svn dcommit or the like - put changes to the remote
2) allow private clones, that is, be able to exchange svn updates or do
   the initial clone via git
     git clone /somewhere/git_ro_version_of_svn.git
       (maybe some additional command/key to get svn metadata)
     git remote add upstream svn://svnroot
       (git remote update should be as quick as on origin)
     ..hack..
     git svn dcommit
3) allow tracking of a path; an intermediate step, functionally the same
   or slightly different in corner cases, something like
     git remote add svn svn:://svnroot/
       (it's still a good idea to have the whole root specified)
     git svn branch trunk svn/trunk@12
       (at first it'll be ok to behave as if the root is svnroot/trunk,
        just a new syntax)
4) follow the tracked path, create branches
     git remote add svn svn:://svnroot/
     git svn branch trunk svn/trunk@12
     git svn branch trunk svn/branches/stable@14
     git remote update
     git merge-base svn/trunk svn/branches/stable #for example svn/trunk@13
     git remote update #got svn/trunk@23 -> svn/branches/fixups
       (maybe if option discover is set; here we need the root to check
        that the destination is ok)
     git svn branch trunk svn/branches/old@13
       (if somehow it wasn't discovered)
     git remote update
       (maybe, if we store the whole root history, it will be fast)
   and also create svn branches with git; the ugly way is to:
     git checkout svnroot
     cp -r trunk branches/experimental
     git add branches/experimental
     git commit && git svn dcommit
   or maybe:
     git svn branch trunk svn/branches/experimental
     git checkout svn/branches/experimental
     git commit --allow-empty -m "create new branch"
     git svn dcommit

These four imho are fine as minimal functionality: one can use it as a
git-svn.perl replacement if performance and clones are more important
than heuristics and ease of use.

5) tweak 3-4), allow tracking branches, sharing branches, committing
   merges, and a lot more; see below for an idea of getting rid of rebase.

Uh, that became a long story. Anyway, on the good side I clearly see that
there is git-svn.perl, which makes git<->svn interaction quite
comfortable for users, there is an almost ready faster git<->svn
transport, and there are already a bunch of remote helpers available. On
the bad side, it's currently hard to get even an initial clone.
So I consider it quite possible for a GSoC project to produce a kind of
git-svn-ng that is cloneable, faster than git-svn.perl, and hopefully
doesn't require deep understanding and patching of git-svn.perl. All this
with the idea of extending it to handle git workflows between two
git-svn-ng clones and an svn repo, or just better git workflows inside
one git-svn-ng.

>
>> Project Goals:
>> + * Design and create fully functional prototype of new git-svn which is
>> cloneable and quite fast.
>
> *If* one does not have this goal ("new git-svn") then there is a
> chance to move past some of git-svn's limitations[2].

I'll write inline in [2].

>
> All that said, these tools could be used to speed up git-svn.
>
>> By fully functional I mean that it'll be
>> able to fetch, push, etc. but probably won't have automatic tags and
>> branches discovery and like, but will allow it to be implemented on
>> top. Oh, it just hit me that given a path (read trunk) to track and a
>> svndump it looks trivial to discover all its branches - just seek for
>> copies.
>
> As mentioned before, this sounds very ambitious.  Once we have a
> timeline showing how this breaks down into small steps it should
> hopefully be clearer way.

Ok, I'll try to break it into some steps.

>
>> + * Get all the needed core git changes merged.
>
> The following is probably controversial.  It's my opinion only.
>
> Since you can't control what other people do, I don't think it's right
> to judge your project's success or failure based on whether it gets
> merged.  Put another way, the product of your work that can be judged
> is not whatever fraction gets accepted in git.git by the end of the
> summer[3].
That means one can't blame them if it's not merged, but git is also the
mentor, and it'd be strange to choose a goal like "write a thing I'll use
myself" :)

> So I think the goal is whatever it is (a working and suitable "git
> clone svn://foo" command, say) and getting feedback by pushing changes
> upstream and responding to it is a part of how that happens.

It makes great sense to keep both things in mind, yes :)

>
> At some point there will probably be a point of no return --- "if the
> design of this patch is not right, I would have to rewrite everything
> on top of a redesign of it".  I'd encourage getting input on such
> patches _very_ early and working hard to get them merged at least to
> "next" (i.e., to have a rough consensus that they are suitable modulo
> small tweaks).  I would love it if the proposal included a timeline
> pointing out some examples of this.
>
>> Some of these exist already and
>> only need help with polishing, reviewing and merging.
>
> Do you mean support for parsing "svnadmin dump --deltas" output?  It
> is already polished and reviewed; it's only sitting out-of-tree for
> now because it makes the commandline usage awkward and it would be
> nice to merge some improvements to that at the same time.

Yep, I'll help in whatever way I can with this one. I also saw helper
branches introducing new remote-helper commands or extending existing
core functions. I hope all the needed core changes will pop up quickly in
the early stages.

>
>> + * Make the prototype as close to being merged as possible.
>
> That's kind of vague, you know. :)

Yep, but I don't know any good metric for this :)

>
>> Milestones for prototype functionality:
> [list of features snipped]
>
> Could you say something about how you would go about implementing
> these?
>
> Sorry for the ramble, and thanks for working on this.

No problem at all, thank you for the feedback, I like the challenges.
>
> Ciao,
> Jonathan
>
> [1] git-svn.perl is a work of art and a wonder to behold, and if your aim
> is to make a compatible replacement for it, the first step will be to
> understand its design deeply.  And the thing is, that much, while
> valuable anyway, is pretty hard already.

[skipped git-svn.perl heuristics]

> and people rely a lot on an odd coincidence:
>
>  - using "git svn clone" twice with the same configuration on the same
>    repository will, at least most of the time, give the same commit
>    names.

I want this to happen always, whenever two clone & fetch sequences reach
the same remote revision.

>
> [2] Well, it mostly comes down to one limitation.  To give a quick
> sketch:
>
> If I clone a repository with "git svn", then I am in a way a
> second-class citizen.  The history shown with "git log" is filled
> with "git-svn-id:" lines that are not very interesting to me (the
> revision number is still interesting, of course).

As already mentioned, I didn't mean an exact emulation of git-svn.
Ideally I'd like the git commit object to include only immutable svn
data, and moreover it should be the same after a round trip to the svn
repo and back.

> I cannot use
> "git push" to push my work, and in fact I cannot push my work as a
> branch reflecting the real development history at all --- I have to
> rebase it at the same time as pushing.  Whenever I push, the commit
> names for my work change, so other branches based on my work don't
> show up in "gitk" as based on my work any more.
>
> Wouldn't it be nicer to be able to do
>
>   alice$ git clone svn::http://svn.apache.org/repos/asf/subversion
>   alice$ cd subversion
>   alice$ ... hack hack hack ...
>
>   bob$ git clone 'alice:~/src/subversion'
>   bob$ cd subversion
>   bob$ ... hack hack hack ...; # make some changes on top of alice's work
>
>   alice$ git fetch origin; # anything new upstream?
>   alice$ git push origin; # push my changes upstream
>
>   bob$ git remote add upstream svn::http://svn.apache.org/repos/asf/subversion
>   bob$ git fetch upstream
>   bob$ # push my changes on top of alice's (which were already pushed):
>   bob$ git push upstream
>
> That is the dream.  Because there is not a clearly appropriate
> one-to-one mapping between possible svn histories and possible git
> histories, there are going to have to be limitations[1], but that is
> an ideal to strive for.

I have an idea for it, quite raw, but it could work. We are limited by
svn:
- a commit isn't a push, it's rebase & push. So we don't control the
  parent ref.
- there are no branches in svn. There are paths, but we can convert them
  to branches of other paths.
- there are no merges in svn. That's trouble, but maybe we can try to use
  svn:mergeinfo to create and read multiple parent refs.
- we definitely want to keep everything needed inside svn, or different
  clones are sure to diverge at some point.
So what we do:
- get rid of the svn revision: just keep a path@rev -> sha1 mapping
  separately; of course the history of path@rev should look like the
  history of sha1
- learn to create and fetch back merge commits: try svn:mergeinfo
- be sure to control the parents: don't let svn commit on top of
  something different from the git parent:
  -- if the path wasn't changed in the repo while we were hacking, commit
     it and it'll come back as the same sha1
  -- if it was, create an svn branch of our parent, commit there, then
     create a merge commit of these two, commit it and get the same merge
     history back
  -- and if we are committing a merge, create/commit to branches as
     necessary
Not perfect, but it can hardly be cleaner to emulate git history in svn
and get it back unchanged. And it should be optional too, not all svn
commits need this.

> Sounds hard, maybe?  Yeah, it is, but getting at least fetch support
> using the tools David and Ram made sounds easier to me than a fully
> compatible replacement for git-svn.
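
To make the parent-control rule above a bit more concrete, here is a
rough sketch in Python of the decision for a single non-merge commit.
Everything in it is an assumption for illustration only: the names
plan_dcommit, lookup and RevMap, the side-branch naming scheme, and the
textual "steps" (they are not real svn or git commands). The merge case
("create/commit to branches as necessary") is deliberately left out.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical side table kept outside the commit objects:
# (svn path, revision) -> git sha1.
RevMap = Dict[Tuple[str, int], str]


def lookup(revmap: RevMap, path: str, rev: int) -> Optional[str]:
    """Return the git sha1 recorded for path@rev, or None if unknown."""
    return revmap.get((path, rev))


def plan_dcommit(path: str, parent_rev: int, head_rev: int) -> List[str]:
    """List the svn-side steps that keep the git parent intact.

    parent_rev: revision of `path` that our git commit is based on.
    head_rev:   latest revision that touched `path` in the svn repo.
    """
    if head_rev < parent_rev:
        raise ValueError("parent revision is newer than the repository head")
    if head_rev == parent_rev:
        # Nobody changed the path meanwhile: a plain commit keeps the
        # same parent, so it should come back as the same sha1.
        return [f"commit {path}"]
    # The path moved on: commit on a side branch copied from our parent,
    # then merge it back so that both parents are recorded.
    side = f"branches/git-{path.replace('/', '-')}-r{parent_rev}"  # made-up naming
    return [
        f"copy {path}@{parent_rev} {side}",
        f"commit {side}",
        f"merge {side} {path}",  # this step would record svn:mergeinfo
    ]
```

For example, plan_dcommit("trunk", 12, 12) gives a single plain commit,
while plan_dcommit("trunk", 12, 14) gives the copy, commit and merge
sequence, which is the only case where an extra svn branch is needed.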
>
> [3] Meanwhile, just writing and publishing code is not enough, since
> the code might have a fatal flaw that means no one will use it ("ivory
> tower syndrome").  So what do I mean by the above?
>
> As students work, I hope they will keep the mailing list posted on
> their progress and find small pieces to review and merge early.  In
> response they might get some questions and suggestions for
> improvement; the response to these is just as important as the code.
>
> On one hand this feedback is an important sanity check on the broad
> features of your work and a means to get the details right for
> inclusion in git (i.e., get it merged).  On the other hand, one should
> not be tempted by interesting side tracks and avoid getting the actual
> project done; you have to be able to say "no, I will not be working on
> that".  Out of these conversations emerge better code and
> documentation of the design in the form of list archives.
>
> See [4] for a better explanation of this workflow.
>
> [4] http://thread.gmane.org/gmane.comp.version-control.git/142623/focus=142877