On 01.04.2014, at 15:15, Jeff King <peff@xxxxxxxx> wrote:

> On Tue, Apr 01, 2014 at 10:07:03PM +0900, Mike Hommey wrote:
>
>>> For my own curiosity, how does this differ from what is in
>>> contrib/remote-helpers/git-remote-hg?
>>
>> contrib/remote-helpers/git-remote-hg does a local mercurial clone before
>> doing the git conversion. While this is not really a problem for most
>> mercurial projects, it tends to be slow with big ones, like the firefox
>> source code. What I'm aiming at is something that can talk directly to a
>> remote mercurial server.
>
> Ah, that makes sense. Thanks for explaining.

Hm, myself, I am not quite convinced. Yes, there is an overhead, but it
is one-time (well, the space overhead is not, but Mike only mentioned
time, not space). I wonder if it is really worth the effort to start yet
another project on this...

Moreover, I don't see a fundamental reason why one could not modify
git-remote-hg to work this way. At least optionally -- myself, I would
strongly prefer the current way, as translating between git and hg with
a 100% clean round trip is provably impossible [1].

Thing is, there are by now more than half a dozen projects of this kind.
My impression is that all of them pick the low-hanging fruit, some go
slightly beyond that, but *none* solves all the tough parts and
nitty-gritty details... Just to mention a few of the problems that are
usually ignored, even though they have real-world impact:

- the concept of Mercurial branches has no counterpart in git, making
  all kinds of translations hard. As a consequence, many translators
  ignore hg branches completely (e.g. hg-git -- at least it used to do
  that, not sure whether that changed) or handle them only partially
  (e.g. contrib/remote-helpers/git-remote-hg: it does not deal with
  multiple heads or with closed branches). This can cause severe issues
  with git-remote-hg, by the way, with dangling refs, which, when pruned
  by an auto-gc, can wipe your fast-import marks file, causing major
  pain...
- in the other direction, git branches most closely correspond to hg
  bookmarks. But what if a hg repository has both a branch "foo" and a
  bookmark "foo"? git-remote-hg partially deals with that (by mapping
  the hg bookmark "foo" to the git branch "foo", and mapping the hg
  branch "foo" to the git branch "branches/foo"), but this still has
  issues (besides being annoying for users, it clearly still does not
  avoid ref name conflicts)
- git and hg also allow different character sets in branch and bookmark
  names
- in hg you can simultaneously have things called "foo" and "foo/bar".
  In git, you can't.

There is plenty more. Of course, some of this might just be impossible
[1] to handle nicely. But I find it kind of sad that everybody seems to
prefer to start yet another solution, then leave it at 80%, instead of
trying to improve upon existing work :-(.

By the way, to get back to the speed bottleneck: We found that by far
the slowest part in importing large repositories like the Mozilla one
was not the initial cloning of the hg repository (although that could
sometimes take ages) but rather an unfortunate mismatch between the hg
and git storage approaches. When creating a fast-import stream, the
normal way to go about that is to import things commit by commit. But if
you do that, then extracting file data from Mercurial and its revlog
data format can easily degenerate into worst-case quadratic runtime
:-/.

Now, if one knows that one is going to import the whole repository
anyway, one could do better by first exporting all file revisions,
generating many blobs and their marks, and keeping these in memory,
*then* exporting the commits, referring back to these blob marks.
However, this stops being a great idea once you are working in
incremental mode.
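
To make the "blobs first, commits second" idea concrete, here is a
minimal sketch in Python. It is not what git-remote-hg does today; the
function name, the (path, revision) keys and the shape of the input
dicts are all made up for illustration, and it assumes simple paths
that need no quoting and regular (100644) files only.

```python
def blobs_first_stream(file_revisions, commits, ref="refs/heads/master"):
    """Build a fast-import stream: all blobs first, then the commits.

    file_revisions: dict mapping a hypothetical (path, revision) key to bytes.
    commits: list of dicts with 'message', 'committer' and 'files', where
             'files' maps a path to the (path, revision) key of its content.
    """
    chunks = []
    marks = {}  # (path, revision) -> blob mark, kept in memory

    # Pass 1: export every file revision once as a marked blob.
    for mark, (key, data) in enumerate(file_revisions.items(), start=1):
        marks[key] = mark
        chunks.append(b"blob\nmark :%d\ndata %d\n" % (mark, len(data)))
        chunks.append(data + b"\n")

    # Pass 2: export the commits, pointing at the blob marks from pass 1.
    next_mark = len(marks) + 1
    parent = None
    for commit in commits:
        message = commit["message"].encode("utf-8")
        chunks.append(b"commit %s\nmark :%d\n" % (ref.encode("utf-8"), next_mark))
        chunks.append(b"committer %s\n" % commit["committer"].encode("utf-8"))
        chunks.append(b"data %d\n" % len(message))
        chunks.append(message + b"\n")
        if parent is not None:
            chunks.append(b"from :%d\n" % parent)
        for path, key in commit["files"].items():
            chunks.append(b"M 100644 :%d %s\n" % (marks[key], path.encode("utf-8")))
        chunks.append(b"\n")
        parent = next_mark
        next_mark += 1

    return b"".join(chunks)
```

The result could be piped into "git fast-import". The point of the two
passes is that each file's revisions can be read in one go, instead of
being pulled out commit by commit; the price is the in-memory marks
table, which is why this mostly fits a full initial import.
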
That said, it certainly would make sense to investigate this possibility
(regardless of whether one uses a local hg clone or talks directly to
the remote repository); at least in theory, even if one only uses this
approach during the initial import, it should be a strict improvement
over the current situation.

In closing, I should mention that the problems caused by translating
between hg and git concepts are far from the only ones; the fast-import
interface itself still has limitations that make some things annoying.
E.g. when a remote is renamed, the remote helper does not know that,
which can lead to awkward situations that right now may require some
trickery to resolve correctly, if it is possible at all. Or if a user
manually removed a commit that a remote helper previously referenced in
a marks file, and that remote helper then uses that marks file,
fast-import just dies, complaining about the invalid mark. As a result,
every proper remote helper basically would need to fully parse and
verify those marks files, detect "broken" marks, and deal with that --
there is no way to benefit from the existing mark verification code in
fast-import right now.

Please don't get me wrong. I don't want to whine, and I hope I can
contribute to solving some of these issues at some point (though lack of
time is a nasty issue). In the meantime, I'd love it if other people
were interested in improving one of the existing solutions to the
problem (such as git-remote-hg, gitifyhg or hg-git), instead of creating
yet another half-way solution... :-)

Cheers,
Max

[1] That is, unless you are willing to use a custom server, such as Kiln
Harmony <http://blog.fogcreek.com/kiln-harmony-internals-the-basics/>.
But that is cheating, as this is not a real round-trip conversion;
rather, you keep a git and a hg repository in perfect sync all the time
and present them as a single entity to the outside world.
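
As an illustration of the marks verification a remote helper currently
has to do by itself, here is a rough Python sketch. It only covers
marks files in fast-import's own format (one ":<mark> <object name>"
entry per line); the function names are made up, and the default
existence check via "git cat-file -e" is just one possible way to ask
git whether an object is still there.

```python
import subprocess


def parse_marks(text):
    """Parse fast-import marks text, one ":<mark> <object name>" per line."""
    marks = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        mark, sha = line.split(" ", 1)
        if not mark.startswith(":"):
            raise ValueError("unexpected marks line: %r" % line)
        marks[int(mark[1:])] = sha
    return marks


def object_exists(sha):
    """Assumed default check: `git cat-file -e` exits non-zero if missing."""
    return subprocess.call(
        ["git", "cat-file", "-e", sha],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    ) == 0


def split_marks(marks, exists=object_exists):
    """Split marks into (valid, broken) according to the existence check."""
    valid = {m: s for m, s in marks.items() if exists(s)}
    broken = {m: s for m, s in marks.items() if m not in valid}
    return valid, broken
```

A helper could run something like this before handing the marks file to
fast-import, and then drop or re-import whatever ends up in "broken".
Spawning one git process per mark is slow for big repositories, so a
real helper would probably batch the lookups.
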