On Sat, Apr 20, 2013 at 7:07 PM, Felipe Contreras
<felipe.contreras@xxxxxxxxx> wrote:
> On Sat, Apr 20, 2013 at 6:07 AM, John Szakmeister
> <john@xxxxxxxxxxxxxxx> wrote:
>> I really like the idea of remote-hg, but it appears to be awfully slow
>> on the clone step:
>
> The short answer is no. I do have a couple of patches that improve
> performance, but not by a huge factor.
>
> I have profiled the code, and there are two significant places where
> performance is wasted:
>
> 1) Fetching the file contents
>
> Extracting, decompressing, transferring, and then compressing and
> storing the file contents is mostly unavoidable, unless we already
> have the contents of the file. In Git that would be easy to check by
> looking at the checksum (SHA-1). Unfortunately Mercurial doesn't have
> that information. The SHA-1 it stores is not of the contents alone,
> but of the contents and the parent checksum, which means that if you
> revert a modification you made to a file, or move a file, or do any
> other operation that ends up with the same contents but from a
> different path, the SHA-1 is different. So the only way to know
> whether the contents are the same is to extract them and calculate
> the SHA-1 yourself, which defeats the purpose of the calculation.
>
> I've tried both calculating the SHA-1 and using a previous reference
> to avoid the transfer, and doing the transfer and letting Git check
> for existing objects; it doesn't make a difference.
>
> This is by Mercurial's stupid design, and there's nothing we, or
> anybody, could do about it until they change it.

That's a bummer. :-(

> 2) Checking for file changes
>
> For each commit (or revision), we need to figure out which files were
> modified, and for that, Mercurial has a neat shortcut that stores such
> modifications in the commit context itself, so it's easy to retrieve.
> Unfortunately, it's sometimes wrong.
>
> Since the Mercurial tools never use this information for any real
> work, only to show the changes to users, the Mercurial folks never
> noticed that the contents they were storing were wrong. This means
> that if you have a repository that started with old versions of
> Mercurial, chances are this information is wrong, and there's no real
> guarantee that future versions won't have this problem, since to this
> day the information is only used to display things to the user.
>
> So, since we cannot rely on this, we need to manually check for
> differences the way Mercurial does, which blows performance away,
> because you need to get the contents of the two parent revisions and
> compare them. By contents I mean the manifest, or list of files,
> which takes a considerable amount of time.

Eek!

> For 1) there's nothing we can do, and for 2) we could trust the files
> Mercurial thinks were modified, and that gives us a very significant
> boost, but the repository will sometimes end up wrong. Most of the
> time is spent on 2).
>
> So unfortunately there's nothing we can do, that's just Mercurial
> design, and it really has nothing to do with Git. Any other tool would
> have the same problems, even a tool that converts a Mercurial
> repository to Mercurial (without using tricks).
[snip]

That's unfortunate, but thank you for taking the time to explain!

-John
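
For anyone following along, the hashing difference described in 1) can
be sketched roughly as below. This is only an illustration, not code
from remote-hg: the function names are made up, and the Mercurial side
ignores the copy/rename metadata that can be prepended to file
revisions. It assumes the file node ID is the SHA-1 of the two sorted
parent IDs followed by the text, while a Git blob ID depends only on
the contents.

```python
import hashlib

NULL_ID = b"\0" * 20  # Mercurial's "no parent" node ID


def git_blob_id(data: bytes) -> str:
    """Git blob ID: SHA-1 over a small header plus the contents only."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def hg_file_node(data: bytes, p1: bytes = NULL_ID, p2: bytes = NULL_ID) -> bytes:
    """Hypothetical helper: SHA-1 over the sorted parent IDs plus the text."""
    first, second = sorted((p1, p2))
    return hashlib.sha1(first + second + data).digest()


original = b"hello\n"
modified = b"hello, world\n"

rev0 = hg_file_node(original)           # file added
rev1 = hg_file_node(modified, p1=rev0)  # file modified
rev2 = hg_file_node(original, p1=rev1)  # modification reverted

# Same contents, same Git blob ID, regardless of history.
print(git_blob_id(original))

# Same contents, but the Mercurial node IDs differ because the parents differ.
print(rev0.hex())
print(rev2.hex())
```

Because rev0 and rev2 differ even though the contents are identical,
the stored ID alone can't tell you whether you already have the data,
which is the problem described above.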