On Sat, Apr 20, 2013 at 6:07 AM, John Szakmeister <john@xxxxxxxxxxxxxxx> wrote:
> I really like the idea of remote-hg, but it appears to be awfully slow
> on the clone step:

The short answer is no. I do have a couple of patches that improve performance, but not by a huge factor.

I have profiled the code, and there are two significant places where performance is wasted:

1) Fetching the file contents

Extracting, decompressing, transferring, and then compressing and storing the file contents is mostly unavoidable, unless we already have the contents of that file. In Git that would be easy to check by looking at the checksum (SHA-1), but unfortunately Mercurial doesn't have that information. The SHA-1 Mercurial stores is not of the contents alone, but of the contents together with the parent checksums. This means that any operation that ends up with the same contents through a different history, such as reverting a modification you made to a file, or moving a file, produces a different SHA-1.

So the only way to know whether the contents are the same is to extract them and calculate the SHA-1 yourself, which defeats the purpose of the calculation. I've tried both approaches, calculating the SHA-1 and using a previous reference to avoid the transfer, and doing the transfer and letting Git check for existing objects, and neither makes a difference.

This is by Mercurial's stupid design, and there's nothing we, or anybody, can do about it until they change it.

2) Checking for file changes

For each commit (or revision), we need to figure out which files were modified. For that, Mercurial has a neat shortcut: it stores the list of modified files in the commit context itself, so it's easy to retrieve. Unfortunately, it's sometimes wrong. Since the Mercurial tools never use this information for any real work, only to show the changes to the users, the Mercurial folks never noticed that what they were storing was wrong.
This means that if you have a repository that was started with old versions of Mercurial, chances are this information is wrong, and there's no real guarantee that future versions won't have this problem, since to this day this information is used only to display stuff to the user.

So, since we cannot rely on it, we need to check for differences manually, the way Mercurial does, which blows performance away, because you need to get the contents of the revision and its parent and compare them. By contents I mean the manifest, or list of files, and retrieving it takes a considerable amount of time.

For 1) there's nothing we can do, and for 2) we could trust the files Mercurial thinks were modified, which gives us a very significant boost, but the repository will sometimes end up wrong. Most of the time is spent on 2).

So unfortunately there's nothing we can do; that's just Mercurial's design, and it really has nothing to do with Git. Any other tool would have the same problems, even a tool that converts a Mercurial repository to Mercurial (without using tricks).

It seems Bazaar is more sensible in this regard: 1) the checksums are truly of the file contents, and 2) each revision does store the file modifications correctly. So a clone in Bazaar is much faster.

In my opinion Mercurial just screwed up their design.

Cheers.

-- 
Felipe Contreras