Felipe Contreras venit, vidit, dixit 02.11.2012 19:01:
> On Fri, Nov 2, 2012 at 5:41 PM, Felipe Contreras
> <felipe.contreras@xxxxxxxxx> wrote:
>> On Fri, Nov 2, 2012 at 3:48 PM, Jeff King <peff@xxxxxxxx> wrote:
>>> On Thu, Nov 01, 2012 at 05:08:52AM +0100, Felipe Contreras wrote:
>>>
>>>>> Turns out msysgit's remote-hg is not exporting the whole repository,
>>>>> that's why it's faster =/
>>>>
>>>> It seems the reason is that it would only export to the point where
>>>> the branch is checked out. After updating to the tip I noticed
>>>> there was a performance difference.
>>>>
>>>> I investigated and found two reasons:
>>>>
>>>> 1) msysgit's version doesn't export files twice, I've now implemented the same
>>>> 2) msysgit's version uses a very simple algorithm to find out file changes
>>>>
>>>> This second point causes msysgit to miss some file changes. Using the
>>>> same algorithm I get the same performance, but the output is not
>>>> correct.
>>>
>>> Do you have a test case that demonstrates this? It would be helpful for
>>> reviewers, but also helpful to msysgit people if they want to fix their
>>> implementation.
>>
>> Cloning the mercurial repo:
>>
>> % hg log --stat -r 131
>> changeset:   131:c9d51742471c
>> parent:      127:44538462d3c8
>> user:        jake@xxxxxxxxx
>> date:        Sat May 21 11:35:26 2005 -0700
>> summary:     moving hgweb to mercurial subdir
>>
>>  hgweb.py           |  377 ------------------------------------------------------------------------------------------
>>  mercurial/hgweb.py |  377 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 377 insertions(+), 377 deletions(-)
>>
>> % git show --stat 1f9bcfe7cc3d7af7b4533895181acd316ce172d8
>> commit 1f9bcfe7cc3d7af7b4533895181acd316ce172d8
>> Author: jake@xxxxxxxxx <none@none>
>> Date:   Sat May 21 11:35:26 2005 -0700
>>
>>     moving hgweb to mercurial subdir
>>
>>  mercurial/hgweb.py | 377 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 377 insertions(+)
>
> I talked with some people in #mercurial, and apparently there is a
> concept of a 'changelog' that is supposed to store these changes, but
> since the format has changed, the content of it is unreliable. That's
> not a big problem because it's used mostly for reporting purposes
> (log, query), not for doing anything reliable.

Is the changelog stored in the repo (i.e. generated by the hg version at
commit time) or generated on the fly (i.e. generated by the hg version
at hand)? See also below.

> To reliably see the changes, one has to compare the 'manifest' of the
> revisions involved, which contain *all* the files in them.

'manifest' == '(exploded) tree', right? Just making sure my hg fu is not
subzero.

> That's what I was doing already, but I found a more efficient way to
> do it. msysGit is using the changelog, which is quite fast, but not
> reliable.
>
> Unfortunately while going through mercurial's code, I found an issue,
> and it turns out that 1) is not correct.
>
> In mercurial, a file hash contains also the parent file nodes, which
> means that even if two files have the same content, they would not
> have the same hash, so there's no point in keeping track of them to
> avoid extracting the data unnecessarily, because in order to make sure
> they are different, you need to extract the data anyway, defeating the
> purpose.

Do I understand correctly that neither the msysgit version nor yours can
detect duplicate blobs (without requesting them) because of that sha1
issue?

I'm really wondering why a file blob hash carries its history along in
the sha1. This appears completely strange to gitters (being brain washed
about "content tracking"), but may be due to hg's extensive use of
delta, or really: delta chains (which do have their merit on the server
side).

> Which means mercurial doesn't really behave as one would expect:
>
> # add files with the same content
>
> $ echo a > a
> $ hg ci -Am adda
> adding a
> $ echo a >> a
> $ hg ci -m changea
> $ echo a > a
> $ hg st --rev 0
> $ hg ci -m reverta
> $ hg log -G --template '{rev} {desc}\n'
> @  2 reverta
> |
> o  1 changea
> |
> o  0 adda
>
> # check the difference between the first and the last revision
>
> $ hg st --rev 0:2
> M a
> $ hg cat -r 0 a
> a
> $ hg cat -r 2 a
> a

That is really scary. What use is "hg stat --rev" then?

Not blaming you for hg, of course. On that tangent, I just noticed
recently that hg has no python api. Seriously [1]. They even tell us not
to use the internal python api.

msysgit has been lacking support for newer hg, and you've had to add
support for older versions (hg 1.9 will be around on quite some
stable/LTS/EL distro releases) after developing on newer/current ones.
I'm wondering how well that scales in the long term (telling from
git-svn experience: it does not scale well), or whether using some
stable api like 'hgapi' would be a huge bottleneck.
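
On the file hash point above, a minimal Python sketch may help. It
assumes that a Mercurial file node id is the sha1 of the two parent node
ids (sorted) followed by the file text, and that a git blob id is the
sha1 of a "blob <size>\0" header followed by the content. The helper
names (hg_filenode, git_blob, manifest_status) are made up for
illustration and belong to neither tool; copy metadata and flags are
ignored. It replays the adda/changea/reverta history quoted above and
compares the two manifests by node id only, which is one plausible
explanation for the "M a" output, not a statement about hg's actual
status code.

```python
import hashlib

# Assumption: Mercurial's "null" parent is 20 zero bytes.
NULLID = b"\0" * 20


def hg_filenode(text, p1=NULLID, p2=NULLID):
    """Hypothetical helper: sha1(sorted parents + text), as assumed above."""
    lo, hi = sorted([p1, p2])
    return hashlib.sha1(lo + hi + text).digest()


def git_blob(text):
    """Hypothetical helper: git blob id, which depends on content only."""
    header = b"blob " + str(len(text)).encode("ascii") + b"\0"
    return hashlib.sha1(header + text).digest()


def manifest_status(old, new):
    """Compare two {path: node id} mappings by node id, never by content."""
    lines = []
    for path in sorted(set(old) | set(new)):
        if path not in old:
            lines.append("A " + path)
        elif path not in new:
            lines.append("R " + path)
        elif old[path] != new[path]:
            lines.append("M " + path)
    return lines


# Replay the quoted history for file "a": adda, changea, reverta.
rev0 = hg_filenode(b"a\n")             # first revision, no parents
rev1 = hg_filenode(b"a\na\n", p1=rev0)  # content changed
rev2 = hg_filenode(b"a\n", p1=rev1)     # content reverted, new parent

print("hg node ids equal: ", rev0 == rev2)
print("git blob ids equal:", git_blob(b"a\n") == git_blob(b"a\n"))
print(manifest_status({"a": rev0}, {"a": rev2}))
```

Under these assumptions the two hg node ids differ although the content
is identical, while the git blob ids match, so deduplicating by hg node
id alone cannot recognise the reverted file as unchanged.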

Cheers,
Michael

[1] http://mercurial.selenic.com/wiki/MercurialApi
Really funny to see they recommend the command line as api ;)