Re: [PATCH v4 00/13] New remote-hg helper

Felipe Contreras came, saw, and said on 02.11.2012 19:01:
> On Fri, Nov 2, 2012 at 5:41 PM, Felipe Contreras
> <felipe.contreras@xxxxxxxxx> wrote:
>> On Fri, Nov 2, 2012 at 3:48 PM, Jeff King <peff@xxxxxxxx> wrote:
>>> On Thu, Nov 01, 2012 at 05:08:52AM +0100, Felipe Contreras wrote:
>>>
>>>>> Turns out msysgit's remote-hg is not exporting the whole repository,
>>>>> that's why it's faster =/
>>>>
>>>> It seems the reason is that it would only export to the point where
>>>> the branch is checked out. After updating to the tip I noticed
>>>> there was a performance difference.
>>>>
>>>> I investigated and found two reasons:
>>>>
>>>> 1) msysgit's version doesn't export files twice, I've now implemented the same
>>>> 2) msysgit's version uses a very simple algorithm to find out file changes
>>>>
>>>> This second point causes msysgit to miss some file changes. Using the
>>>> same algorithm I get the same performance, but the output is not
>>>> correct.
>>>
>>> Do you have a test case that demonstrates this? It would be helpful for
>>> reviewers, but also helpful to msysgit people if they want to fix their
>>> implementation.
>>
>> Cloning the mercurial repo:
>>
>> % hg log --stat -r 131
>> changeset:   131:c9d51742471c
>> parent:      127:44538462d3c8
>> user:        jake@xxxxxxxxx
>> date:        Sat May 21 11:35:26 2005 -0700
>> summary:     moving hgweb to mercurial subdir
>>
>>  hgweb.py           |  377 ------------------------------------------------------------------------------------------
>>  mercurial/hgweb.py |  377 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 377 insertions(+), 377 deletions(-)
>>
>> % git show --stat 1f9bcfe7cc3d7af7b4533895181acd316ce172d8
>> commit 1f9bcfe7cc3d7af7b4533895181acd316ce172d8
>> Author: jake@xxxxxxxxx <none@none>
>> Date:   Sat May 21 11:35:26 2005 -0700
>>
>>     moving hgweb to mercurial subdir
>>
>>  mercurial/hgweb.py | 377 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 377 insertions(+)
> 
> I talked with some people in #mercurial, and apparently there is a
> concept of a 'changelog' that is supposed to store these changes, but
> since the format has changed, the content of it is unreliable. That's
> not a big problem because it's used mostly for reporting purposes
> (log, query), not for doing anything reliable.

Is the changelog stored in the repo (i.e. generated by the hg version at
commit time) or generated on the fly (i.e. generated by the hg version
at hand)? See also below.

> To reliably see the changes, one has to compare the 'manifest' of the
> revisions involved, which contain *all* the files in them.

'manifest' == '(exploded) tree', right? Just making sure my hg fu is not
subzero.
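To make the manifest comparison concrete, the following is a minimal sketch. It assumes that a manifest can be modeled as a plain path -> file node mapping, which is a simplification because real manifests also carry exec/symlink flags. Under that assumption, a commit's changes fall out of a full diff of the two manifests. The function names, the node strings and the `other.py` entry are hypothetical. The paths only mirror rev 131 above. This thread does not say what rev 131's stored file list actually contains, so the "file list" case is illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical model: a manifest is a mapping path -> file node (string).
# Real manifests also carry flags (exec/symlink), which are ignored here.
Manifest = Dict[str, str]

def manifest_diff(parent: Manifest, child: Manifest) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, modified, removed) paths by comparing two full manifests."""
    added = sorted(p for p in child if p not in parent)
    removed = sorted(p for p in parent if p not in child)
    modified = sorted(p for p in child if p in parent and child[p] != parent[p])
    return added, modified, removed

def changes_from_file_list(files: List[str], child: Manifest) -> Tuple[List[str], List[str]]:
    """Trust a per-commit file list instead: (present, deleted) among listed paths only."""
    present = sorted(p for p in files if p in child)
    deleted = sorted(p for p in files if p not in child)
    return present, deleted

# Placeholder manifests shaped like rev 131 (node strings and other.py are made up).
parent = {"hgweb.py": "n1", "other.py": "n2"}
child = {"mercurial/hgweb.py": "n3", "other.py": "n2"}

print(manifest_diff(parent, child))
# (['mercurial/hgweb.py'], [], ['hgweb.py'])

# A hypothetical file list naming only the new path never reports the removal:
print(changes_from_file_list(["mercurial/hgweb.py"], child))
# (['mercurial/hgweb.py'], [])
```

The full diff costs a walk over every path in both manifests. The file-list shortcut is cheaper, but it can only report paths the list happens to name.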

> That's what I was doing already, but I found a more efficient way to
> do it. msysGit is using the changelog, which is quite fast, but not
> reliable.
> 
> Unfortunately while going through mercurial's code, I found an issue,
> and it turns out that 1) is not correct.
> 
> In mercurial, a file hash contains also the parent file nodes, which
> means that even if two files have the same content, they would not
> have the same hash, so there's no point in keeping track of them to
> avoid extracting the data unnecessarily, because in order to make sure
> they are different, you need to extract the data anyway, defeating the
> purpose.

Do I understand correctly that neither the msysgit version nor yours can
detect duplicate blobs (without requesting them) because of that sha1 issue?

I'm really wondering why a file blob hash carries its history along in
the sha1. This appears completely strange to gitters (being brainwashed
about "content tracking"), but may be due to hg's extensive use of
deltas, or really: delta chains (which do have their merit on the server
side).
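The difference can be shown side by side in a small sketch. For the Mercurial side, it assumes the revlog-style node: SHA-1 over the two parent nodes (smaller one first) followed by the revision text. It ignores copy metadata and flags. For the Git side, it uses the blob id: SHA-1 over a `blob <size>\0` header plus the content. The function names are hypothetical. The replay follows the file history of `a` in the hg session quoted below.

```python
import hashlib

NULLID = b"\0" * 20  # 20-byte all-zero node meaning "no parent"

def hg_node(text: bytes, p1: bytes, p2: bytes) -> bytes:
    # Revlog-style node (sketch): SHA-1 over both parent nodes, smaller first,
    # then the text. The parents are therefore part of the file's identity.
    lo, hi = sorted((p1, p2))
    return hashlib.sha1(lo + hi + text).digest()

def git_blob_id(text: bytes) -> str:
    # Git blob id: SHA-1 over a "blob <size>\0" header plus the content only.
    return hashlib.sha1((b"blob %d\0" % len(text)) + text).hexdigest()

# adda -> changea -> reverta; each file revision's parent is the previous one.
contents = [b"a\n", b"a\na\n", b"a\n"]
n0 = hg_node(contents[0], NULLID, NULLID)
n1 = hg_node(contents[1], n0, NULLID)
n2 = hg_node(contents[2], n1, NULLID)
blob_ids = [git_blob_id(c) for c in contents]

print(n0 == n2)                # False: same text, different parent
print(blob_ids[0] == blob_ids[2])  # True: same text, same blob
```

Under this model, identical content only gets an identical hg node when the parents also match. To spot duplicates by content, an exporter has to read the data and hash it itself.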

> Which means mercurial doesn't really behave as one would expect:
> 
> # add files with the same content
> 
>   $ echo a > a
>   $ hg ci -Am adda
>   adding a
>   $ echo a >> a
>   $ hg ci -m changea
>   $ echo a > a
>   $ hg st --rev 0
>   $ hg ci -m reverta
>   $ hg log -G --template '{rev} {desc}\n'
>   @  2 reverta
>   |
>   o  1 changea
>   |
>   o  0 adda
> 
> # check the difference between the first and the last revision
> 
>   $ hg st --rev 0:2
>   M a
>   $ hg cat -r 0 a
>   a
>   $ hg cat -r 2 a
>   a

That is really scary. What use is "hg stat --rev" then? Not blaming you
for hg, of course.
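The following is a simplified model of why the output above says `M`. It assumes, as that output suggests, that status between two committed revisions flags a file as modified whenever its file nodes differ. A content-comparing variant would drop the false `M`, but only by reading both revisions of every candidate file. The manifests, node strings and function names are hypothetical placeholders for revs 0 and 2 of the session.

```python
from typing import Callable, Dict

# Placeholder manifests (path -> file node) and contents for revs 0 and 2.
manifest0 = {"a": "node0"}
manifest2 = {"a": "node2"}
contents = {"node0": b"a\n", "node2": b"a\n"}

def status_by_node(m1: Dict[str, str], m2: Dict[str, str]) -> Dict[str, str]:
    """Report A/R/M per path; M whenever the file nodes differ."""
    result = {}
    for path in sorted(set(m1) | set(m2)):
        if path not in m1:
            result[path] = "A"
        elif path not in m2:
            result[path] = "R"
        elif m1[path] != m2[path]:
            result[path] = "M"
    return result

def status_by_content(m1: Dict[str, str], m2: Dict[str, str],
                      read: Callable[[str], bytes]) -> Dict[str, str]:
    """Like status_by_node, but drop M entries whose contents are equal."""
    result = status_by_node(m1, m2)
    return {
        path: code
        for path, code in result.items()
        if not (code == "M" and read(m1[path]) == read(m2[path]))
    }

print(status_by_node(manifest0, manifest2))                           # {'a': 'M'}
print(status_by_content(manifest0, manifest2, contents.__getitem__))  # {}
```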

On that tangent, I only recently noticed that hg has no public Python API.
Seriously [1]. They even tell us not to use the internal Python API.
msysgit has been lacking support for newer hg, and you've had to add
support for older versions (hg 1.9 will be around on quite a few
stable/LTS/EL distro releases) after developing on newer/current ones.
I'm wondering how well that scales in the long term (judging from
git-svn experience: it does not scale well), or whether using some
stable API like 'hgapi' would be a huge bottleneck.
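The following is a rough sketch of what "the command line as API" amounts to. It shells out to `hg` and parses the one-line-per-file `hg status` output. It assumes `hg` is on PATH and that the plain `<code> <path>` line format is used (no `--copies`). The helper names `hg` and `parse_status` are hypothetical.

```python
import subprocess
from typing import Dict, List

def hg(args: List[str], repo: str = ".") -> str:
    # Run the hg CLI against `repo`; raises CalledProcessError on failure.
    proc = subprocess.run(["hg", "-R", repo] + args,
                          capture_output=True, text=True, check=True)
    return proc.stdout

def parse_status(output: str) -> Dict[str, str]:
    """Parse 'hg status' lines such as 'M a' into {path: code}."""
    result = {}
    for line in output.splitlines():
        if not line:
            continue
        code, path = line.split(" ", 1)  # paths may themselves contain spaces
        result[path] = code
    return result

# Usage (needs hg installed and a repository):
#   parse_status(hg(["status", "--rev", "0:2"]))
```

This keeps the tool insulated from hg's internal Python API. The price is one process per query and a dependency on the output format staying stable.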

Cheers,
Michael

[1] http://mercurial.selenic.com/wiki/MercurialApi

Really funny to see they recommend the command line as api ;)