Re: Poor performance of git describe in big repos

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Alex Bennée <kernel-hacker@xxxxxxxxxx> writes:

> On 30 May 2013 16:33, Thomas Rast <trast@xxxxxxxxxxx> wrote:
>> Alex Bennée <kernel-hacker@xxxxxxxxxx> writes:
>>
>>>  41.58%   git  libcrypto.so.1.0.0  [.] sha1_block_data_order_ssse3
>>>  33.62%   git  libz.so.1.2.3.4     [.] inflate_fast
>>>  10.39%   git  libz.so.1.2.3.4     [.] adler32
>>>   2.03%   git  [kernel.kallsyms]   [k] clear_page_c
>>
>> Do you have any large blobs in the repo that are referenced directly by
>> a tag?
>
> Most probably. I've certainly done a bunch of releases (which are tagged) were
> the last thing that was updated was an FPGA image.
[...]
>> git-describe should probably be fixed to avoid loading blobs, though I'm
>> not sure off hand if we have any infrastructure to infer the type of a
>> loose object without inflating it.  (This could probably be added by
>> inflating only the first block.)  We do have this for packed objects, so
>> at least for packed repos there's a speedup to be had.
>
> Will it be loading the blob for every commit it traverses or just ones that hit
> a tag? Why does it need to load the blob at all? Surely the commit
> tree state doesn't
> need to be walked down?

No, my theory is that you tagged *the blobs*.  Git supports this.

git-describe needs to look at the commit (if any) obtained by peeling
each tag (i.e. dereferencing tags until it reaches a non-tag).  So to do
that, it resolves the tag's referent and loads it.  Usually this will be
a commit, in which case it is marked as reached by the tag.

As my example shows, it also resolves tags' referents if they refer to
non-commits, in particular, it will decompress large blobs that are
(directly) referenced by a tag.

Note that while annotated tags provide the type information themselves,
e.g.

  $ git cat-file tag junio-gpg-pub
  object fe113d3f96636710600c6b02d5fd421fa7e87dd6
  type blob
  tag junio-gpg-pub
  [...]

unannotated tags are simply refs, so it is not enough to just look at
the tag objects' referent type.

I had a brief look around sha1_file.c, in particular sha1_object_info,
and it turns out we lack the "deflate only early part" logic as I
suspected.  So that'll have to be fixed first.  After that I *think* it
should automatically carry over into the tag readers.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]