Re: Poor performance of git describe in big repos

Alex Bennée <kernel-hacker@xxxxxxxxxx> · Fri, 31 May 2013 09:40:01 +0100

On 31 May 2013 09:24, Thomas Rast <trast@xxxxxxxxxxx> wrote:
> Alex Bennée <kernel-hacker@xxxxxxxxxx> writes:
>> On 30 May 2013 20:30, John Keeping <john@xxxxxxxxxxxxx> wrote:
>>> On Thu, May 30, 2013 at 06:21:55PM +0200, Thomas Rast wrote:
>>>> Alex Bennée <kernel-hacker@xxxxxxxxxx> writes:
>>>> > On 30 May 2013 16:33, Thomas Rast <trast@xxxxxxxxxxx> wrote:
> <snip>
>>>> No, my theory is that you tagged *the blobs*.  Git supports this.
>>
>> Wait is this the difference between annotated and non-annotated tags?
>> I thought a non-annotated just acted like references to a particular
>> tree state?
>
> A tag is just a ref.  It can point at anything, in particular also a
> blob (= some file *contents*).
>
> An annotated tag is just a tag pointing at a "tag object".  A tag object
> contains tagger name/email/date, a reference to an object, and a tag
> message.
>
> The slowness I found relates to having tags that point at blobs directly
> (unannotated).

I think you are right. I was brave (well I assumed the tags would come
back from the upstream repo) and ran:

git for-each-ref | grep "refs/tags" | grep "commit" | cut -d '/' -f 3 | xargs git tag -d

And boom:

09:19 ajb@sloy/x86_64 [work.git] >time /usr/bin/git --no-pager describe --long --tags
ajb-build-test-5225-2-gdc0b771

real    0m0.009s
user    0m0.008s
sys     0m0.000s

Which is much better performance. So it does look like unannotated
tags pointing at binary blobs are the failure case.
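
For anyone following along, the three kinds of tag discussed above can be
told apart roughly like this. It is only a sketch in a throwaway repo: the
demo-* tag names, file.bin and the g helper are made up for illustration,
while %(objecttype) and %(*objecttype) are regular for-each-ref fields.

```shell
# Sketch in a throwaway repo; demo-*, file.bin and g() are hypothetical names.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
# Helper so the demo does not depend on a configured identity.
g() { git -c user.name=demo -c user.email=demo@example.invalid "$@"; }

echo "some file contents" > file.bin
g add file.bin
g commit -q -m "initial"

# Lightweight tag pointing straight at a blob (file contents, no commit).
g tag demo-blob "$(git rev-parse HEAD:file.bin)"
# Lightweight tag pointing at a commit.
g tag demo-light HEAD
# Annotated tag: the ref points at a tag object, which points at the commit.
g tag -a -m "annotated" demo-annotated HEAD

# %(objecttype) is what the ref points at directly;
# %(*objecttype) is the type after dereferencing a tag object (empty otherwise).
listing=$(git for-each-ref \
    --format='%(refname:short) %(objecttype) %(*objecttype)' refs/tags)
echo "$listing"
```

A ref whose second column says "blob" is the case Thomas describes; "tag"
in the second column means an annotated tag.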

<snip>
>
> I would be more interested in this:
>
>   git for-each-ref | grep ' blob'

Hmmm, that gives nothing. All the refs are either tag or commit.

> and
>
>   (git for-each-ref | grep ' blob' | cut -d\  -f1 | xargs -n1 git cat-file blob) | wc -c

However, it seems I have some big commits:

09:37 ajb@sloy/x86_64 [work.git] >(git for-each-ref | grep ' commit' | cut -d\  -f1 | xargs -n1 git cat-file commit) | wc -c
1147231984
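
A total like that does not say which refs contribute most. One possible way
to break it down per ref is sketched below, assuming the %(objectsize)
field of for-each-ref; the biggest_refs function, the g helper and the demo
repo contents are made-up names so that the snippet runs on its own.

```shell
# List refs with the size (bytes) and type of the object each points at,
# largest last. Run inside the repository of interest.
biggest_refs() {
    git for-each-ref --format='%(objectsize) %(objecttype) %(refname)' |
        sort -n | tail -n "${1:-10}"
}

# Demo in a throwaway repo (hypothetical names) so the sketch runs anywhere.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
g() { git -c user.name=demo -c user.email=demo@example.invalid "$@"; }
head -c 100000 /dev/zero > big.bin
echo small > small.txt
g add big.bin small.txt
g commit -q -m "initial"
# Two lightweight tags pointing directly at blobs of different sizes.
g tag demo-big-blob "$(git rev-parse HEAD:big.bin)"
g tag demo-small-blob "$(git rev-parse HEAD:small.txt)"

top=$(biggest_refs 1)
echo "$top"
```

In a real repository only the function is needed, e.g. "biggest_refs 20".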

>
> The first tells you if you have any refs pointing at blobs.  The second
> computes their total unpacked size.  My theory is that the second yields
> some large number (hundreds of megabytes at least).
>
> It would be nice if you checked, because if there turn out to be big
> blobs, we have all the pieces and just need to assemble the best
> solution.  Otherwise, there's something else going on and the problem
> remains open.

If you want any other numbers I'm only too happy to help. Sorry I
can't share the repo though...

-- 
Alex, homepage: http://www.bennee.com/~alex/