Re: upload-pack is slow with lots of refs

On Thu, Oct 4, 2012 at 1:21 AM, Jeff King <peff@xxxxxxxx> wrote:
> On Thu, Oct 04, 2012 at 12:32:35AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> On Wed, Oct 3, 2012 at 8:03 PM, Jeff King <peff@xxxxxxxx> wrote:
>> > What version of git are you using?  In the past year or so, I've made
>> > several tweaks to speed up large numbers of refs, including:
>> >
>> >   - cff38a5 (receive-pack: eliminate duplicate .have refs, v1.7.6); note
>> >     that this only helps if they are being pulled in by an alternates
>> >     repo. And even then, it only helps if they are mostly duplicates;
>> >     distinct ones are still O(n^2).
>> >
>> >   - 7db8d53 (fetch-pack: avoid quadratic behavior in remove_duplicates)
>> >     a0de288 (fetch-pack: avoid quadratic loop in filter_refs)
>> >     Both in v1.7.11. I think there is still a potential quadratic loop
>> >     in mark_complete()
>> >
>> >   - 90108a2 (upload-pack: avoid parsing tag destinations)
>> >     926f1dd (upload-pack: avoid parsing objects during ref advertisement)
>> >     Both in v1.7.10. Note that tag objects are more expensive to
>> >     advertise than commits, because we have to load and peel them.
>> >
>> > Even with those patches, though, I found that it was something like ~2s
>> > to advertise 100,000 refs.
>>
>> FWIW I bisected between 1.7.9 and 1.7.10 and found that the point at
>> which it went from 1.5/s to 2.5/s upload-pack runs on the pathological
>> git.git repository was none of those, but:
>>
>>     ccdc6037fe - parse_object: try internal cache before reading object db
>
> Ah, yeah, I forgot about that one. That implies that you have a lot of
> refs pointing to the same objects (since the benefit of that commit is
> to avoid reading from disk when we have already seen it).
>
> Out of curiosity, what does your repo contain? I saw a lot of speedup
> with that commit because my repos are big object stores, where we have
> the same duplicated tag refs for every fork of the repo.

Things are much faster with your monkeypatch; I got up to around 10
upload-pack runs/s.

The repository mainly contains a lot of git-deploy[1] generated tags
which are added for every rollout to several subsystems.

Of the ~50k references in the repo, 75% point to a commit that no other
reference points to. Around 98% of the references are annotated tags;
the rest are branches.
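
Figures like these can be gathered from "git for-each-ref" output. The
following is a minimal sketch, not the script used for the numbers
above: the helper names ref_stats and read_refs are made up for
illustration, and it assumes that annotated tags should be compared by
the object they peel to (via the %(*objectname) atom).

```python
import subprocess
from collections import Counter

# One ref per line: type, refname, object name and, for annotated tags
# only, the name of the object the tag peels to. Refnames cannot contain
# spaces, so splitting on whitespace is safe.
FORMAT = "%(objecttype) %(refname) %(objectname) %(*objectname)"


def read_refs(repo="."):
    """Return the for-each-ref output of `repo` as a list of lines."""
    result = subprocess.run(
        ["git", "-C", repo, "for-each-ref", "--format=" + FORMAT],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()


def ref_stats(lines):
    """Count refs, annotated tags, and refs whose target no other ref shares."""
    targets = []
    annotated_tags = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        objtype, _refname, sha = fields[:3]
        # Annotated tags carry a fourth field; other refs point directly.
        peeled = fields[3] if len(fields) > 3 else sha
        if objtype == "tag":
            annotated_tags += 1
        targets.append(peeled)

    counts = Counter(targets)
    unique_target = sum(1 for t in targets if counts[t] == 1)
    return {
        "total": len(targets),
        "annotated_tags": annotated_tags,
        "unique_target": unique_target,
    }
```

Running ref_stats(read_refs("/path/to/repo")) and dividing
"unique_target" and "annotated_tags" by "total" gives percentages
comparable to the ones quoted above.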

1. https://github.com/git-deploy/git-deploy