Re: Git Scaling: What factors most affect Git performance for a large repo?

On Tue, Feb 24, 2015 at 1:44 PM, Michael Haggerty <mhagger@xxxxxxxxxxxx> wrote:
> On 02/20/2015 03:25 PM, Ævar Arnfjörð Bjarmason wrote:
>> On Fri, Feb 20, 2015 at 1:09 PM, Ævar Arnfjörð Bjarmason
>> <avarab@xxxxxxxxx> wrote:
>>> On Fri, Feb 20, 2015 at 1:04 AM, Duy Nguyen <pclouds@xxxxxxxxx> wrote:
>>>> On Fri, Feb 20, 2015 at 6:29 AM, Ævar Arnfjörð Bjarmason
>>>> <avarab@xxxxxxxxx> wrote:
>>>>> Anecdotally I work on a repo at work (where I'm mostly "the Git guy") that's:
>>>>>
>>>>>  * Around 500k commits
>>>>>  * Around 100k tags
>>>>>  * Around 5k branches
>>>>>  * Around 500 commits/day, almost entirely to the same branch
>>>>>  * 1.5 GB .git checkout.
>>>>>  * Mostly text source, but some binaries (we're trying to cut down[1] on those)
>>>>
>>>> Would be nice if you could make an anonymized version of this repo
>>>> public. Working on a "real" large repo is better than an artificial
>>>> one.
>>>
>>> Yeah, I'll try to do that.
>>
>> tl;dr: After some more testing it turns out the performance issues we
>> have are almost entirely due to the number of refs. Some of these I
>> knew about and were obvious (e.g. git pull), but some aren't so
>> obvious (why does "git log" without "--all" slow down as a function of
>> the overall number of refs?).
>
> I'm assuming that you pack your references periodically. (If not, you
> should, because reading lots of loose references is very expensive for
> the commands that need to iterate over all references!)

Yes, as mentioned in another reply of mine, like this:

    git --git-dir={} gc &&
    git --git-dir={} pack-refs --all --prune &&
    git --git-dir={} repack -Ad --window=250 --depth=100 \
        --write-bitmap-index --pack-kept-objects &&
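
To check that a run like the one above really left the refs packed, here is a minimal sketch that counts loose and packed refs directly on disk. It assumes the files-based ref storage discussed in this thread (loose refs as files under refs/, packed refs as lines in packed-refs); the function name count_refs is hypothetical and not part of Git.

```shell
#!/bin/sh
# count_refs is a hypothetical helper, not a Git command.
# Argument: path to a bare repository or a .git directory.
count_refs() {
    git_dir=$1

    # Loose refs: one file per ref somewhere under refs/.
    loose=$(find "$git_dir/refs" -type f 2>/dev/null | wc -l)

    # Packed refs: one line per ref in packed-refs. Lines starting with
    # '#' (header) or '^' (peeled tag target) are not refs themselves.
    packed=0
    if [ -f "$git_dir/packed-refs" ]; then
        packed=$(grep -c -v -e '^#' -e '^\^' "$git_dir/packed-refs" || true)
    fi

    # The arithmetic expansion strips any padding that wc -l may add.
    echo "loose=$((loose)) packed=$((packed))"
}
```

Calling it as count_refs /path/to/repo.git (path is a placeholder) right after pack-refs --all should report a loose count near zero and a packed count close to the total number of refs.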

> On the other hand, packed refs also have a downside, namely that
> whenever even a single packed reference has to be read, the whole
> packed-refs file has to be read and parsed. One way that this can bite
> you, even with innocuous-seeming commands, is if you haven't disabled
> the use of replace references (i.e., using "git --no-replace-objects
> <CMD>" or GIT_NO_REPLACE_OBJECTS). In that case, almost any Git command
> has to read the "refs/replace/*" namespace, which, in turn, forces the
> whole packed-refs file to be read and parsed. This can take a
> significant amount of time if you have a very large number of references.

Interesting. I tried the rough benchmarks I posted above with
GIT_NO_REPLACE_OBJECTS=1 and couldn't see any difference, although, as
mentioned in another reply, --no-decorate had a big effect on git-log.
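
For anyone who wants to repeat this kind of comparison, a rough sketch follows. It is not the benchmark referred to above: the helper names elapsed and compare_log and the -1000 commit limit are assumptions for illustration, and the timer only has one-second resolution, so it is only useful on a repository large enough for the differences to show.

```shell
#!/bin/sh
# elapsed is a hypothetical helper: run a command, discard its output,
# and print the wall-clock time it took in whole seconds.
elapsed() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1 || true
    end=$(date +%s)
    echo $((end - start))
}

# compare_log is a hypothetical helper: time a few git-log variants.
# Run it from inside the repository under test.
compare_log() {
    # --decorate is passed explicitly because output is redirected,
    # and decoration may otherwise be off when not writing to a terminal.
    echo "decorate:            $(elapsed git log --decorate -1000)s"
    echo "decorate, no replace: $(elapsed env GIT_NO_REPLACE_OBJECTS=1 git log --decorate -1000)s"
    echo "no decorate:         $(elapsed git log --no-decorate -1000)s"
}
```

Running compare_log a few times and ignoring the first (cold cache) run gives a fairer picture than a single pass.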

> So try your experiments with replace references disabled. If that helps,
> consider disabling them on your server if you don't need them.
>
> Michael
>
> --
> Michael Haggerty
> mhagger@xxxxxxxxxxxx
>