Re: Git Scaling: What factors most affect Git performance for a large repo?

On 02/20/2015 03:25 PM, Ævar Arnfjörð Bjarmason wrote:
> On Fri, Feb 20, 2015 at 1:09 PM, Ævar Arnfjörð Bjarmason
> <avarab@xxxxxxxxx> wrote:
>> On Fri, Feb 20, 2015 at 1:04 AM, Duy Nguyen <pclouds@xxxxxxxxx> wrote:
>>> On Fri, Feb 20, 2015 at 6:29 AM, Ævar Arnfjörð Bjarmason
>>> <avarab@xxxxxxxxx> wrote:
>>>> Anecdotally I work on a repo at work (where I'm mostly "the Git guy") that's:
>>>>
>>>>  * Around 500k commits
>>>>  * Around 100k tags
>>>>  * Around 5k branches
>>>>  * Around 500 commits/day, almost entirely to the same branch
>>>>  * 1.5 GB .git checkout.
>>>>  * Mostly text source, but some binaries (we're trying to cut down[1] on those)
>>>
>>> Would be nice if you could make an anonymized version of this repo
>>> public. Working on a "real" large repo is better than an artificial
>>> one.
>>
>> Yeah, I'll try to do that.
> 
> tl;dr: After some more testing it turns out the performance issues we
> have are almost entirely due to the number of refs. Some of these I
> knew about and were obvious (e.g. git pull), but some aren't so
> obvious (why does "git log" without "--all" slow down as a function of
> the overall number of refs?).

I'm assuming that you pack your references periodically. (If not, you
should, because reading lots of loose references is very expensive for
the commands that need to iterate over all references!)
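A minimal sketch of how one might check whether references are packed, and
pack them if not. The function names are made up for illustration;
"git pack-refs --all" is the real command, and "--absolute-git-dir" assumes
Git 2.13 or newer. It also assumes the default files-based ref storage.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of Git).

# Count loose reference files under <git-dir>/refs for the repo at $1.
count_loose_refs() {
    git_dir=$(git -C "$1" rev-parse --absolute-git-dir) || return 1
    find "$git_dir/refs" -type f | wc -l | tr -d ' '
}

# Count entries in packed-refs, skipping the header line ("#...")
# and peeled-tag lines ("^...").
count_packed_refs() {
    git_dir=$(git -C "$1" rev-parse --absolute-git-dir) || return 1
    if [ -f "$git_dir/packed-refs" ]; then
        grep -c -v '^[#^]' "$git_dir/packed-refs"
    else
        echo 0
    fi
}

# Pack every reference (branches too, not just tags); loose copies
# are pruned by default.
pack_all_refs() {
    git -C "$1" pack-refs --all
}
```

If count_loose_refs reports a large number, running pack_all_refs (or
"git gc", which packs refs as part of its work) should bring it down.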

On the other hand, packed refs also have a downside, namely that
whenever even a single packed reference has to be read, the whole
packed-refs file has to be read and parsed. One way that this can bite
you, even with innocuous-seeming commands, is if you haven't disabled
the use of replace references (i.e., by running "git
--no-replace-objects <CMD>" or by setting the GIT_NO_REPLACE_OBJECTS
environment variable). In that case, almost any Git command
has to read the "refs/replace/*" namespace, which, in turn, forces the
whole packed-refs file to be read and parsed. This can take a
significant amount of time if you have a very large number of references.

So try your experiments with replace references disabled. If that helps,
consider disabling them on your server if you don't need them.
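A rough sketch of how such an experiment could be set up. The function
names are hypothetical; GIT_NO_REPLACE_OBJECTS and "git for-each-ref" are
real. The first helper shows whether the repository uses replace
references at all; the second runs any Git command with them disabled.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of Git).

# Count references under refs/replace/ in the repo at $1.
# Zero means disabling replace references cannot change any results.
count_replace_refs() {
    git -C "$1" for-each-ref refs/replace/ | wc -l | tr -d ' '
}

# Run a Git command in the repo at $1 with replace references disabled.
# Usage: git_no_replace <repo> <git-args...>
git_no_replace() {
    repo=$1
    shift
    GIT_NO_REPLACE_OBJECTS=1 git -C "$repo" "$@"
}
```

To compare timings, one could run "time git log -1 >/dev/null" and
"time git --no-replace-objects log -1 >/dev/null" in the same repository
and see whether the second is noticeably faster.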

Michael

-- 
Michael Haggerty
mhagger@xxxxxxxxxxxx
