Re: [PATCH] Bump core.deltaBaseCacheLimit to 96m

Duy Nguyen <pclouds@xxxxxxxxx> writes:

> On Mon, May 5, 2014 at 12:13 AM, David Kastrup <dak@xxxxxxx> wrote:
>> The default of 16m causes serious thrashing for large delta chains
>> combined with large files.
>>
>> Here are some benchmarks (pu variant of git blame):
>>
>> time git blame -C src/xdisp.c >/dev/null
>
> ...
>
>> diff --git a/Documentation/config.txt b/Documentation/config.txt
>> index 1932e9b..21a3c86 100644
>> --- a/Documentation/config.txt
>> +++ b/Documentation/config.txt
>> @@ -489,7 +489,7 @@ core.deltaBaseCacheLimit::
>>         to avoid unpacking and decompressing frequently used base
>>         objects multiple times.
>>  +
>> -Default is 16 MiB on all platforms.  This should be reasonable
>> +Default is 96 MiB on all platforms.  This should be reasonable
>>  for all users/operating systems, except on the largest projects.
>>  You probably do not need to adjust this value.
>
> So emacs.git falls exactly into the "except on the largest projects"
> part.

git gc --aggressive has been used and recommended regularly for _all_
projects, leading to delta chains with a length of 250.  So this delta
chain length is not exceptional: it will eventually occur in any
repository that has been created and maintained according to the
recommendations of Git's documentation (which recommends gc --aggressive
every few hundred revisions).  I was illustrating the effect on a file of
about 1MB.  That's not an egregiously large file either.

For this case, 96MB is the point of diminishing returns.  That is _6_
times the current default and _small_ in comparison with the memory
installed on developer machines nowadays.  Similar slowdowns occur with
other examples.  With the current defaults, Git will accept files of up
to 512MB into its compression scheme (and thus into core memory) before
punting.

The current deltaBaseCacheLimit of 16MB is rather ridiculous, in
particular with the pre-2.0 settings for gc --aggressive, and causes
serious performance degradation.  It was ridiculous even 10 years ago.

> Would it make more sense to advise git devs to set this per repo
> instead? The majority of (open source) repositories out there are
> small if I'm not mistaken. Of those few big repos, we could have a
> section listing all the tips and tricks to tune git. This is one of
> them. Index v4 and sparse checkout are some other. In future, maybe
> watchman support, split index and untracked cache as well.
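
For reference, the per-repository alternative suggested above amounts to
a one-line config change.  A minimal sketch, assuming a throwaway
repository created with mktemp in place of a real large checkout such as
emacs.git; it only uses "git config" and "git -c", and 96m is simply the
value proposed in the patch, not a tuned recommendation.

```shell
#!/bin/sh
# Minimal sketch: per-repository tuning of core.deltaBaseCacheLimit.
# Assumption: a throwaway repository (created with mktemp) stands in
# for a large checkout such as emacs.git.
set -e
repo=$(mktemp -d)
git init -q "$repo"

# Written to "$repo/.git/config" only; every other repository keeps
# the built-in default.
git -C "$repo" config core.deltaBaseCacheLimit 96m

# The raw value as stored, and the byte count Git derives from it
# (the "m" suffix multiplies by 1024 * 1024).
git -C "$repo" config --get core.deltaBaseCacheLimit        # 96m
git -C "$repo" config --int --get core.deltaBaseCacheLimit  # 100663296

# One-off override for a single command, touching no config file,
# e.g. for a benchmark run inside the large repository itself:
#   time git -c core.deltaBaseCacheLimit=96m blame -C src/xdisp.c >/dev/null
```

The per-invocation "-c" form is handy for comparing limits in timing
runs; the per-repository form is what a "tuning tips" section for large
projects would presumably recommend.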

Shrug.  The last version of the patch was refused because reviewers
wanted more evidence.  I added the evidence.

And I have it on record in the mailing list and can point to it when
people ask me why Git is so slow for "git blame" in comparison to other
version control systems in spite of my claim to have improved it.

I'm definitely not going to jump through any more hoops here.  I don't
see a point in this kind of spectacle.

-- 
David Kastrup