Re: Performance of "git gc..." is extremely bad in some cases

On 2021-03-08 at 22:29:16, Bryan Turner wrote:
> On Mon, Mar 8, 2021 at 1:32 PM Anthony Muller <anthony@xxxxxxxxxxxx> wrote:
> >
> > What did you do before the bug happened? (Steps to reproduce your issue)
> >
> > git clone https://github.com/notracking/hosts-blocklists
> > cd hosts-blocklists
> > git reflog expire --all --expire=now && git gc --prune=now --aggressive
> 
> --aggressive tells git gc to discard all of its existing delta chains
> and go find new ones, and to be fairly aggressive in how it looks for
> candidates. This is going to be the primary source of the resource
> usage you see, as well as the time.
> 
> Aggressive GCs are something you do once in a (very great) while. If
> you try this without the --aggressive, how does it look?

I should point out that this repository is also rather pathologically
structured.  Almost every commit is an automatic commit updating the
same five text files, which range from 5 MB to 11 MB.
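To see this effect locally without cloning, here is a minimal, scaled-down sketch of that shape. It creates a few large text files, rewrites them with many automatic commits, runs the reflog expiry from the original report, and then times a plain gc and an aggressive gc. The file names (list1.txt and so on), the file, line, and commit counts, and the committer identity are hypothetical stand-ins. None of them come from the real repository, and timings will vary by machine.

```shell
#!/usr/bin/env bash
# Minimal sketch, not a benchmark: build a scaled-down repository shaped like
# the one described above and time a plain gc against an aggressive one.
# NUM_FILES, LINES_PER_FILE and NUM_COMMITS are hypothetical values chosen so
# the script finishes quickly; the real files are 5 MB to 11 MB each.
set -euo pipefail

NUM_FILES=5
LINES_PER_FILE=5000
NUM_COMMITS=20

repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" config user.name "Sketch Bot"
git -C "$repo" config user.email "bot@example.invalid"

# Seed each (hypothetical) file with numbered hosts-style lines.
for f in $(seq 1 "$NUM_FILES"); do
  seq 1 "$LINES_PER_FILE" | sed "s/^/0.0.0.0 host$f-/" > "$repo/list$f.txt"
done

# Every "automatic" commit rewrites one line in each of the same files, so
# consecutive versions are nearly identical and delta extremely well.
for c in $(seq 1 "$NUM_COMMITS"); do
  for f in $(seq 1 "$NUM_FILES"); do
    sed -i.bak "${c}s/.*/0.0.0.0 changed$c-host$f/" "$repo/list$f.txt"
    rm "$repo/list$f.txt.bak"
  done
  git -C "$repo" add -A
  git -C "$repo" commit -q -m "Automatic update $c"
done

# Same reflog expiry as the original report, then a plain gc...
git -C "$repo" reflog expire --all --expire=now
time git -C "$repo" gc -q --prune=now
git -C "$repo" count-objects -vH | grep size-pack

# ...followed by the aggressive variant, which recomputes every delta.
time git -C "$repo" gc -q --prune=now --aggressive
git -C "$repo" count-objects -vH | grep size-pack
```

At these sizes both runs finish quickly. Raising LINES_PER_FILE toward the multi-megabyte files above should widen the gap between the two timings.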

When you use --aggressive, as Bryan pointed out, you're asking to throw
away all the deltas and try really hard to compute all of them fresh.
That's going to use a lot of memory because you're loading many large
text files into memory.  It's also going to use a lot of CPU because
these files do indeed delta extremely well, and computing deltas on
larger files is more expensive, especially when there are many of
them.
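If you do need the full recomputation, the sketch below roughly spells out the aggressive repack step with plain git repack. It uses -f to discard existing deltas, plus explicit --window and --depth values; 250 and 50 are the documented defaults of gc.aggressiveWindow and gc.aggressiveDepth. It also caps per-thread window memory with pack.windowMemory and CPU use with pack.threads. The 256m and single-thread limits and the throwaway demo repository are assumptions for illustration. This is not an exact reproduction of everything git gc runs.

```shell
#!/usr/bin/env bash
# Sketch only: roughly the repack step of "git gc --aggressive", with
# hypothetical limits on memory and CPU. The flags and config keys are real
# Git options; the specific limit values are illustrative assumptions.
set -euo pipefail

# Throwaway demo repo (hypothetical content) so the sketch runs anywhere.
repo=$(mktemp -d)
git -C "$repo" init -q
for i in $(seq 1 20); do
  seq 1 5000 | sed "${i}s/.*/changed-$i/" > "$repo/list.txt"
  git -C "$repo" add list.txt
  git -C "$repo" -c user.name=bot -c user.email=bot@example.invalid \
    commit -q -m "Automatic update $i"
done

# -a: pack everything; -d: drop redundant packs and loose objects afterwards;
# -f: ignore existing deltas and search for new ones;
# --window/--depth: candidates tried per object and maximum chain length.
# pack.windowMemory caps the delta search window's memory per thread, and
# pack.threads limits how many CPU cores the search uses.
git -C "$repo" \
  -c pack.windowMemory=256m \
  -c pack.threads=1 \
  repack -a -d -f -q --window=250 --depth=50
```

Capping window memory lets Git scale the window down for large objects. The resulting pack may therefore be somewhat larger, which is the trade-off for staying within the limit.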

And that's just the blobs.  The trees and commits are also going to be
nearly identically structured and will also delta well with virtually
every other similar object of their type.  Normally Git sorts by size,
which helps pick better candidates, but since these are all going to be
identically sized, the performance is going to suffer.
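git verify-pack -v shows both effects on a pack. For each object it lists the type and size, and for deltified objects it also lists the chain depth and base. The sketch below summarizes that output per type: object count, number of deltas, and size range. It runs against a small throwaway repository whose commits all touch one file. The demo content and the summary format are hypothetical. In this demo the trees should come out the same size and the commits nearly so, which is the situation where sorting by size stops telling candidates apart.

```shell
#!/usr/bin/env bash
# Sketch: per object type, count objects and deltas and report the size
# range, using "git verify-pack -v" on a throwaway demo repository.
set -euo pipefail

repo=$(mktemp -d)   # hypothetical demo content, one file rewritten 20 times
git -C "$repo" init -q
for i in $(seq 1 20); do
  seq 1 5000 | sed "${i}s/.*/changed-$i/" > "$repo/list.txt"
  git -C "$repo" add list.txt
  git -C "$repo" -c user.name=bot -c user.email=bot@example.invalid \
    commit -q -m "Automatic update $i"
done
git -C "$repo" gc -q --aggressive --prune=now

# verify-pack -v prints one line per object:
#   <oid> <type> <size> <size-in-pack> <offset> [<depth> <base-oid>]
# The last two fields appear only for deltified objects.
summary=$(
  for idx in "$repo"/.git/objects/pack/pack-*.idx; do
    git -C "$repo" verify-pack -v "$idx"
  done | awk '
    $2 ~ /^(commit|tree|blob|tag)$/ {
      t = $2; n[t]++
      if (NF == 7) d[t]++
      if (!(t in smin) || $3 < smin[t]) smin[t] = $3
      if ($3 > smax[t]) smax[t] = $3
    }
    END {
      for (t in n)
        printf "%s objects=%d deltas=%d size=%d..%d\n", t, n[t], d[t], smin[t], smax[t]
    }' | sort
)
echo "$summary"
```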

Now, I have the advantage in this case of being a person who's sometimes
on call for the maintenance of Git repositories, and in that capacity,
it's obvious to me that this is pathologically structured.  But, yeah, I
would definitely not run --aggressive on this repo unless I needed to
and I would not expect it to perform well.
-- 
brian m. carlson (he/him or they/them)
Houston, Texas, US
