Re: Performance of "git gc..." is extremely bad in some cases

Thank you, Brian and Bryan. You both clarified what was happening, and now I know what to look for.

I can use a shallow clone for most repos, but there are some I want to keep history for. I don't need a full copy of this repo, but it was a good repo to show the issue I was facing.
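
A minimal sketch of the shallow-clone approach mentioned above, for illustration only. Against the real repository this would be `git clone --depth 1 https://github.com/notracking/hosts-blocklists`. To keep the example offline, the script instead builds a throwaway local repository (the temp-dir paths and the `hosts.txt` file are hypothetical) and clones it over a `file://` URL, because `--depth` is ignored for plain-path local clones.

```shell
#!/bin/sh
# Sketch: compare a full clone with a shallow (--depth 1) clone.
# Uses a hypothetical throwaway repository so it runs without network access.
set -e
work=$(mktemp -d)
src="$work/src"
git init -q "$src"
for i in 1 2 3; do
  echo "update $i" > "$src/hosts.txt"
  git -C "$src" add hosts.txt
  git -C "$src" -c user.name=demo -c user.email=demo@example.com \
      commit -q -m "update $i"
done

# --depth only takes effect for local clones when given a file:// URL.
git clone -q "file://$src" "$work/full"
git clone -q --depth 1 "file://$src" "$work/shallow"

# The full clone has every commit; the shallow clone has only the tip.
full_count=$(git -C "$work/full" rev-list --count HEAD)
shallow_count=$(git -C "$work/shallow" rev-list --count HEAD)
echo "full: $full_count commits, shallow: $shallow_count commits"
```

If the history turns out to be needed later, running `git fetch --unshallow` inside the shallow clone retrieves the rest.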

Thanks again!


 ---- On Mon, 08 Mar 2021 23:56:53 +0000 brian m. carlson <sandals@xxxxxxxxxxxxxxxxxxxx> wrote ----
 > On 2021-03-08 at 22:29:16, Bryan Turner wrote:
 > > On Mon, Mar 8, 2021 at 1:32 PM Anthony Muller <anthony@xxxxxxxxxxxx> wrote:
 > > >
 > > > What did you do before the bug happened? (Steps to reproduce your issue)
 > > >
 > > > git clone https://github.com/notracking/hosts-blocklists
 > > > cd hosts-blocklists
 > > > git reflog expire --all --expire=now && git gc --prune=now --aggressive
 > > 
 > > --aggressive tells git gc to discard all of its existing delta chains
 > > and go find new ones, and to be fairly aggressive in how it looks for
 > > candidates. This is going to be the primary source of the resource
 > > usage you see, as well as the time.
 > > 
 > > Aggressive GCs are something you do once in a (very great) while. If
 > > you try this without the --aggressive, how does it look?
 > 
 > I should point out that this repository is also rather pathologically
 > structured.  Almost every commit is an automatic commit updating the
 > same five files which are text files ranging from 5 MB to 11 MB.
 > 
 > When you use --aggressive, as Bryan pointed out, you're asking to throw
 > away all the deltas and try really hard to compute all of them fresh.
 > That's going to use a lot of memory because you're loading many large
 > text files into memory.  It's also going to use a lot of CPU because
 > these files do indeed delta extremely well, and computing deltas on
 > larger files is more expensive, especially when there are many of
 > them.
 > 
 > And that's just the blobs.  The trees and commits are also going to be
 > nearly identically structured and will also delta well with virtually
 > every other similar object of their type.  Normally Git sorts by size,
 > which helps pick better candidates, but since these are all going to be
 > identically sized, the performance is going to suffer.
 > 
 > Now, I have the advantage in this case of being a person who's sometimes
 > on call for the maintenance of Git repositories, and in that capacity,
 > it's obvious to me that this repository is pathologically structured.
 > But, yeah, I would definitely not run --aggressive on this repo unless I
 > needed to, and I would not expect it to perform well.
 > -- 
 > brian m. carlson (he/him or they/them)
 > Houston, Texas, US
 > 
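
To try Bryan's suggestion of running the same cleanup without --aggressive, the command on the real clone would be `git reflog expire --all --expire=now && git gc --prune=now`. The sketch below runs that routine gc on a small throwaway repository (paths and file contents are hypothetical) whose commits all rewrite one nearly identical file, loosely mimicking the structure brian describes, and then checks that every object ended up in a pack. It is an illustration under those assumptions, not a benchmark of the real repository.

```shell
#!/bin/sh
# Sketch: routine (non-aggressive) gc on a hypothetical repository where
# every commit rewrites the same, almost identical text file.
set -e
repo="$(mktemp -d)/demo"
git init -q "$repo"
for i in 1 2 3 4 5; do
  # 2000 unchanged lines plus one line that differs per commit.
  seq 1 2000 | sed 's/^/0.0.0.0 host-/' > "$repo/hosts.txt"
  echo "0.0.0.0 new-host-$i" >> "$repo/hosts.txt"
  git -C "$repo" add hosts.txt
  git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
      commit -q -m "automatic update $i"
done

# Plain gc reuses existing deltas where it can; --aggressive would discard
# them and search a much larger window for new ones.
git -C "$repo" gc --quiet --prune=now

# After gc, reachable objects should live in a pack, not as loose objects.
loose=$(git -C "$repo" count-objects -v | awk '/^count:/ {print $2}')
packs=$(git -C "$repo" count-objects -v | awk '/^packs:/ {print $2}')
echo "loose objects: $loose, packs: $packs"
```

If an aggressive repack really is needed on a large repository, the `pack.windowMemory` setting limits how much memory each pack-objects thread spends on its delta window, possibly at the cost of weaker compression.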


