Thank you, Brian and Bryan.

You both clarified what was happening, and now I know what to look for. I can use a shallow clone for most repos, but there are some I want to keep history for.

I don't need a full copy of this repo, but it was a good repo to show the issue I was facing.

Thanks again!

---- On Mon, 08 Mar 2021 23:56:53 +0000 brian m. carlson <sandals@xxxxxxxxxxxxxxxxxxxx> wrote ----

> On 2021-03-08 at 22:29:16, Bryan Turner wrote:
> > On Mon, Mar 8, 2021 at 1:32 PM Anthony Muller <anthony@xxxxxxxxxxxx> wrote:
> > >
> > > What did you do before the bug happened? (Steps to reproduce your issue)
> > >
> > > git clone https://github.com/notracking/hosts-blocklists
> > > cd hosts-blocklists
> > > git reflog expire --all --expire=now && git gc --prune=now --aggressive
> >
> > --aggressive tells git gc to discard all of its existing delta chains
> > and go find new ones, and to be fairly aggressive in how it looks for
> > candidates. This is going to be the primary source of the resource
> > usage you see, as well as the time.
> >
> > Aggressive GCs are something you do once in a (very great) while. If
> > you try this without the --aggressive, how does it look?
>
> I should point out that this repository is also rather pathologically
> structured. Almost every commit is an automatic commit updating the
> same five files which are text files ranging from 5 MB to 11 MB.
>
> When you use --aggressive, as Bryan pointed out, you're asking to throw
> away all the deltas and try really hard to compute all of them fresh.
> That's going to use a lot of memory because you're loading many large
> text files into memory. It's also going to use a lot of CPU because
> these files do indeed delta extremely well, and computing deltas
> on larger files is more expensive, especially when there are many of
> them.
>
> And that's just the blobs. The trees and commits are also going to be
> nearly identically structured and will also delta well with virtually
> every other similar object of their type. Normally Git sorts by size
> which helps pick better candidates, but since these are all going to be
> identically sized, the performance is going to suffer.
>
> Now, I have the advantage in this case of being a person who's sometimes
> on call for the maintenance of Git repositories and in that capacity,
> that this is pathologically structured is obvious to me. But, yeah, I
> would definitely not run --aggressive on this repo unless I needed to
> and I would not expect it to perform well.
> --
> brian m. carlson (he/him or they/them)
> Houston, Texas, US
>
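
For anyone reading this thread later, here is a rough sketch of the two approaches mentioned above: a routine gc without --aggressive, and a shallow clone where history is not needed. It is only an illustration under stated assumptions. The throwaway local repository, the file name hosts.txt, and the directory names origin-repo and shallow-copy are all hypothetical stand-ins, so nothing here touches the network or the real hosts-blocklists repository.

```shell
#!/bin/sh
# Sketch only: a small local repository stands in for the real one.
# All paths, file names, and commit messages below are hypothetical.
set -eu

work=$(mktemp -d)
cd "$work"

# Build a repo in which every commit updates the same file, loosely
# imitating the automatic-update pattern described in the thread.
git init -q origin-repo
cd origin-repo
git config user.email "example@example.invalid"
git config user.name "Example"
for i in 1 2 3 4 5; do
    printf '0.0.0.0 example-%s.invalid\n' "$i" >> hosts.txt
    git add hosts.txt
    git commit -q -m "automatic update $i"
done

# Routine maintenance without --aggressive: git may reuse existing
# deltas instead of discarding and recomputing all of them.
git reflog expire --all --expire=now
git gc --quiet --prune=now

full_count=$(git rev-list --count HEAD)

# Shallow clone: fetch only the most recent commit.
# A file:// URL is used so that --depth applies to a local source.
cd "$work"
git clone -q --depth 1 "file://$work/origin-repo" shallow-copy
shallow_count=$(git -C shallow-copy rev-list --count HEAD)

echo "commits in full repository: $full_count"
echo "commits in shallow clone:   $shallow_count"
```

Against a real remote, the same idea would be `git clone --depth 1 <url>` for repositories where history does not matter, and a plain `git gc` (no --aggressive) for the ones kept in full.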