Re: Performance of "git gc..." is extremely bad in some cases

On Mon, Mar 8, 2021 at 1:32 PM Anthony Muller <anthony@xxxxxxxxxxxx> wrote:
>
> What did you do before the bug happened? (Steps to reproduce your issue)
>
> git clone https://github.com/notracking/hosts-blocklists
> cd hosts-blocklists
> git reflog expire --all --expire=now && git gc --prune=now --aggressive

--aggressive tells git gc to discard all of its existing delta chains
and go find new ones, and to be fairly aggressive in how it looks for
candidates. This is going to be the primary source of the resource
usage you see, as well as the time.

Aggressive GCs are something you do once in a (very great) while. If
you try this without the --aggressive, how does it look?
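For comparison, the non-aggressive version of what you ran (your same
commands, just dropping --aggressive) would be something like:

  git reflog expire --all --expire=now
  time git gc --prune=now

That repack reuses existing deltas where it can, so it should give you
a baseline for how much of the cost comes from the aggressive
re-deltification itself.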

>
>
> What did you expect to happen? (Expected behavior)
>
> Running gc on a ~300 MB repo should not take 1 hour 55 minutes when
> running gc on a 2.6 GB repo (LLVM) only takes 24 minutes.
>
>
> What happened instead? (Actual behavior)
>
> Command took 1h 55m to complete on a ~300MB repo and used enough
> resources that the machine is almost unusable.
>
>
> What's different between what you expected and what actually happened?
>
> Compression stage uses the majority of the resources and time. Compression
> itself, when compared to something like zlib or lzma, should not take very long.
> While more may be happening as objects are compressed, the amount of time
> gc takes to compress the objects and the resources it consumes are both
> unreasonable.

The compression happening here is delta compression, not simple
compression like zip. Git searches across the repository for similar
objects and stores them as chains with a base object and (essentially)
instructions for converting that base object into another object.
That's significantly more resource-intensive work than zipping some
data.
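If you want to see the result of that work, you can inspect the pack
it produces with verify-pack; for deltified objects it reports the
chain depth and the base object, e.g.:

  git verify-pack -v .git/objects/pack/pack-*.idx | head -20

(The -v output lists each object's type, size, size in the pack, and
offset, plus depth and base SHA-1 for deltas.)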

>
> Memory: RSS = 3451152 KB (3.29 GB), VSZ = 29286272 KB (27.92 GB)
> Time: 12902.83s user 8995.41s system 315% cpu 1:55:36.73 total

Git offers several knobs that can be used to influence (though not
necessarily control) its resource usage. On 64-bit Linux the defaults
are 1 thread per logical CPU (so hyperthreaded CPUs use double) and
_unlimited_ memory usage per thread. You might want to investigate
options like pack.threads and pack.windowMemory to apply some
constraints.
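For example, something along these lines (the numbers here are only
illustrative; tune them for your machine):

  git -c pack.threads=4 -c pack.windowMemory=256m \
      gc --prune=now --aggressive

or set them persistently with git config. Capping the window memory
may trade some delta quality (and pack size) for a bounded memory
footprint per thread.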

>
> I've seen this issue with a number of repos, and the size of the repo does
> not determine whether this happens. LLVM @ 2.6 GB worked flawlessly, a 900 MB
> repo never finished, this 300 MB repo takes forever, and if you test something
> like chromium, git will just crash.
>
>
> [System Info]
> hardware: 2.9Ghz Quad Core i7
> git version:
> git version 2.30.0
> cpu: x86_64
> no commit associated with this build
> sizeof-long: 8
> sizeof-size_t: 8
> shell-path: /bin/sh
> uname: Darwin 19.6.0 Darwin Kernel Version 19.6.0: Tue Jan 12 22:13:05 PST 2021; root:xnu-6153.141.16~1/RELEASE_X86_64 x86_64
> compiler info: clang: 12.0.0 (clang-1200.0.32.28)
> libc info: no libc information available
> $SHELL (typically, interactive shell): /usr/local/bin/zsh
>

Hope this helps!
-b


