Re: [BUG?] Major performance issue with some commands on our repo's master branch

(resending as text-only after having stupidly replied from my mobile)

I can add a couple of things that may or may not be related here. Like
you, I work with a large proprietary repo, and it is also not absurdly
large. I maintain some custom tooling for a large-scale Perforce
interop process.

I used to use "git show" (without a patch) in this custom tooling to get
commit metadata, because it lets you specify an arbitrary list of
commits in one call, which saves some process overhead, especially on
Windows.

I stopped using "git show" when user reports of slowness eventually
revealed two things:

1. Large commits (e.g. merges to feature branches from the fast-moving
main trunk) were taking a surprisingly long time despite --no-patch,
which made me think it was doing the patch work anyway and just
discarding it at the end.

2. Merge commits from long-outdated feature branches also take a long
time, even though the final patch displayed by "git show" is small.
It seems as though whatever patch-related work "git show" does (and,
given your observations, I guess it might well be rename detection), it
does with respect to *both parents* in the case of a merge commit,
even though the patch it shows only covers changes with respect to the
first parent.

All this to say: I haven't fully understood your branch setup, but I'm
guessing that you regularly integrate work from "far-behind" branches,
so most or all of your commits on master are merges with large diffs
with respect to the second parent, and those large diffs are what's
"getting worse".

I haven't attempted to debug this, and personally have little
incentive to do so, as switching to "git log" and accepting the process
overhead solved *my* problem.

If I do get the chance to dig into it, I will of course report back here.
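For batch use like mine, git log can also take an arbitrary list of
commits in a single process when told not to walk history:
--no-walk=unsorted shows only the named commits, in the order given,
and git log computes no diff unless asked to. I haven't timed this
against the repos discussed here, so treat it as a sketch. POSIX sh,
on a throwaway repo with made-up commit messages:

```sh
#!/bin/sh
set -e
# Demo only: throwaway repo with three empty commits (made-up messages).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com
for msg in first second third; do
    git commit -q --allow-empty -m "$msg"
done

# One process, arbitrary list of commits, no history walk, no diff.
# --no-walk=unsorted keeps the order given on the command line.
out=$(git log --no-walk=unsorted --format='%s' HEAD HEAD~2 HEAD~1)
printf '%s\n' "$out"
```

This prints third, first, second: one line per commit, in argument order.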

Thanks,
Tao

On Sat, Jun 4, 2022 at 10:29 AM Tassilo Horn <tsdh@xxxxxxx> wrote:
>
> Hi all,
>
> [spoiler alert: I've figured out the config option causing the problem
> while writing this long mail, so you might jump straight to the SOLUTION
> section at the bottom of this mail.]
>
> At my day job, I work on a git repo (sadly non-public, proprietary) with
> these stats:
>
> - master has about 150000 commits; the last release branch, which I've
>   also benchmarked below, has 144000 commits
> - the history dates back to 2001
> - .git/ is about 1.8 GB
>
> So it's quite big, but not unusually big compared to Linux or other
> free software projects.
>
> The typical git commands I use (status, fetch, pull, commit, push,
> rebase, ...) are all quick.  However, I use the git porcelain Magit [1],
> which invokes several plumbing commands in order to present the user
> with an always up-to-date, extended status buffer for the currently
> checked-out branch.  Some of those plumbing commands are extremely slow
> for no obvious reason.  The worst offender I could pinpoint is this:
>
> --8<---------------cut here---------------start------------->8---
> ❯ time git show --no-patch --format="%h %s" "master^{commit}" --
> 6192a0cfdc6 Merge remote-tracking branch 'origin/SHD_ECORO_3_9_7'
>
> ________________________________________________________
> Executed in   13.21 secs    fish           external
>    usr time   12.99 secs  462.00 micros   12.99 secs
>    sys time    0.17 secs  119.00 micros    0.17 secs
> --8<---------------cut here---------------end--------------->8---
>
> The interesting thing is that I have this problem only with the master
> branch.  When I run it for the last release branch, I get these times:
>
> --8<---------------cut here---------------start------------->8---
> ❯ time git show --no-patch --format="%h %s" "SHD_ECORO_3_9_7^{commit}" --
> 994334fc9fb ECOJ-33833 HTML-Formbrief: Bestellungs-Anhänge im KV-Kontext
>
> ________________________________________________________
> Executed in   22.68 millis    fish           external
>    usr time    7.71 millis  761.00 micros    6.95 millis
>    sys time   10.47 millis  194.00 micros   10.28 millis
> --8<---------------cut here---------------end--------------->8---
>
> So you see, it's a difference of almost a factor of 600!  How can that be?
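
For what it's worth, the ratio between the two wall-clock times above
works out to roughly 580x. A quick awk check using only those two
numbers (POSIX sh; the variable names are mine):

```sh
#!/bin/sh
# Wall-clock times from the two runs above, in seconds.
slow=13.21     # git show ... "master^{commit}"
fast=0.02268   # git show ... "SHD_ECORO_3_9_7^{commit}"
# awk does the floating-point division; %.0f rounds to a whole number.
ratio=$(awk -v a="$slow" -v b="$fast" 'BEGIN { printf "%.0f", a / b }')
echo "master is about ${ratio}x slower"
```
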
>
> The split between master and the SHD_ECORO_3_X_X series of branches
> happened almost 2 years ago, and master is way ahead of those.
>
> --8<---------------cut here---------------start------------->8---
> ❯ git log --oneline master...origin/SHD_ECORO_3_9_7 | wc -l
> 5013
> --8<---------------cut here---------------end--------------->8---
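
A note on that count: master...origin/SHD_ECORO_3_9_7 is a symmetric
difference, so the 5013 includes commits from both sides. To see how
far each side has moved, git rev-list --left-right --count prints the
two halves separately (left side first). A POSIX sh sketch on a
throwaway repo with made-up branch contents:

```sh
#!/bin/sh
set -e
# Demo only: throwaway repo where the current branch and "side" diverge.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com
git commit -q --allow-empty -m base
git checkout -q -b side
git commit -q --allow-empty -m "side 1"
git commit -q --allow-empty -m "side 2"
git checkout -q -          # back to the initial branch
git commit -q --allow-empty -m "main 1"

# Symmetric difference split by side: "<left-only><TAB><right-only>".
counts=$(git rev-list --left-right --count HEAD...side)
printf '%s\n' "$counts"
```

On the real repo that would be
git rev-list --left-right --count master...origin/SHD_ECORO_3_9_7;
I haven't run it there, so I can't say how the 5013 splits.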
>
> But there are around 9 merges from the last release branch into master
> daily.
>
> --8<---------------cut here---------------start------------->8---
> ❯ git log --merges --oneline --since 6months | wc -l
> 1611
> --8<---------------cut here---------------end--------------->8---
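
The 1611 fits "around 9 a day" if the six months are taken as roughly
182 days (my approximation). One caveat: without a revision argument,
git log --merges counts every merge reachable from HEAD, not only the
ones coming from the release branch. The arithmetic in POSIX sh:

```sh
#!/bin/sh
merges=1611   # from the git log --merges count above
days=182      # assumption: "6 months" taken as about 182 days
per_day=$(awk -v m="$merges" -v d="$days" 'BEGIN { printf "%.1f", m / d }')
echo "${per_day} merges/day"
```
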
>
> From memory, the issue didn't pop up all of a sudden but has gotten
> worse slowly over time.  I have the impression that the deterioration
> has accelerated over the last few months, which might be a result of
> our workflow.  Previously, I was the merge guy, doing two "merge waves"
> from the last supported release branch upwards into master once or
> twice a day (usually release-branch -> next-release-branch -> master).
> For about 3 months now, we've used a workflow where every developer
> merges upwards herself just after committing/pushing to some branch
> below master, simply because the branches have diverged so much that
> you'd need to be an expert in everything to resolve conflicts
> sensibly.
>
> I should mention that I haven't seen this issue with any other repo I
> have.  But that's also the biggest one I use.  The Emacs repository I
> also work on is comparable in the number of commits but has far fewer
> merges.
>
> Lastly, here's the git bugreport system info section for that machine
> and repository.
>
> --8<---------------cut here---------------start------------->8---
> [System Info]
> git version:
> git version 2.36.1
> cpu: x86_64
> no commit associated with this build
> sizeof-long: 8
> sizeof-size_t: 8
> shell-path: /bin/sh
> uname: Linux 5.18.1-zen1-1-zen #1 ZEN SMP PREEMPT_DYNAMIC Mon, 30 May 2022 17:53:16 +0000 x86_64
> compiler info: gnuc: 11.2
> libc info: glibc: 2.35
> $SHELL (typically, interactive shell): /usr/bin/fish
>
> [Enabled Hooks]
> --8<---------------cut here---------------end--------------->8---
>
> SOLUTION
> ========
>
> While writing this long mail, I've figured out that the performance
> penalty is caused by my setting of diff.renameLimit = 10000.  If I
> comment out that option in my ~/.gitconfig, the above command finishes
> in 150 millis instead of 13 seconds:
>
> --8<---------------cut here---------------start------------->8---
> ❯ time git show --no-patch --format="%h %s" "master^{commit}" --
> 6192a0cfdc6 Merge remote-tracking branch 'origin/SHD_ECORO_3_9_7'
>
> ________________________________________________________
> Executed in  147.99 millis    fish           external
>    usr time  114.52 millis  713.00 micros  113.81 millis
>    sys time   34.78 millis  193.00 micros   34.59 millis
> --8<---------------cut here---------------end--------------->8---
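
If the high limit was there mainly for merges (an assumption on my
part; the mail doesn't say why it was set), Git has a merge-specific
setting: merge.renameLimit applies only to rename detection during
merges and falls back to diff.renameLimit when unset. A sketch in
POSIX sh that points HOME at a scratch directory so the demo doesn't
touch a real ~/.gitconfig; drop the sandbox lines to apply it for real:

```sh
#!/bin/sh
set -e
# Sandbox (demo only): use a scratch HOME so the real ~/.gitconfig
# is left alone. Remove these three lines to apply the change for real.
unset GIT_CONFIG_GLOBAL XDG_CONFIG_HOME
HOME=$(mktemp -d)
export HOME

# Recreate the problematic setting in the scratch global config.
git config --global diff.renameLimit 10000

# Remove the global diff.renameLimit so everyday commands (git show,
# git log -p, ...) go back to Git's built-in default.
git config --global --unset diff.renameLimit

# Assumption: the high limit was wanted for merges only.
git config --global merge.renameLimit 10000

git config --global --list
```

For a one-off command that really needs the high limit,
git -c diff.renameLimit=10000 <command> sets it for that single
invocation only.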
>
> But there's still the question of why diff.renameLimit has any
> influence here at all, given that --no-patch is provided and no diff
> should be generated.
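
Until that question is answered, a caller that only wants the one-line
summary (as Magit does here) could ask git log instead: git log
computes no diff unless asked to (-p, --stat, ...), so I'd expect it
not to reach rename detection at all, though I haven't profiled it on
a repo like yours. The POSIX sh sketch below builds a throwaway repo
with a made-up merge commit and checks that both commands print the
same line:

```sh
#!/bin/sh
set -e
# Demo only: throwaway repo with one merge commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com
git commit -q --allow-empty -m base
git checkout -q -b side
git commit -q --allow-empty -m "side work"
git checkout -q -          # back to the initial branch
git commit -q --allow-empty -m "main work"
git merge -q --no-edit side

# The command from the report vs. a git log query that asks for no diff.
show_out=$(git show --no-patch --format='%h %s' 'HEAD^{commit}' --)
log_out=$(git log -1 --format='%h %s' 'HEAD^{commit}' --)
printf '%s\n%s\n' "$show_out" "$log_out"
```
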
>
> Bye,
> Tassilo
>
> [1] https://magit.vc/



