RE: [Internet]Re: reachability-bitmap makes push performance worse ?

> This is a known issue, I think you've found the same problem discussed in these past threads:
> 
> https://lore.kernel.org/git/38b99459158a45b1bea09037f3dd092d@xxxxxxxxxxxxxxxxxxxxxxxx/
> https://lore.kernel.org/git/87zhoz8b9o.fsf@xxxxxxxxxxxxxxxxxxx/

Thanks.

> The latter one in particular has a lot of extra details. The former also has the suggestion of a per-push bitmap configuration as a workaround.
>
> As your numbers show it's still an issue today, but those threads should help you if you're looking to dig further into the root cause.
> 
> Aside from the underlying root causes it would be very nice to fix the progress code in that area, i.e. we "stall" on "Enumerating objects", which is just a matter of us not having a separate progress bar for the very expensive bitmap work we're doing.

It looks like optimizing the bitmap code to fix the root cause will be a long process, since it requires a deep understanding of the algorithm.

A per-push bitmap configuration is only a workaround and can't completely solve the problem, but it works for me.
After all, bitmaps were not designed to optimize git push, and most of the time git push is not called as frequently as git fetch.
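
One way to get a per-push workaround with configuration that already exists is to pass pack.useBitmaps=false on the command line for that push only. The following is a minimal sketch, not a measured fix: it assumes a throwaway pair of repositories under a temporary directory, and the demo identity and paths are placeholders rather than the repositories from the report.

```shell
# Sketch: disable bitmap use for a single push.
# Assumption: throwaway demo repositories, not the 14G repository from the report.
set -e
tmp=$(mktemp -d)
git init --quiet --bare "$tmp/replica-2"
git init --quiet "$tmp/replica-1"
cd "$tmp/replica-1"
git -c user.name=demo -c user.email=demo@example.com \
    commit --quiet --allow-empty -m "initial"

# "git -c" exports the setting to child processes, so the pack-objects
# run spawned by this push sees pack.useBitmaps=false.
git -c pack.useBitmaps=false push --quiet "file://$tmp/replica-2" HEAD:refs/heads/b_1

# Show that the branch arrived on the receiving side.
git --git-dir="$tmp/replica-2" show-ref
```

Because -c applies to a single invocation, nothing is written to the repository configuration, so fetches and clones served from the same repository can keep using the bitmap.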

The problem has been around for 3 years. Has the community considered providing a config like "push.useBitmap" to prevent git push from using bitmaps?
Such a config would be appreciated: it would quickly solve my problem and doesn't seem like a lot of work.

If no one else is interested, I can try to submit a patch myself (although it may be a bit slow, since I am new to the git community).


-----Original Message-----
From: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> 
Sent: June 14, 2022 16:56
To: kylezhao(赵柯宇) <kylezhao@xxxxxxxxxxx>
Cc: git@xxxxxxxxxxxxxxx
Subject: [Internet]Re: reachability-bitmap makes push performance worse ?


On Tue, Jun 14 2022, kylezhao(赵柯宇) wrote:

> Hi All,
>  
> thank you for reading my report.
>  
>  
> How did we find out?
>  
> The problem described in the title occurs on our git server.
> Each git repository has multiple replicas on our servers to increase git read performance, and these replicas are synchronized with each other using git push.
> One day we found that git push for one repository was very slow: it took more than ten seconds just to create a new branch from an existing commit.
>  
> How to reproduce the problem?
>  
> git version: 2.36.1
>  
> # /data/test/repo is a bare git repository which can reproduce the problem
> $ cd /data/test/repo
>  
> # number of refs
> $ git show-ref | wc -l
> 21134
> # pack information
> $ ls objects/pack/ -hl
> total 14G
> -r--r--r-- 1 root root  43M Jun 14 04:16 pack-9a7fc024652645a632fb82a4ff26c3ddf4883eed.bitmap
> -r--r--r-- 1 root root 169M Jun 14 04:15 pack-9a7fc024652645a632fb82a4ff26c3ddf4883eed.idx
> -r--r--r-- 1 root root  14G Jun 14 04:14 pack-9a7fc024652645a632fb82a4ff26c3ddf4883eed.pack
>  
> # objects information
> $ git count-objects -v
> count: 0
> size: 0
> in-pack: 5185141
> packs: 1
> size-pack: 13938704
> prune-packable: 0
> garbage: 0
> size-garbage: 0
>  
> # number of commits
> $ git rev-list --all |  wc -l
> 955262
>  
> $ cp -r /data/test/repo /data/test/replica-1
> $ cp -r /data/test/repo /data/test/replica-2
> $ cd /data/test/replica-1
>  
> # create a branch from an existing commit
> $ git update-ref refs/heads/b_1 43fa4721c61106583cd552da85da3bd84f0f9929
> $ git show-ref | grep 43fa4721c61106583cd552da85da3bd84f0f9929
> 43fa4721c61106583cd552da85da3bd84f0f9929 refs/heads/b_1
>  
> # number of commits of the ref
> $ git rev-list refs/heads/b_1 |  wc -l
> 117836
>  
> # git push with bitmap
> $ GIT_TRACE=1 git push file:///data/test/replica-2 refs/heads/b_1
> 04:19:07.654103 git.c:459               trace: built-in: git push file:///data/test/replica-2 refs/heads/b_1
> 04:19:07.690006 run-command.c:654       trace: run_command: unset GIT_DIR GIT_IMPLICIT_WORK_TREE GIT_PREFIX; 'git-receive-pack '\''/data/test/replica-2'\'''
> 04:19:07.694339 git.c:459               trace: built-in: git receive-pack /data/test/replica-2
> 04:19:07.751814 run-command.c:654       trace: run_command: git pack-objects --all-progress-implied --revs --stdout --thin --delta-base-offset --progress
> 04:19:07.754011 git.c:459               trace: built-in: git pack-objects --all-progress-implied --revs --stdout --thin --delta-base-offset --progress
> Total 0 (delta 0), reused 0 (delta 0), pack-reused 0
> 04:19:20.304868 run-command.c:654       trace: run_command: GIT_ALTERNATE_OBJECT_DIRECTORIES=/data/test/replica-2/./objects GIT_OBJECT_DIRECTORY=/data/test/replica-2/./objects/tmp_objdir-incoming-CaCTHm GIT_QUARANTINE_PATH=/data/test/replica-2/./objects/tmp_objdir-incoming-CaCTHm git unpack-objects --pack_header=2,0
> remote: 04:19:20.306550 git.c:459               trace: built-in: git unpack-objects --pack_header=2,0
> 04:19:20.306903 run-command.c:654       trace: run_command: GIT_ALTERNATE_OBJECT_DIRECTORIES=/data/test/replica-2/./objects GIT_OBJECT_DIRECTORY=/data/test/replica-2/./objects/tmp_objdir-incoming-CaCTHm GIT_QUARANTINE_PATH=/data/test/replica-2/./objects/tmp_objdir-incoming-CaCTHm git rev-list --objects --stdin --not --all --quiet --alternate-refs '--progress=Checking connectivity'
> remote: 04:19:20.308332 git.c:459               trace: built-in: git rev-list --objects --stdin --not --all --quiet --alternate-refs '--progress=Checking connectivity'
> remote: 04:19:20.344031 run-command.c:654       trace: run_command: unset GIT_ALTERNATE_OBJECT_DIRECTORIES GIT_DIR GIT_OBJECT_DIRECTORY GIT_PREFIX; git --git-dir=/data/test/replica-2 for-each-ref '--format=%(objectname)'
> remote: 04:19:20.346359 git.c:459               trace: built-in: git for-each-ref '--format=%(objectname)'
> 04:19:20.395511 run-command.c:654       trace: run_command: git gc --auto --quiet
> remote: 04:19:20.397949 git.c:459               trace: built-in: git gc --auto --quiet
> To file:///data/test/replica-2
> * [new branch]                b_1 -> b_1
>  
> # reset replica-2 and remove bitmap
> $ rm -rf /data/test/replica-2
> $ cp -r /data/test/repo /data/test/replica-2
> $ rm objects/pack/pack-9a7fc024652645a632fb82a4ff26c3ddf4883eed.bitmap
>  
>  
> # git push without bitmap
> $ GIT_TRACE=1 git push file:///data/test/replica-2 refs/heads/b_1
> 04:20:44.633590 git.c:459               trace: built-in: git push file:///data/test/replica-2 refs/heads/b_1
> 04:20:44.668908 run-command.c:654       trace: run_command: unset GIT_DIR GIT_IMPLICIT_WORK_TREE GIT_PREFIX; 'git-receive-pack '\''/data/test/replica-2'\'''
> 04:20:44.673234 git.c:459               trace: built-in: git receive-pack /data/test/replica-2
> 04:20:44.720852 run-command.c:654       trace: run_command: git pack-objects --all-progress-implied --revs --stdout --thin --delta-base-offset --progress
> 04:20:44.723100 git.c:459               trace: built-in: git pack-objects --all-progress-implied --revs --stdout --thin --delta-base-offset --progress
> Total 0 (delta 0), reused 0 (delta 0), pack-reused 0
> 04:20:44.800298 run-command.c:654       trace: run_command: GIT_ALTERNATE_OBJECT_DIRECTORIES=/data/test/replica-2/./objects GIT_OBJECT_DIRECTORY=/data/test/replica-2/./objects/tmp_objdir-incoming-UOWY1E GIT_QUARANTINE_PATH=/data/test/replica-2/./objects/tmp_objdir-incoming-UOWY1E git unpack-objects --pack_header=2,0
> remote: 04:20:44.802056 git.c:459               trace: built-in: git unpack-objects --pack_header=2,0
> 04:20:44.802474 run-command.c:654       trace: run_command: GIT_ALTERNATE_OBJECT_DIRECTORIES=/data/test/replica-2/./objects GIT_OBJECT_DIRECTORY=/data/test/replica-2/./objects/tmp_objdir-incoming-UOWY1E GIT_QUARANTINE_PATH=/data/test/replica-2/./objects/tmp_objdir-incoming-UOWY1E git rev-list --objects --stdin --not --all --quiet --alternate-refs '--progress=Checking connectivity'
> remote: 04:20:44.803930 git.c:459               trace: built-in: git rev-list --objects --stdin --not --all --quiet --alternate-refs '--progress=Checking connectivity'
> remote: 04:20:44.834388 run-command.c:654       trace: run_command: unset GIT_ALTERNATE_OBJECT_DIRECTORIES GIT_DIR GIT_OBJECT_DIRECTORY GIT_PREFIX; git --git-dir=/data/test/replica-2 for-each-ref '--format=%(objectname)'
> remote: 04:20:44.836220 git.c:459               trace: built-in: git for-each-ref '--format=%(objectname)'
> 04:20:44.884165 run-command.c:654       trace: run_command: git gc --auto --quiet
> remote: 04:20:44.886108 git.c:459               trace: built-in: git gc --auto --quiet
> To file:///data/test/replica-2
> * [new branch]                b_1 -> b_1
>  
>  
> The operations above show that git push is stuck in the git pack-objects process for about 13s.
> After I deleted the bitmap, the whole git push completed in less than 1s.
>  
> During testing, we found that not every git repository was significantly affected by bitmap. 
> This may be related to the number of objects in the git repository itself, the number of refs, and the sha1 pointed to by the pushed branch.
>  
> We benefit from the bitmap performance optimizations for git fetch and clone, but they seem to hurt the performance of git push.
>  
> Maybe we can disable bitmaps during git push?
> As far as I know, the number of objects counted during a git push is usually small relative to the entire repository.
> Counting them by building bitmaps in memory may take more time than counting them without bitmaps.
>  
> Of course, it would be better if anyone has a better solution.
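
The stall reported above can be read directly off the GIT_TRACE timestamps: the gap between the "built-in: git pack-objects" line and the next run_command line is the time spent in pack-objects. A small sketch of that arithmetic follows; the elapsed helper is a hypothetical name introduced here, and the timestamps are copied from the two traces in the report.

```shell
# elapsed START END: print END - START in seconds, for HH:MM:SS.micro timestamps.
# (Hypothetical helper; assumes both timestamps fall on the same day.)
elapsed() {
  awk -v s="$1" -v e="$2" 'BEGIN {
    split(s, a, ":"); split(e, b, ":")
    printf "%.3f\n", (b[1]*3600 + b[2]*60 + b[3]) - (a[1]*3600 + a[2]*60 + a[3])
  }'
}

# pack-objects start -> next run_command, taken from the traces in the report
with_bitmap=$(elapsed 04:19:07.754011 04:19:20.304868)
without_bitmap=$(elapsed 04:20:44.723100 04:20:44.800298)

echo "pack-objects with bitmap:    ${with_bitmap}s"
echo "pack-objects without bitmap: ${without_bitmap}s"
```

This gives roughly 12.55s with the bitmap against roughly 0.08s without it, which matches the "about 13s" versus "less than 1s" described in the report.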

This is a known issue, I think you've found the same problem discussed in these past threads:

https://lore.kernel.org/git/38b99459158a45b1bea09037f3dd092d@xxxxxxxxxxxxxxxxxxxxxxxx/
https://lore.kernel.org/git/87zhoz8b9o.fsf@xxxxxxxxxxxxxxxxxxx/

The latter one in particular has a lot of extra details. The former also has the suggestion of a per-push bitmap configuration as a workaround.

As your numbers show it's still an issue today, but those threads should help you if you're looking to dig further into the root cause.

Aside from the underlying root causes it would be very nice to fix the progress code in that area, i.e. we "stall" on "Enumerating objects", which is just a matter of us not having a separate progress bar for the very expensive bitmap work we're doing.




