Hello,

As part of my company's migration from SVN to git, we discovered a performance issue in rev-list with large repositories. The issue appears to be metadata-dependent; we were able to work around it (completely avoiding any performance penalty) by changing the date of certain commits.

The general structure of our repository is I think fairly normal (if large -- we have >5.5 million commits total). We have a handful of trunk branches, and ~10k total refs. To reduce the ref count (we hit other performance issues when we had significantly more refs), we remove refs as we're done with them. Any code that doesn't make it into a trunk is preserved in an archive branch. The archive branch has no content, and consists entirely of octopus merges with 50-500 parents.

If the archive branch is created with author/commit dates older than the rest of the repository, we're able to run:

$ git rev-list --count --all

in ~9-10 seconds on a mirror clone with a commit-graph. However, if the archive branch is instead created with author/commit dates newer than the rest of the repository, it takes 4-5 minutes.

Using any order other than the default or --reverse removes the disparity. All orders except --author-date-order bring things much closer to the ~9-10 seconds we see with the workaround, and --author-date-order is still under a minute (though not by much).

System info from git bugreport:

[System Info]
git version:
git version 2.42.0.windows.2
cpu: x86_64
built from commit: 2f819d1670fff9a1818f63b6722e9959405378e3
sizeof-long: 4
sizeof-size_t: 8
shell-path: /bin/sh
feature: fsmonitor--daemon
uname: Windows 10.0 19044
compiler info: gnuc: 13.2
libc info: no libc information available
$SHELL (typically, interactive shell): C:\Program Files\Git\usr\bin\bash.exe

(no enabled hooks)

Note that we first realized this was an issue on our GitLab instance, which runs on Linux, so this is not a Windows-specific bug.
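For anyone wanting to compare the orderings mentioned above in one pass, the following is a minimal sketch, not the exact commands used for the timings reported here. The function names list_orders and time_orders are hypothetical, and the timing uses whole seconds from date, which is coarse but enough to tell seconds from minutes.

```shell
#!/bin/bash
# Sketch: time `git rev-list --count --all` under each commit ordering.
# list_orders and time_orders are illustrative names, not part of git.

# Print the orderings to compare, one per line ("default" means no extra flag).
list_orders() {
    printf '%s\n' default --reverse --topo-order --date-order --author-date-order
}

# Run the count once per ordering in the given repository and report
# the elapsed wall-clock time in whole seconds.
time_orders() {
    repo=${1:?"usage: time_orders <repo>"}
    for order in $(list_orders)
    do
        start=$(date +%s)
        if [ "$order" = default ]
        then
            git -C "$repo" rev-list --count --all >/dev/null
        else
            git -C "$repo" rev-list --count --all "$order" >/dev/null
        fi
        end=$(date +%s)
        echo "$order: $(( end - start ))s"
    done
}
```

Running time_orders against both a fast and a slow repository (for example, time_orders /path/to/slow-repo) should show whether the gap is confined to the default and --reverse orderings.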
I created a bash script to create very similar repositories that are/are not affected by the issue; it follows. The issue starts to become visible at 1 million commits (the default), where the difference is ~2x. 5 million commits is roughly equivalent performance-wise to what we saw in our repository, with a difference of ~33x. Note that with 5 million commits, each repository is ~1.2 GB and takes 7-8 minutes to create on an i9-9900 with NVMe storage.

Once you create a fast and a slow repo with the script, try the following commands in each one:

# Shows the performance difference
$ time git rev-list --count --all

# Shows very similar performance across both repos
$ time git rev-list --count --all --topo-order

Thank you,
Kevin Lyles

----------------------------------------

#!/bin/bash

usage="Usage: $0 <destination folder> <--fast|--slow> [Number of commits (default: 1000000)]"

destinationFolder=${1:?$usage}

oldTimestamp=315554400 # 1980-01-01 midnight
newTimestamp=1672552800 # 2023-01-01 midnight

if [ "$2" == "--fast" ]
then
    archiveTimestamp=$oldTimestamp
elif [ "$2" == "--slow" ]
then
    archiveTimestamp=$newTimestamp
else
    echo "$usage" >&2
    exit 1
fi

numberOfCommits=${3:-1000000}
if ! [[ "$numberOfCommits" =~ ^[0-9]+$ ]]
then
    echo "$usage" >&2
    exit 1
fi

increment=$(( (newTimestamp - oldTimestamp) / (numberOfCommits + 2) ))
timestamp=$oldTimestamp

rm -rf "$destinationFolder"
git init "$destinationFolder"

echo "Fast-importing repo, please wait..."
{
    echo "feature done"
    echo "reset refs/heads/main"
    echo ""

    # Linear main branch with evenly spaced committer dates
    for count in $(seq "$numberOfCommits")
    do
        timestamp=$(( timestamp + increment ))
        echo "commit refs/heads/main"
        echo "mark :$count"
        echo "committer Test Test <test@xxxxxxxx> $timestamp -0500"
        echo "data <<|"
        echo "Main branch commit #$count"
        echo "|"
        echo ""
    done

    # Archive branch: octopus merges of 50 main-branch commits each
    parentMark=0
    echo "reset refs/archive"
    for count in $(seq $(( numberOfCommits / 1000 )))
    do
        echo "commit refs/archive"
        echo "committer Test Test <test@xxxxxxxx> $archiveTimestamp -0500"
        echo "data <<|"
        echo "Archive branch commit #$count"
        echo "|"
        for parentCount in {1..50}
        do
            parentMark=$(( (parentMark + 99991) % numberOfCommits + 1 ))
            echo "merge :$parentMark"
        done
        echo ""
    done

    echo "done"
} | git -C "$destinationFolder" fast-import

git -C "$destinationFolder" commit-graph write
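To make the shape of the generated repositories concrete, here is a small sketch that repeats the script's own arithmetic for a given commit count. The function name repo_shape and its output labels are hypothetical; the two timestamps and the divisors are the ones the script above uses.

```shell
#!/bin/bash
# Sketch: derive the repository shape from the commit count, using the
# same formulas as the reproduction script. repo_shape is an illustrative name.
repo_shape() {
    n=${1:-1000000}
    old=315554400    # 1980-01-01 midnight, as in the script
    new=1672552800   # 2023-01-01 midnight, as in the script
    # Seconds between consecutive main-branch commits (integer division)
    echo "increment=$(( (new - old) / (n + 2) ))"
    # One archive commit per 1000 main-branch commits
    echo "archive_commits=$(( n / 1000 ))"
    # Each archive commit gets 50 "merge" lines
    echo "archive_parents=$(( n / 1000 * 50 ))"
}
```

With the default of 1000000 commits this gives a spacing of 1356 seconds between main-branch commits and 1000 archive commits; with 5000000 commits the spacing drops to 271 seconds and there are 5000 archive commits.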