On Tue, Jun 29 2021, Patrick Steinhardt wrote: > [[PGP Signed Part:Undecided]] > On Mon, Jun 28, 2021 at 09:49:54AM +0200, Ævar Arnfjörð Bjarmason wrote: >> >> On Mon, Jun 28 2021, Patrick Steinhardt wrote: >> >> > [[PGP Signed Part:Undecided]] >> > We'll the connectivity check logic for git-receive-pack(1) in the >> > following commits to make it perform better. As a preparatory step, add >> > some benchmarks such that we can measure these changes. >> > >> > Signed-off-by: Patrick Steinhardt <ps@xxxxxx> >> > --- >> > t/perf/p5400-receive-pack.sh | 97 ++++++++++++++++++++++++++++++++++++ >> > 1 file changed, 97 insertions(+) >> > create mode 100755 t/perf/p5400-receive-pack.sh >> > >> > diff --git a/t/perf/p5400-receive-pack.sh b/t/perf/p5400-receive-pack.sh >> > new file mode 100755 >> > index 0000000000..a945e014a3 >> > --- /dev/null >> > +++ b/t/perf/p5400-receive-pack.sh >> > @@ -0,0 +1,97 @@ >> > +#!/bin/sh >> > + >> > +test_description="Tests performance of receive-pack" >> > + >> > +. ./perf-lib.sh >> > + >> > +test_perf_large_repo >> >> From the runtime I think this just needs test_perf_default_repo, no? >> I.e. we should only have *_large_* for cases where git.git is too small >> to produce meaningful results. >> >> Part of th problem is that git.git has become larger over time... > > I did these tests for 3/3 with git.git first, and results were > significantly different. The performance issues I'm trying to fix with > the connectivity check really only start to show up with largish > repositories. > >> > +test_expect_success 'setup' ' >> > + # Create a main branch such that we do not have to rely on any specific >> > + # branch to exist in the perf repository. >> > + git switch --force-create main && >> > + >> > + # Set up a pre-receive hook such that no refs will ever be changed. >> > + # This easily allows multiple perf runs, but still exercises >> > + # server-side reference negotiation and checking for consistency. >> > + mkdir hooks && >> > + write_script hooks/pre-receive <<-EOF && >> > + #!/bin/sh >> >> You don't need the #!/bin/sh here, and it won't be used. write_script() >> adds it (or the wanted shell path). > > Makes sense. > >> > + echo "failed in pre-receive hook" >> > + exit 1 >> > + EOF >> > + cat >config <<-EOF && >> > + [core] >> > + hooksPath=$(pwd)/hooks >> > + EOF >> >> Easier understood IMO as: >> >> git config -f config core.hooksPath ... > > Yup, will change. > >> > + GIT_CONFIG_GLOBAL="$(pwd)/config" && >> > + export GIT_CONFIG_GLOBAL && >> > + >> > + git switch --create updated && >> > + test_commit --no-tag updated >> > +' >> > + >> > +setup_empty() { >> > + git init --bare "$2" >> > +} >> >> I searched ahead for setup_empty, looked unused, but... >> >> > +setup_clone() { >> > + git clone --bare --no-local --branch main "$1" "$2" >> > +} >> > + >> > +setup_clone_bitmap() { >> > + git clone --bare --no-local --branch main "$1" "$2" && >> > + git -C "$2" repack -Adb >> > +} >> > + >> > +# Create a reference for each commit in the target repository with extra-refs. >> > +# While this may be an atypical setup, biggish repositories easily end up with >> > +# hundreds of thousands of refs, and this is a good enough approximation. >> > +setup_extrarefs() { >> > + git clone --bare --no-local --branch main "$1" "$2" && >> > + git -C "$2" log --all --format="tformat:create refs/commit/%h %H" | >> > + git -C "$2" update-ref --stdin >> > +} >> > + >> > +# Create a reference for each commit in the target repository with extra-refs. >> > +# While this may be an atypical setup, biggish repositories easily end up with >> > +# hundreds of thousands of refs, and this is a good enough approximation. >> > +setup_extrarefs_bitmap() { >> > + git clone --bare --no-local --branch main "$1" "$2" && >> > + git -C "$2" log --all --format="tformat:create refs/commit/%h %H" | >> > + git -C "$2" update-ref --stdin && >> > + git -C "$2" repack -Adb >> > +} >> > + >> > +for repo in empty clone clone_bitmap extrarefs extrarefs_bitmap >> > +do >> > + test_expect_success "$repo setup" ' >> >> > + rm -rf target.git && >> > + setup_$repo "$(pwd)" target.git >> >> ...here we use it via interpolation. >> >> I'd find this whole pattern much easier to understand if the setups were >> just a bunch of test_expect_success that created a repo_empty.git, >> repo_extrarefs.git etc. Then this loop would be: >> >> for repo in repo*.git ... >> >> I'd think that would also give you more meaningful perf data, as now the >> OS will churn between the clone & the subsequent push tests, better to >> do all the setup, then all the different perf tests. >> >> Perhaps there's also a way to re-use this setup across different runs, I >> don't know/can't remember if t/perf has a less transient thing than the >> normal trash directory to use for that. > > I originally had code like this, but the issue with first creating all > the repos is that it requires lots of disk space with large repos given > that we'll clone it once per setup. Combined with the fact that I > often run tests in tmpfs, this led to out-of-memory situations quite > fast given that I had 3x6GB repositories plus the seeded packfiles in > RAM. > > This is why I've changed the setup to do the setup as we go, to bring > disk usage down to something sane. > > Patrick > > [[End of PGP Signed Part]] Ah, I see. In that case wouldn't it be even better/faster with/without my suggestion to not use "clone" here, which would either be manually set up with alternates, or removing the --no-local flag. You'd then share bulk of the object database, and just have different references. B.t.w. you'll probably get less noise/more relevant results if you then do a "pack-refs" after creating those N references. So you should have: 0. Your "big" test repop (not used directly) 1. An empty repo 2. The "big" test repo itself, but just the HEAD branch, using alternates to point to #0 3. Ditto, but we create a crapload of refs for each commit for a version of #2. 4. Ditto #3 (could even "cp" over the packed refs file to save time), but add a bitmap on top. Well, presumably for #4 we'd actually want to do the "git repack -Adb" for #2 (or enforce that #0 must have it), then just move the *.bitmap file(s) to #4. Now the test case conflates whether we have bitmaps with how well (re)packed something is. I think this might also allow you to get rid of the pre-receive hook for a "real" push test, since the side-repos would be so cheap at this point that you could perhaps setup N of them to push into.