Here is a (tiny) reroll of my series to enable pack reverse-indexes to be written to disk by default instead of computed on-the-fly in memory. The original cover letter[1] and commits herein have all of the gore-y details. Not much has changed since last time, except: - squashing in a patch from Stolee to propagate a `struct repository *` a little further out from `load_pack_revindex()` - a tweak to the linux-TEST-vars CI job to continue running it with GIT_TEST_NO_WRITE_REV_INDEX - and an additional benchmark to demonstrate the performance of `git cat-file --batch-check='%(objectsize:disk)' --batch--all-objects --unordered` with and without on-disk reverse indexes As always, a range-diff is included below for convenience. Thanks in advance for taking (another) look! [1]: https://lore.kernel.org/git/cover.1681166596.git.me@xxxxxxxxxxxx/ Taylor Blau (7): pack-write.c: plug a leak in stage_tmp_packfiles() t5325: mark as leak-free pack-revindex: make `load_pack_revindex` take a repository pack-revindex: introduce GIT_TEST_REV_INDEX_DIE_ON_DISK pack-revindex: introduce `pack.readReverseIndex` config: enable `pack.writeReverseIndex` by default t: invert `GIT_TEST_WRITE_REV_INDEX` Documentation/config/pack.txt | 8 +++++++- builtin/index-pack.c | 5 +++-- builtin/pack-objects.c | 5 +++-- ci/run-build-and-tests.sh | 2 +- pack-bitmap.c | 23 +++++++++++++---------- pack-revindex.c | 12 +++++++++--- pack-revindex.h | 6 ++++-- pack-write.c | 2 ++ packfile.c | 2 +- repo-settings.c | 1 + repository.h | 1 + t/README | 2 +- t/perf/p5312-pack-bitmaps-revs.sh | 3 +-- t/t5325-reverse-index.sh | 16 +++++++++++++++- 14 files changed, 62 insertions(+), 26 deletions(-) Range-diff against v1: -: ----------- > 1: 18be29c3988 pack-write.c: plug a leak in stage_tmp_packfiles() -: ----------- > 2: affb5e2574b t5325: mark as leak-free 1: be4faf11011 ! 3: 687a9a58924 pack-revindex: make `load_pack_revindex` take a repository @@ Commit message to `load_pack_revindex`, and update all callers to pass the correct instance (in all cases, `the_repository`). - Signed-off-by: Taylor Blauy <me@xxxxxxxxxxxx> + In certain instances, a new function-local variable is introduced to + take the place of a `struct repository *` argument to the function + itself to avoid propagating the new parameter even further throughout + the tree. + + Co-authored-by: Derrick Stolee <derrickstolee@xxxxxxxxxx> + Signed-off-by: Derrick Stolee <derrickstolee@xxxxxxxxxx> + Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> ## pack-bitmap.c ## +@@ pack-bitmap.c: static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git + return 0; + } + +-static int load_reverse_index(struct bitmap_index *bitmap_git) ++static int load_reverse_index(struct repository *r, struct bitmap_index *bitmap_git) + { + if (bitmap_is_midx(bitmap_git)) { + uint32_t i; @@ pack-bitmap.c: static int load_reverse_index(struct bitmap_index *bitmap_git) * since we will need to make use of them in pack-objects. */ for (i = 0; i < bitmap_git->midx->num_packs; i++) { - ret = load_pack_revindex(bitmap_git->midx->packs[i]); -+ ret = load_pack_revindex(the_repository, -+ bitmap_git->midx->packs[i]); ++ ret = load_pack_revindex(r, bitmap_git->midx->packs[i]); if (ret) return ret; } return 0; } - return load_pack_revindex(bitmap_git->pack); -+ return load_pack_revindex(the_repository, bitmap_git->pack); ++ return load_pack_revindex(r, bitmap_git->pack); } - static int load_bitmap(struct bitmap_index *bitmap_git) +-static int load_bitmap(struct bitmap_index *bitmap_git) ++static int load_bitmap(struct repository *r, struct bitmap_index *bitmap_git) + { + assert(bitmap_git->map); + + bitmap_git->bitmaps = kh_init_oid_map(); + bitmap_git->ext_index.positions = kh_init_oid_pos(); + +- if (load_reverse_index(bitmap_git)) ++ if (load_reverse_index(r, bitmap_git)) + goto failed; + + if (!(bitmap_git->commits = read_bitmap_1(bitmap_git)) || +@@ pack-bitmap.c: struct bitmap_index *prepare_bitmap_git(struct repository *r) + { + struct bitmap_index *bitmap_git = xcalloc(1, sizeof(*bitmap_git)); + +- if (!open_bitmap(r, bitmap_git) && !load_bitmap(bitmap_git)) ++ if (!open_bitmap(r, bitmap_git) && !load_bitmap(r, bitmap_git)) + return bitmap_git; + + free_bitmap_index(bitmap_git); +@@ pack-bitmap.c: struct bitmap_index *prepare_bitmap_git(struct repository *r) + + struct bitmap_index *prepare_midx_bitmap_git(struct multi_pack_index *midx) + { ++ struct repository *r = the_repository; + struct bitmap_index *bitmap_git = xcalloc(1, sizeof(*bitmap_git)); + +- if (!open_midx_bitmap_1(bitmap_git, midx) && !load_bitmap(bitmap_git)) ++ if (!open_midx_bitmap_1(bitmap_git, midx) && !load_bitmap(r, bitmap_git)) + return bitmap_git; + + free_bitmap_index(bitmap_git); +@@ pack-bitmap.c: struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs, + * from disk. this is the point of no return; after this the rev_list + * becomes invalidated and we must perform the revwalk through bitmaps + */ +- if (load_bitmap(bitmap_git) < 0) ++ if (load_bitmap(revs->repo, bitmap_git) < 0) + goto cleanup; + + object_array_clear(&revs->pending); +@@ pack-bitmap.c: int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, + uint32_t *entries, + struct bitmap **reuse_out) + { ++ struct repository *r = the_repository; + struct packed_git *pack; + struct bitmap *result = bitmap_git->result; + struct bitmap *reuse; +@@ pack-bitmap.c: int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, + + assert(result); + +- load_reverse_index(bitmap_git); ++ load_reverse_index(r, bitmap_git); + + if (bitmap_is_midx(bitmap_git)) + pack = bitmap_git->midx->packs[midx_preferred_pack(bitmap_git)]; +@@ pack-bitmap.c: int rebuild_bitmap(const uint32_t *reposition, + uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git, + struct packing_data *mapping) + { ++ struct repository *r = the_repository; + uint32_t i, num_objects; + uint32_t *reposition; + + if (!bitmap_is_midx(bitmap_git)) +- load_reverse_index(bitmap_git); ++ load_reverse_index(r, bitmap_git); + else if (load_midx_revindex(bitmap_git->midx) < 0) + BUG("rebuild_existing_bitmaps: missing required rev-cache " + "extension"); ## pack-revindex.c ## @@ pack-revindex.c: static int load_pack_revindex_from_disk(struct packed_git *p) 2: 0f368e2347e = 4: 8eec5bacd3a pack-revindex: introduce GIT_TEST_REV_INDEX_DIE_ON_DISK 3: 6f692d470cb ! 5: a62fc3e4ec1 pack-revindex: introduce `pack.readReverseIndex` @@ Commit message 'git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" --batch-all-objects' ran 2.06 ± 0.02 times faster than 'git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" --batch-all-objects' + Luckily, the results when running `git cat-file` with `--unordered` are + closer together: + + $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects' + Benchmark 1: git.compile -c pack.readReverseIndex=false cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects + Time (mean ± σ): 5.066 s ± 0.105 s [User: 4.792 s, System: 0.274 s] + Range (min … max): 4.943 s … 5.220 s 10 runs + + Benchmark 2: git.compile -c pack.readReverseIndex=true cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects + Time (mean ± σ): 6.193 s ± 0.069 s [User: 5.937 s, System: 0.255 s] + Range (min … max): 6.145 s … 6.356 s 10 runs + + Summary + 'git.compile -c pack.readReverseIndex=false cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects' ran + 1.22 ± 0.03 times faster than 'git.compile -c pack.readReverseIndex=true cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects' + Because the equilibrium point between these two is highly machine- and repository-dependent, allow users to configure whether or not they will read any ".rev" file(s) with this configuration knob. 4: 56a0fc0098e = 6: f8298fb0bac config: enable `pack.writeReverseIndex` by default 5: 9c803799588 ! 7: edff6a80c63 t: invert `GIT_TEST_WRITE_REV_INDEX` @@ Commit message disable writing ".rev" files, thereby running the test suite in a mode where the reverse index is generated from scratch. - This ensures that we are still running and exercising Git's behavior - when forced to generate reverse indexes from scratch. + This ensures that, when GIT_TEST_NO_WRITE_REV_INDEX is set to some + spelling of "true", we are still running and exercising Git's behavior + when forced to generate reverse indexes from scratch. Do so by setting + it in the linux-TEST-vars CI run to ensure that we are maintaining good + coverage of this now-legacy code. Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> @@ ci/run-build-and-tests.sh: linux-TEST-vars) export GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=1 export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master - export GIT_TEST_WRITE_REV_INDEX=1 ++ export GIT_TEST_NO_WRITE_REV_INDEX=1 export GIT_TEST_CHECKOUT_WORKERS=2 ;; linux-clang) -- 2.40.0.323.gedff6a80c63