This series fixes a couple of brown paper bag bugs in the MIDX/bitmap code. This version is basically identical to the previous round, except that we use the sentinel value "-1" as the 'pack_int_id' when reusing from a single (non-MIDX'd) pack. This is a "garbage in, garbage out" measure to ensure that we notice any regressions here loudly. Thanks in advance for your review! Taylor Blau (3): midx-write.c: do not read existing MIDX with `packs_to_include` pack-bitmap.c: avoid uninitialized `pack_int_id` during reuse pack-revindex.c: guard against out-of-bounds pack lookups midx-write.c | 42 ++++++++++++++++++++++++++--------- pack-bitmap.c | 13 +++++++++++ pack-revindex.c | 3 +++ t/t5326-multi-pack-bitmaps.sh | 30 +++++++++++++++++++++++++ t/t5332-multi-pack-reuse.sh | 26 ++++++++++++++++++++++ 5 files changed, 103 insertions(+), 11 deletions(-) Range-diff against v1: 1: 4aceb9233e ! 1: 0bdf925366 pack-bitmap.c: avoid uninitialized `pack_int_id` during reuse @@ Metadata Author: Taylor Blau <me@xxxxxxxxxxxx> ## Commit message ## - pack-bitmap.c: avoid uninitialized `pack_int_id` during reuse - - When performing multi-pack reuse, reuse_partial_packfile_from_bitmap() - is responsible for generating an array of bitmapped_pack structs from - which to perform reuse. - - In the multi-pack case, we loop over the MIDXs packs and copy the result - of calling `nth_bitmapped_pack()` to construct the list of reusable - paths. - - But we may also want to do pack-reuse over a single pack, either because - we only had one pack to perform reuse over (in the case of single-pack - bitmaps), or because we explicitly asked to do single pack reuse even - with a MIDX[^1]. - - When this is the case, the array we generate of reusable packs contains - only a single element, which is either (a) the pack attached to the - single-pack bitmap, or (b) the MIDX's preferred pack. - - In 795006fff4 (pack-bitmap: gracefully handle missing BTMP chunks, - 2024-04-15), we refactored the reuse_partial_packfile_from_bitmap() - function and stopped assigning the pack_int_id field when reusing only - the MIDX's preferred pack. This results in an uninitialized read down in - try_partial_reuse() like so: - - ==7474==WARNING: MemorySanitizer: use-of-uninitialized-value - #0 0x55c5cd191dde in try_partial_reuse pack-bitmap.c:1887:8 - #1 0x55c5cd191dde in reuse_partial_packfile_from_bitmap_1 pack-bitmap.c:2001:8 - #2 0x55c5cd191dde in reuse_partial_packfile_from_bitmap pack-bitmap.c:2105:3 - #3 0x55c5cce0bd0e in get_object_list_from_bitmap builtin/pack-objects.c:4043:3 - #4 0x55c5cce0bd0e in get_object_list builtin/pack-objects.c:4156:27 - #5 0x55c5cce0bd0e in cmd_pack_objects builtin/pack-objects.c:4596:3 - #6 0x55c5ccc8fac8 in run_builtin git.c:474:11 - - which happens when try_partial_reuse() tries to call - midx_pair_to_pack_pos() when it tries to reject cross-pack deltas. - - Avoid the uninitialized read by ensuring that the pack_int_id field is - set in the single-pack reuse case by setting it to either the MIDX - preferred pack's pack_int_id, or '0', in the case of single-pack - bitmaps. In the latter case, we never read the pack_int_id field, so - the choice of '0' is arbitrary. - - [^1]: This can happen for a couple of reasons, either because the - repository is configured with 'pack.allowPackReuse=(true|single)', or - because the MIDX was generated prior to the introduction of the BTMP - chunk, which contains information necessary to perform multi-pack - reuse. - - Reported-by: Kyle Lippincott <spectral@xxxxxxxxxx> + midx-write.c: do not read existing MIDX with `packs_to_include` + + Commit d6a8c58675 (midx-write.c: support reading an existing MIDX with + `packs_to_include`, 2024-05-29) changed the MIDX generation machinery to + support reading from an existing MIDX when writing a new one. + + Unfortunately, the rest of the MIDX generation machinery is not prepared + to deal with such a change. For instance, the function responsible for + adding to the object ID fanout table from a MIDX source + (midx_fanout_add_midx_fanout()) will gladly add objects from an existing + MIDX for some fanout level regardless of whether or not those objects + came from packs that are to be included in the subsequent MIDX write. + + This results in broken pseudo-pack object order (leading to incorrect + object traversal results) and segmentation faults, like so (generated by + running the added test prior to the changes in midx-write.c): + + #0 0x000055ee31393f47 in midx_pack_order (ctx=0x7ffdde205c70) at midx-write.c:590 + #1 0x000055ee31395a69 in write_midx_internal (object_dir=0x55ee32570440 ".git/objects", + packs_to_include=0x7ffdde205e20, packs_to_drop=0x0, preferred_pack_name=0x0, + refs_snapshot=0x0, flags=15) at midx-write.c:1171 + #2 0x000055ee31395f38 in write_midx_file_only (object_dir=0x55ee32570440 ".git/objects", + packs_to_include=0x7ffdde205e20, preferred_pack_name=0x0, refs_snapshot=0x0, flags=15) + at midx-write.c:1274 + [...] + + In stack frame #0, the code on midx-write.c:590 is using the new pack ID + corresponding to some object which was added from the existing MIDX. + Importantly, the pack from which that object was selected in the + existing MIDX does not appear in the new MIDX as it was excluded via + `--stdin-packs`. + + In this instance, the pack in question had pack ID "1" in the existing + MIDX, but since it was excluded from the new MIDX, we never filled in + that entry in the pack_perm table, resulting in: + + (gdb) p *ctx->pack_perm@2 + $1 = {0, 1515870810} + + Which is what causes the segfault above when we try and read: + + struct pack_info *pack = &ctx->info[ctx->pack_perm[i]]; + if (pack->bitmap_pos == BITMAP_POS_UNKNOWN) + pack->bitmap_pos = 0; + + Fundamentally, we should be able to read information from an existing + MIDX when generating a new one. But in practice the midx-write.c code + assumes that we won't run into issues like the above with incongruent + pack IDs, and often makes those assumptions in extremely subtle and + fragile ways. + + Instead, let's avoid reading from an existing MIDX altogether, and stick + with the pre-d6a8c58675 implementation. Harden against any regressions + in this area by adding a test which demonstrates these issues. + Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> - ## pack-bitmap.c ## -@@ pack-bitmap.c: void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, - QSORT(packs, packs_nr, bitmapped_pack_cmp); - } else { - struct packed_git *pack; -+ uint32_t pack_int_id; - - if (bitmap_is_midx(bitmap_git)) { - uint32_t preferred_pack_pos; -@@ pack-bitmap.c: void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, - } - - pack = bitmap_git->midx->packs[preferred_pack_pos]; -+ pack_int_id = preferred_pack_pos; - } else { - pack = bitmap_git->pack; -+ /* -+ * Any value for 'pack_int_id' will do here. When we -+ * process the pack via try_partial_reuse(), we won't -+ * use the `pack_int_id` field since we have a non-MIDX -+ * bitmap. -+ */ -+ pack_int_id = 0; - } - - ALLOC_GROW(packs, packs_nr + 1, packs_alloc); - packs[packs_nr].p = pack; -+ packs[packs_nr].pack_int_id = pack_int_id; - packs[packs_nr].bitmap_nr = pack->num_objects; - packs[packs_nr].bitmap_pos = 0; + ## midx-write.c ## +@@ midx-write.c: struct write_midx_context { + }; + static int should_include_pack(const struct write_midx_context *ctx, +- const char *file_name, +- int exclude_from_midx) ++ const char *file_name) + { +- if (exclude_from_midx && ctx->m && midx_contains_pack(ctx->m, file_name)) ++ /* ++ * Note that at most one of ctx->m and ctx->to_include are set, ++ * so we are testing midx_contains_pack() and ++ * string_list_has_string() independently (guarded by the ++ * appropriate NULL checks). ++ * ++ * We could support passing to_include while reusing an existing ++ * MIDX, but don't currently since the reuse process drags ++ * forward all packs from an existing MIDX (without checking ++ * whether or not they appear in the to_include list). ++ * ++ * If we added support for that, these next two conditional ++ * should be performed independently (likely checking ++ * to_include before the existing MIDX). ++ */ ++ if (ctx->m && midx_contains_pack(ctx->m, file_name)) + return 0; +- if (ctx->to_include && !string_list_has_string(ctx->to_include, +- file_name)) ++ else if (ctx->to_include && ++ !string_list_has_string(ctx->to_include, file_name)) + return 0; + return 1; + } +@@ midx-write.c: static void add_pack_to_midx(const char *full_path, size_t full_path_len, + if (ends_with(file_name, ".idx")) { + display_progress(ctx->progress, ++ctx->pack_paths_checked); + +- if (!should_include_pack(ctx, file_name, 1)) ++ if (!should_include_pack(ctx, file_name)) + return; + + ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc); +@@ midx-write.c: static int fill_packs_from_midx(struct write_midx_context *ctx, + uint32_t i; + + for (i = 0; i < ctx->m->num_packs; i++) { +- if (!should_include_pack(ctx, ctx->m->pack_names[i], 0)) +- continue; +- + ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc); + + if (flags & MIDX_WRITE_REV_INDEX || preferred_pack_name) { +@@ midx-write.c: static int write_midx_internal(const char *object_dir, + die_errno(_("unable to create leading directories of %s"), + midx_name.buf); + +- ctx.m = lookup_multi_pack_index(the_repository, object_dir); ++ if (!packs_to_include) { ++ /* ++ * Only reference an existing MIDX when not filtering which ++ * packs to include, since all packs and objects are copied ++ * blindly from an existing MIDX if one is present. ++ */ ++ ctx.m = lookup_multi_pack_index(the_repository, object_dir); ++ } ++ + if (ctx.m && !midx_checksum_valid(ctx.m)) { + warning(_("ignoring existing multi-pack-index; checksum mismatch")); + ctx.m = NULL; +@@ midx-write.c: static int write_midx_internal(const char *object_dir, + ctx.nr = 0; + ctx.alloc = ctx.m ? ctx.m->num_packs : 16; + ctx.info = NULL; +- ctx.to_include = packs_to_include; + ALLOC_ARRAY(ctx.info, ctx.alloc); + + if (ctx.m && fill_packs_from_midx(&ctx, preferred_pack_name, +@@ midx-write.c: static int write_midx_internal(const char *object_dir, + else + ctx.progress = NULL; + ++ ctx.to_include = packs_to_include; ++ + for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &ctx); + stop_progress(&ctx.progress); + + + ## t/t5326-multi-pack-bitmaps.sh ## +@@ t/t5326-multi-pack-bitmaps.sh: do + ' + done + ++test_expect_success 'remove one packfile between MIDX bitmap writes' ' ++ git init remove-pack-between-writes && ++ ( ++ cd remove-pack-between-writes && ++ ++ test_commit A && ++ test_commit B && ++ test_commit C && ++ ++ # Create packs with the prefix "pack-A", "pack-B", ++ # "pack-C" to impose a lexicographic order on these ++ # packs so the pack being removed is always from the ++ # middle. ++ packdir=.git/objects/pack && ++ A="$(echo A | git pack-objects $packdir/pack-A --revs)" && ++ B="$(echo B | git pack-objects $packdir/pack-B --revs)" && ++ C="$(echo C | git pack-objects $packdir/pack-C --revs)" && ++ ++ git multi-pack-index write --bitmap && ++ ++ cat >in <<-EOF && ++ pack-A-$A.idx ++ pack-C-$C.idx ++ EOF ++ git multi-pack-index write --bitmap --stdin-packs <in && ++ ++ git rev-list --test-bitmap HEAD ++ ) ++' ++ + test_done -: ---------- > 2: 4b006f44a5 pack-bitmap.c: avoid uninitialized `pack_int_id` during reuse -: ---------- > 3: 06de4005f1 pack-revindex.c: guard against out-of-bounds pack lookups base-commit: 1b76f065085811104b5f4adff001956d7e5c5d1c -- 2.45.2.448.g06de4005f1