On 8/10/2022 6:04 AM, Abhradeep Chakraborty wrote:
> On Wed, Aug 10, 2022 at 2:50 PM Johannes Schindelin
> <Johannes.Schindelin@xxxxxx> wrote:
>>
>> Hi Abhradeep,
>>
>> I instrumented this, and saw that the `multi-pack-index` and
>> `multi-pack-index*.bitmap` files were unchanged by the `git repack`
>> invocation.
>
> Yeah, those two files remain unchanged here.
>
>> Re-generating the MIDX bitmap forcefully after the repack seems to fix
>> things over here:
>>
>> -- snip --
>> diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh
>> index a95537e759b..564124bda27 100644
>> --- a/t/lib-bitmap.sh
>> +++ b/t/lib-bitmap.sh
>> @@ -438,7 +438,10 @@ midx_bitmap_partial_tests () {
>>
>>  test_expect_success 'setup partial bitmaps' '
>>  	test_commit packed &&
>> +	ls -l .git/objects/pack/ &&
>>  	git repack &&
>> +	git multi-pack-index write --bitmap &&
>> +	ls -l .git/objects/pack/ &&
>>  	test_commit loose &&
>>  	git multi-pack-index write --bitmap 2>err &&
>>  	test_path_is_file $midx &&
>> -- snap --
>>
>> This suggests to me that the `multi-pack-index write --bitmap 2>err` call
>> in this hunk might reuse a stale MIDX bitmap, and that _that_ might be
>> the root cause of this breakage.
>
> Yeah, the `multi-pack-index write --bitmap 2>err` is creating the
> problem. More specifically, the `multi-pack-index write` part. As you
> can see in my previous comment (if you got it), I shared a
> screenshot there which pointed out that the multi-pack-index files in
> the two cases are different. The portion where they start to differ
> belongs to the `RIDX` chunk.
>
> So, I added some debug lines to the `midx_pack_order()` function[1] and
> found that the objects are sorted differently in the two cases (i.e.
> the passing case and the failing case).
> For the passing case, the RIDX chunk contents look like this -
>
> pack_order = [ 1, 36, 11, 6, 18, 3, 19, 12, 5, 31, 27, 23, 29, 8, 38,
> 22, 9, 15, 14, 24, 37, 28, 7, 39, 10, 34, 26, 4, 30, 33, 2, 35, 17,
> 32, 0, 21, 16, 25, 13, 40, 20,]
>
> And in the failing case -
>
> pack_order = [ 12, 18, 3, 19, 1, 36, 11, 6, 5, 31, 27, 23, 29, 8, 38,
> 22, 9, 15, 14, 24, 37, 28, 7, 39, 10, 34, 26, 4, 30, 33, 2, 35, 17,
> 32, 0, 21, 16, 25, 13, 40, 20,]
>
> I went further and realized that this is due to this line[2] -
>
> 	if (!e->preferred)
> 		data[i].pack |= (1U << 31);
>
> I.e. 4-5 `pack_midx_entry` objects have different `preferred` values
> in the two cases. For example,
> "46193a971f5045cb3ca6022957541f9ccddfbfe78591d8506e2d952f8113059b"
> (with pack order 12) is `preferred` in the failing case (which is why
> it is in the first position) but `not preferred` in the passing case.
>
> It may be because of reusing a stale midx bitmap (as you said), but I
> am not sure. Just to reassure myself, I compared all the other
> packfiles, idx files and the pack `.bitmap` file (which you can see
> in the ls output) between the failing and passing cases and found
> that they are the same.

You are right that this choice of a 'preferred' pack is part of the root
cause of this flake. The choice is not deterministic if the mtimes of
some pack-files fall within the same second.

I can make the flake go away with this change:

diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh
index a95537e759b0..30347285f10f 100644
--- a/t/lib-bitmap.sh
+++ b/t/lib-bitmap.sh
@@ -438,7 +438,9 @@ midx_bitmap_partial_tests () {

 test_expect_success 'setup partial bitmaps' '
 	test_commit packed &&
+	test_tick &&
 	git repack &&
+	test_tick &&
 	test_commit loose &&
 	git multi-pack-index write --bitmap 2>err &&
 	test_path_is_file $midx &&

However, that doesn't help us actually find out what the problem is in
our case.
I've tried exploring other considerations, resulting in this diff:

diff --git a/midx.c b/midx.c
index 9c26d04bfded..3b9094d55ae5 100644
--- a/midx.c
+++ b/midx.c
@@ -921,8 +921,9 @@ static void prepare_midx_packing_data(struct packing_data *pdata,
 		struct pack_midx_entry *from = &ctx->entries[ctx->pack_order[i]];
 		struct object_entry *to = packlist_alloc(pdata, &from->oid);

+		/* Why does removing the permutation here not change the outcome? */
 		oe_set_in_pack(pdata, to,
-			       ctx->info[ctx->pack_perm[from->pack_int_id]].p);
+			       ctx->info[from->pack_int_id].p);
 	}
 }

This method is supposedly setting up some important information, and in
the failing case I see that ctx->pack_perm performs a 5-cycle
( 0->1, 1->2, 2->3, 3->4, 4->0 ), yet removing it does not affect
_any existing test cases_!

It turns out that the packfile passed here goes through this very
trivial path in oe_set_in_pack() every time we are writing a
multi-pack-index:

static inline void oe_set_in_pack(struct packing_data *pack,
				  struct object_entry *e,
				  struct packed_git *p)
{
	if (pack->in_pack_by_idx) {
		if (p->index) {
			e->in_pack_idx = p->index;
			return;
		}

		/*
		 * We're accessing packs by index, but this pack doesn't have
		 * an index (e.g., because it was added since we created the
		 * in_pack_by_idx array). Bail to oe_map_new_pack(), which
		 * will convert us to using the full in_pack array, and then
		 * fall through to our in_pack handling.
		 */
		oe_map_new_pack(pack);
	}
	pack->in_pack[e - pack->objects] = p;
}

By debugging, I discovered we are hitting the case that calls
oe_map_new_pack(pack). The documentation for that method provides the
following (**emphasis mine**):

/*
 * A new pack appears after prepare_in_pack_by_idx() has been
 * run. **This is likely a race.**
 *
 * We could map this new pack to in_pack_by_idx[] array, but then we
 * have to deal with full array anyway. And since it's hard to test
 * this fall back code, just stay simple and fall back to using
 * in_pack[] array.
 */
void oe_map_new_pack(struct packing_data *pack)

The issue is that prepare_packing_data() uses get_all_packs() to get the
list of pack-files, but that list is stale for some reason. Adding a
reprepare_packed_git() in advance of that call also removes the flake
(it always passes):

diff --git a/midx.c b/midx.c
index 9c26d04bfded..48db91d2728a 100644
--- a/midx.c
+++ b/midx.c
@@ -915,6 +915,7 @@ static void prepare_midx_packing_data(struct packing_data *pdata,
 	uint32_t i;

 	memset(pdata, 0, sizeof(struct packing_data));
+	reprepare_packed_git(the_repository);
 	prepare_packing_data(the_repository, pdata);

 	for (i = 0; i < ctx->entries_nr; i++) {

But this still appears to be just a band-aid over a trickier underlying
issue.

Hopefully my rambling helps push you in a helpful direction toward a
more complete fix.

Thanks,
-Stolee