Re: [PATCH v5 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 8/10/2022 6:04 AM, Abhradeep Chakraborty wrote:
> On Wed, Aug 10, 2022 at 2:50 PM Johannes Schindelin
> <Johannes.Schindelin@xxxxxx> wrote:
>>
>> Hi Abhradeep,
>> I instrumented this, and saw that the `multi-pack-index` and
>> `multi-pack-index*.bitmap` files were unchanged by the `git repack`
>> invocation.
> 
> Yeah, those two files remain unchanged here.
> 
>> Re-generating the MIDX bitmap forcefully after the repack seems to fix
>> things over here:
>>
>> -- snip --
>> diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh
>> index a95537e759b..564124bda27 100644
>> --- a/t/lib-bitmap.sh
>> +++ b/t/lib-bitmap.sh
>> @@ -438,7 +438,10 @@ midx_bitmap_partial_tests () {
>>
>>         test_expect_success 'setup partial bitmaps' '
>>                 test_commit packed &&
>> +ls -l .git/objects/pack/ &&
>>                 git repack &&
>> +git multi-pack-index write --bitmap &&
>> +ls -l .git/objects/pack/ &&
>>                 test_commit loose &&
>>                 git multi-pack-index write --bitmap 2>err &&
>>                 test_path_is_file $midx &&
>> -- snap --
>>
>> This suggests to me that the `multi-pack-index write --bitmap 2>err` call
>> in this hunk might reuse a stale MIDX bitmap, and that _that_  might be
>> the root cause of this breakage.
> 
> Yeah, the `multi-pack-index write --bitmap 2>err` is creating the
> problem. More specifically the `multi-pack-index write` part. As you
> can see in my previous  comment (if you get the comment), I shared a
> screenshot there which pointed out that the multi-pack-index files in
> both cases are different. The portion from which it started to differ
> belongs to the `RIDX` chunk.
> 
> So, I used some debug lines in `midx_pack_order()` function[1] and
> found that the objects are sorted differently in those cases (i.e.
> passing case and failing case). For passing case, the RIDX chunk
> contents are like below -
> 
> pack_order = [ 1, 36, 11, 6, 18, 3, 19, 12, 5, 31, 27, 23, 29, 8, 38,
> 22, 9, 15, 14, 24, 37, 28, 7, 39, 10, 34, 26, 4, 30, 33, 2, 35, 17,
> 32, 0, 21, 16, 25, 13, 40, 20,]
> 
> And in the failing case, this is -
> 
> pack_order = [ 12, 18, 3, 19, 1, 36, 11, 6, 5, 31, 27, 23, 29, 8, 38,
> 22, 9, 15, 14, 24, 37, 28, 7, 39, 10, 34, 26, 4, 30, 33, 2, 35, 17,
> 32, 0, 21, 16, 25, 13, 40, 20,]
> 
> I went further and realized that this is due to the line[2] -
> 
>     if (!e->preferred)
>         data[i].pack |= (1U << 31);
> 
> I.e. 4- 5 `pack_midx_entry` objects have different `preferred` values
> in those cases. For example,
> "46193a971f5045cb3ca6022957541f9ccddfbfe78591d8506e2d952f8113059b"
> (with pack order 12) is `preferred` in failing case (that's why it is
> in the first position) and the same is `not preferred` in the passing
> case.
> 
> It may be because of reusing a stale midx bitmap (as you said). But I
> am not sure. Just to ensure myself, I compared all the other
> packfiles, idx files and a pack `.bitmap` file (which you can see
> using ls command) of failing and passing cases and found that they are
> the same.

You are right that this choice of a 'preferred' pack is part of the
root cause for this flake. This choice is not deterministic if the
mtime of some pack-files are within the same second.

I can make the flake go away with this change:

diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh
index a95537e759b0..30347285f10f 100644
--- a/t/lib-bitmap.sh
+++ b/t/lib-bitmap.sh
@@ -438,7 +438,9 @@ midx_bitmap_partial_tests () {
 
 	test_expect_success 'setup partial bitmaps' '
 		test_commit packed &&
+		test_tick &&
 		git repack &&
+		test_tick &&
 		test_commit loose &&
 		git multi-pack-index write --bitmap 2>err &&
 		test_path_is_file $midx &&


However, that doesn't help us actually find out what the problem is
in our case.

I've tried exploring other considerations, resulting in this diff:

diff --git a/midx.c b/midx.c
index 9c26d04bfded..3b9094d55ae5 100644
--- a/midx.c
+++ b/midx.c
@@ -921,8 +921,9 @@ static void prepare_midx_packing_data(struct packing_data *pdata,
 		struct pack_midx_entry *from = &ctx->entries[ctx->pack_order[i]];
 		struct object_entry *to = packlist_alloc(pdata, &from->oid);
 
+		/* Why does removing the permutation here not change the outcome? */
 		oe_set_in_pack(pdata, to,
-			       ctx->info[ctx->pack_perm[from->pack_int_id]].p);
+			       ctx->info[from->pack_int_id].p);
 	}
 }
 
This method is setting up some important information, supposedly, and
in the failing case I see that the ctx->pack_perm performs a 5-cycle
( 0->1, 1->2, 2->3, 3->4, 4->0 ) but this removal does not affect _any
existing test cases_!

Turns out that the packfile sent here goes through this very trivial
path in oe_set_in_pack() every time we are writing a multi-pack-index:

static inline void oe_set_in_pack(struct packing_data *pack,
				  struct object_entry *e,
				  struct packed_git *p)
{
	if (pack->in_pack_by_idx) {
		if (p->index) {
			e->in_pack_idx = p->index;
			return;
		}
		/*
		 * We're accessing packs by index, but this pack doesn't have
		 * an index (e.g., because it was added since we created the
		 * in_pack_by_idx array). Bail to oe_map_new_pack(), which
		 * will convert us to using the full in_pack array, and then
		 * fall through to our in_pack handling.
		 */
		oe_map_new_pack(pack);
	}
	pack->in_pack[e - pack->objects] = p;
}

By debugging, I discovered we are hitting the case that calls
oe_map_new_pack(pack). The documentation for that method provides
the following (**emphasis mine**):

/*
 * A new pack appears after prepare_in_pack_by_idx() has been
 * run. **This is likely a race.**
 *
 * We could map this new pack to in_pack_by_idx[] array, but then we
 * have to deal with full array anyway. And since it's hard to test
 * this fall back code, just stay simple and fall back to using
 * in_pack[] array.
 */
void oe_map_new_pack(struct packing_data *pack)

The issue being that prepare_packing_data() uses get_all_packs() to
get the list of pack-files, but that list is stale for some reason.
Adding a reprepare_packed_git() in advance of that call also removes
the flake (with always passing):

diff --git a/midx.c b/midx.c
index 9c26d04bfded..48db91d2728a 100644
--- a/midx.c
+++ b/midx.c
@@ -915,6 +915,7 @@ static void prepare_midx_packing_data(struct packing_data *pdata,
 	uint32_t i;
 
 	memset(pdata, 0, sizeof(struct packing_data));
+	reprepare_packed_git(the_repository);
 	prepare_packing_data(the_repository, pdata);
 
 	for (i = 0; i < ctx->entries_nr; i++) {

But this still appears like it is just a band-aid over a trickier
underlying issue.

Hopefully my rambling helps push you in a helpful direction to find
a more complete fix.

Thanks,
-Stolee



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux