Re: [PATCH v5 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello Taylor, extremely thanks for finding the reason for this failure.

On Wed, Aug 17, 2022 at 3:28 AM Taylor Blau <me@xxxxxxxxxxxx> wrote:
>
> Hi Abhradeep,
>
> When I took a close look at the process to generate MIDX bitmaps, I found a
> couple of interesting things. The first more trivial fix is that we
> incorrectly propagate the "preferred"-ness bit from packs in an existing
> MIDX when generating a new one. If the identity of the preferred pack
> changes, we should not drag forward those bits on objects already known
> (and preferred) by the existing MIDX:
>
> --- >8 ---
>
> diff --git a/midx.c b/midx.c
> index 3ff6e91e6e..40e520534c 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -619,6 +619,9 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
>
>                 if (m) {
>                         uint32_t start = 0, end;
> +                       int orig_preferred_pack = -1;
> +                       if (0 <= preferred_pack && preferred_pack < m->num_packs)
> +                               orig_preferred_pack = info[preferred_pack].orig_pack_int_id;
>
>                         if (cur_fanout)
>                                 start = ntohl(m->chunk_oid_fanout[cur_fanout - 1]);
> @@ -629,7 +632,7 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
>                                 nth_midxed_pack_midx_entry(m,
>                                                            &entries_by_fanout[nr_fanout],
>                                                            cur_object);
> -                               if (nth_midxed_pack_int_id(m, cur_object) == preferred_pack)
> +                               if (nth_midxed_pack_int_id(m, cur_object) == orig_preferred_pack)
>                                         entries_by_fanout[nr_fanout].preferred = 1;
>                                 else
>                                         entries_by_fanout[nr_fanout].preferred = 0;
>
> --- 8< ---

I am not able to understand this modification.
`info[preferred_pack].orig_pack_int_id` and `preferred_pack` have the
same value, right? I see `ctx.info` getting sorted only after calling
`get_sorted_entries()` function.

> But a more interesting problem arose when I took a closer look at the
> psuedo-pack order of objects generated according to
> `prepare_midx_packing_data()`. With Johannes' fixed $test_tick value, I
> was able to see the following in runs that succeeded:
>
>     27bb4ecd3e96cd0b3bc37d92a78cb5cbf34c418afa67f74cc52517ff7df418e1 (12 in pack-63c460f99a5c08f631396b1828c64006170a9d543b064506fd11b504a62acf52.idx)
>     c68154d69c19f010afce786c6debe926ae6e7decfb946a4549085a792cf9de7e (202 in pack-63c460f99a5c08f631396b1828c64006170a9d543b064506fd11b504a62acf52.idx)
>     a0b85b314ede46aa9f9b5796a284a4cf0b86ebb8fa32f87ae246e21b5378b11c (392 in pack-63c460f99a5c08f631396b1828c64006170a9d543b064506fd11b504a62acf52.idx)
>     [...]
>
> and the following in runs that failed:
>
>     46193a971f5045cb3ca6022957541f9ccddfbfe78591d8506e2d952f8113059b (221 in pack-3fc052de674e3d48096af7cc5125675c0ae1082aa798eb9358de357b2655f9ad.idx)
>     67df8a01ac84cf5f028855c48384eac3336bb02a52603bac285c4b31d66b3ab5 (12 in pack-2021cdedb33b542b244eacf3d009d1384471a53286b0c1235c91d124355dc818.idx)
>     1556b5f0ad7cb0c25a1fc47355fcffc00775e90d94ae8c511e5776b204796ce6 (200 in pack-2021cdedb33b542b244eacf3d009d1384471a53286b0c1235c91d124355dc818.idx)
>
> In the successful case, pack 63c460f99a... is preferred, and its objects
> appear in ascending order of their pack offsets. But in the other case,
> pack 3fc052de67... is preferred, but its first object starts at offset
> 221. Huh? That's not right:
>
>     $ git show-index <.git/objects/pack/pack-3fc052de674e3d48096af7cc5125675c0ae1082aa798eb9358de357b2655f9ad.idx
>     221 46193a971f5045cb3ca6022957541f9ccddfbfe78591d8506e2d952f8113059b (1f4bd28e)
>     12 4d332072f161629ffe4652ecd3ce377ef88447bec73f05ab0f3515f98bd061cf (fadf885b)
>
> Indeed, there is another object there at offset 12. Missing that object
> (since it comes from a preferred pack) is an invariant violation (since
> all objects from the preferred pack should be selected when multiple
> copies are available).
>
> It's missing because the existing MIDX selects that object from a
> different pack, and when we get to fanout 0x4d (the one which should
> include that object), we skip over seeing its copy in the preferred pack
> because that pack already appears in the existing MIDX, though it wasn't
> preferred.

ahh, now I understand what the problem was actually. Thanks :)

> I think there are a couple of ways to fix this. The easiest thing to do
> would be to force the identity of the preferred pack to be the same when
> generating a MIDX bitmap *while reusing an existing MIDX*, since that is
> the only time this bug can happen.
>
> But that's a little magical for my taste. I think a more reasonable fix
> would be to include copies of all objects from the preferred pack
> including in the case where that pack was non-preferred in an existing
> MIDX and at least one object in that pack was selected from a different
> pack in the existing MIDX.

I think the later approach makes the most sense to me. It might not be
a good idea to keep the same pack as `preferred` as a better candidate
would be ignored in that case.

> Abhradeep -- let me know if this is something you want to look into. I
> think it's a very worthwhile bug to fix, since it is definitely
> trigger-able in the wild (notably, only with `git multi-pack-index write
> --bitmap` without `--stdin-packs` and only under certain circumstances),
> and not just limited to SHA-256 mode.
>
> If you are busy experimenting with CRoaring, that's no problem and I can
> fix this up, too. Either way, it would be worth you and others weighing
> in on which fix you think is worth pursuing.

I will be happy to fix it but I can't work on it right now (neither on
CRoaring) because I am currently preparing for my exam. I can continue
my work after that (i.e. from 19 aug). If you feel it is getting too
late then you can do this too. I am also thinking of  writing a patch
for bitmap specific test dump tool (as Johannes proposed previously).

My exam dates are 18 Aug, 31 Aug, 1 Sep, 2 Sep and 3 Sep (I know the
dates are weird) The dates are adjusted on request for Smart India
Hackathon ( 24 Aug - 27 Aug).

Thanks :)



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux