Hello Taylor, extremely thanks for finding the reason for this failure. On Wed, Aug 17, 2022 at 3:28 AM Taylor Blau <me@xxxxxxxxxxxx> wrote: > > Hi Abhradeep, > > When I took a close look at the process to generate MIDX bitmaps, I found a > couple of interesting things. The first more trivial fix is that we > incorrectly propagate the "preferred"-ness bit from packs in an existing > MIDX when generating a new one. If the identity of the preferred pack > changes, we should not drag forward those bits on objects already known > (and preferred) by the existing MIDX: > > --- >8 --- > > diff --git a/midx.c b/midx.c > index 3ff6e91e6e..40e520534c 100644 > --- a/midx.c > +++ b/midx.c > @@ -619,6 +619,9 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, > > if (m) { > uint32_t start = 0, end; > + int orig_preferred_pack = -1; > + if (0 <= preferred_pack && preferred_pack < m->num_packs) > + orig_preferred_pack = info[preferred_pack].orig_pack_int_id; > > if (cur_fanout) > start = ntohl(m->chunk_oid_fanout[cur_fanout - 1]); > @@ -629,7 +632,7 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, > nth_midxed_pack_midx_entry(m, > &entries_by_fanout[nr_fanout], > cur_object); > - if (nth_midxed_pack_int_id(m, cur_object) == preferred_pack) > + if (nth_midxed_pack_int_id(m, cur_object) == orig_preferred_pack) > entries_by_fanout[nr_fanout].preferred = 1; > else > entries_by_fanout[nr_fanout].preferred = 0; > > --- 8< --- I am not able to understand this modification. `info[preferred_pack].orig_pack_int_id` and `preferred_pack` have the same value, right? I see `ctx.info` getting sorted only after calling `get_sorted_entries()` function. > But a more interesting problem arose when I took a closer look at the > psuedo-pack order of objects generated according to > `prepare_midx_packing_data()`. With Johannes' fixed $test_tick value, I > was able to see the following in runs that succeeded: > > 27bb4ecd3e96cd0b3bc37d92a78cb5cbf34c418afa67f74cc52517ff7df418e1 (12 in pack-63c460f99a5c08f631396b1828c64006170a9d543b064506fd11b504a62acf52.idx) > c68154d69c19f010afce786c6debe926ae6e7decfb946a4549085a792cf9de7e (202 in pack-63c460f99a5c08f631396b1828c64006170a9d543b064506fd11b504a62acf52.idx) > a0b85b314ede46aa9f9b5796a284a4cf0b86ebb8fa32f87ae246e21b5378b11c (392 in pack-63c460f99a5c08f631396b1828c64006170a9d543b064506fd11b504a62acf52.idx) > [...] > > and the following in runs that failed: > > 46193a971f5045cb3ca6022957541f9ccddfbfe78591d8506e2d952f8113059b (221 in pack-3fc052de674e3d48096af7cc5125675c0ae1082aa798eb9358de357b2655f9ad.idx) > 67df8a01ac84cf5f028855c48384eac3336bb02a52603bac285c4b31d66b3ab5 (12 in pack-2021cdedb33b542b244eacf3d009d1384471a53286b0c1235c91d124355dc818.idx) > 1556b5f0ad7cb0c25a1fc47355fcffc00775e90d94ae8c511e5776b204796ce6 (200 in pack-2021cdedb33b542b244eacf3d009d1384471a53286b0c1235c91d124355dc818.idx) > > In the successful case, pack 63c460f99a... is preferred, and its objects > appear in ascending order of their pack offsets. But in the other case, > pack 3fc052de67... is preferred, but its first object starts at offset > 221. Huh? That's not right: > > $ git show-index <.git/objects/pack/pack-3fc052de674e3d48096af7cc5125675c0ae1082aa798eb9358de357b2655f9ad.idx > 221 46193a971f5045cb3ca6022957541f9ccddfbfe78591d8506e2d952f8113059b (1f4bd28e) > 12 4d332072f161629ffe4652ecd3ce377ef88447bec73f05ab0f3515f98bd061cf (fadf885b) > > Indeed, there is another object there at offset 12. Missing that object > (since it comes from a preferred pack) is an invariant violation (since > all objects from the preferred pack should be selected when multiple > copies are available). > > It's missing because the existing MIDX selects that object from a > different pack, and when we get to fanout 0x4d (the one which should > include that object), we skip over seeing its copy in the preferred pack > because that pack already appears in the existing MIDX, though it wasn't > preferred. ahh, now I understand what the problem was actually. Thanks :) > I think there are a couple of ways to fix this. The easiest thing to do > would be to force the identity of the preferred pack to be the same when > generating a MIDX bitmap *while reusing an existing MIDX*, since that is > the only time this bug can happen. > > But that's a little magical for my taste. I think a more reasonable fix > would be to include copies of all objects from the preferred pack > including in the case where that pack was non-preferred in an existing > MIDX and at least one object in that pack was selected from a different > pack in the existing MIDX. I think the later approach makes the most sense to me. It might not be a good idea to keep the same pack as `preferred` as a better candidate would be ignored in that case. > Abhradeep -- let me know if this is something you want to look into. I > think it's a very worthwhile bug to fix, since it is definitely > trigger-able in the wild (notably, only with `git multi-pack-index write > --bitmap` without `--stdin-packs` and only under certain circumstances), > and not just limited to SHA-256 mode. > > If you are busy experimenting with CRoaring, that's no problem and I can > fix this up, too. Either way, it would be worth you and others weighing > in on which fix you think is worth pursuing. I will be happy to fix it but I can't work on it right now (neither on CRoaring) because I am currently preparing for my exam. I can continue my work after that (i.e. from 19 aug). If you feel it is getting too late then you can do this too. I am also thinking of writing a patch for bitmap specific test dump tool (as Johannes proposed previously). My exam dates are 18 Aug, 31 Aug, 1 Sep, 2 Sep and 3 Sep (I know the dates are weird) The dates are adjusted on request for Smart India Hackathon ( 24 Aug - 27 Aug). Thanks :)