On Tue, Aug 2, 2022 at 9:05 PM Johannes Schindelin <Johannes.Schindelin@xxxxxx> wrote: > > But I am sure it is not related to the lookup table implementation code. > > Because when I swap the order of calling `test_midx_bitmap_cases > > "pack.writeBitmapLookupTable"` and `test_midx_bitmap_cases` (in > > t5326-multi-pack-bitmaps.sh), in that case, the error is being generated > > in `test_midx_bitmap_cases` call. Generally speaking, the error is > > always being generated in the second call. > > Indeed, it probably has something to do with the test tick (which gives > rise to the author/committer date of the commits that are generated, and > hence with the SHA order of said commits). > > With this patch: > > -- snip -- > diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh > index 3b206adcee6..a340f005b89 100755 > --- a/t/t5326-multi-pack-bitmaps.sh > +++ b/t/t5326-multi-pack-bitmaps.sh > @@ -347,7 +347,11 @@ test_midx_bitmap_cases () { > ' > } > > -test_midx_bitmap_cases > +# test_midx_bitmap_cases > + > +GIT_COMMITTER_DATE='1112928553 -0700' > +GIT_AUTHOR_DATE='1112928553 -0700' > +test_tick='1112928553' > > test_midx_bitmap_cases "pack.writeBitmapLookupTable" > > -- snap -- > > I can reproduce it quicker via > > GIT_TEST_DEFAULT_HASH=sha256 sh t5326-*.sh --run=1,71,91,92,93,124,145 > > Without setting those variables, I cannot skip the first > `test_midx_bitmap_cases` invocation _and_ reproduce the failure. Yeah, this works for me also. > > I am still not able to figure out the problem here. But let me further > > investigate. > > > > If anyone has some idea about what could be the culprit, I will be > > very happy to know. > > So I noticed that the test will pass every 4th to 5th time over here, > which means that it is a racy condition that is the culprit. I also encountered the same and it blew my mind at first (because it is the first race condition that I faced in my life) :) > I dug a bit deeper and reduced the reproducer even further, by running > this command with a trash directory just after above test script > invocation failed: > > bin-wrappers/git -C t/trash\ directory.t5326-multi-pack-bitmaps/ \ > -c pack.threads=1 pack-objects --revs --thin --stdout \ > --progress --delta-base-offset </tmp/a5 | > bin-wrappers/git -C t/trash\ directory.t5326-multi-pack-bitmaps/ \ > -c pack.threads=1 index-pack --stdin -v --fix-thin \ > '--keep=fetch-pack 12345 on labtop' \ > --check-self-contained-and-connected > > where `/tmp/a5` contains these lines: > > -- snip -- > 0ae5a358dcea86d81c0903aaec1e21857688cdb36c7fd89b04bd293fb2cceaa6 > 67df8a01ac84cf5f028855c48384eac3336bb02a52603bac285c4b31d66b3ab5 > 098a57f7753320c8a37cf0cb84526a9e50439d9f70fb673c91436a5283a7efe8 > --not > -- snap -- > > This allowed me to instrument the code with _many_ debug printf statements > (I actually use `error("%s:d: ...", __FILE__, __LINE__, ...)` calls) to > dive deeper into the weeds. > > One relatively obvious difference I can see is that when the code reaches > builtin/pack-objects.c:1198, in the passing case after writing the reused > pack we're at offset 900 in the written pack file, but in the failing case > we're at offset 269. > > Another difference I first saw was that the mtime of > `.git/objects/pack/multi-pack-index` was identical to the mtime of > `.git/objects/pack/multi-pack-index-2ec3c30357d2fff78db9b36cc749b393087e989bffdd278771d6f62089406061.bitmap` > in the failing case, while the mtimes of the corresponding files were > different in the passing case. > > But in another failing run, the mtimes were also non-identical. Meaning: > the race cannot be caused by identical or non-identical timestamps there. > > One consistent difference, however, was the SHA-256 in that `.bitmap` file > name: In the failing case it was always > 2ec3c30357d2fff78db9b36cc749b393087e989bffdd278771d6f62089406061, while in > the succeeding case it was always > 0c275657a915eeff1f2a1c17e5ded43cc3b232b0e178923e44fc15c1970516fb. > > My suspicion is that this `.bitmap` file is written out in an earlier test > case, and is already incorrect at that stage. Maybe it should have been > updated, but isn't, and the result is an incorrectly-reused partial pack > file. I agree with you. > I also noticed that deleting the `multi-pack-index-*.bitmap` file in the > failing case will "fix" the `pack-objects | index-pack` command I showed > above. > > Hopefully this will help you dig in further because even if the bug is not > in your code, it needs to be fixed. And I suspect that it is a bug in the > code we already have in the main branch, so that fix is really, really > needed, now. Yeah, definitely! Thanks for all the information! It will truly help me to identify the problem. > Since you are very familiar with the details of bitmaps now, I would like > to encourage you to work on some kind of validator/inspector, e.g. > something along the lines of a `test-tool midx-bitmap dump` (and later > `... verify`) that would help future you (and future me) investigate > similar breakages. Ideally, that tool will not only parse the `.bitmap` > file but immediately print out everything in a human-readable form. > > The reason I suggest this: I got a bit tired of staring at the output of > `hexdump -C` and comparing it to the documentation in > https://git-scm.com/docs/pack-format, so I had to stop after looking too > long at one broken pack file (i.e. the output of the `pack-objects` > command I showed above, where already the first entry seems to have an > infinite delta chain that pretends that > 4d332072f161629ffe4652ecd3ce377ef88447bec73f05ab0f3515f98bd061cf has > itself as delta base) before I even could analyze the MIDX bitmap files. > > The proposed tool would make analyzing MIDX bitmaps substantially more > fun, and would also help stave off future breakages if it was taught some > `verify` mode that would essentially automate what right now has to be > done manually: to verify that the MIDX bitmap file contents are sound and > consistent with the contents of the pack files. > > Obviously, this `verify` command should be called in strategic places of > t5326. Ok, sure! Thanks :)