Re: [PATCH v5 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Aug 2, 2022 at 9:05 PM Johannes Schindelin
<Johannes.Schindelin@xxxxxx> wrote:
> > But I am sure it is not related to the lookup table implementation code.
> > Because when I swap the order of calling  `test_midx_bitmap_cases
> > "pack.writeBitmapLookupTable"` and `test_midx_bitmap_cases` (in
> > t5326-multi-pack-bitmaps.sh), in that case, the error is being generated
> > in  `test_midx_bitmap_cases` call. Generally speaking, the error is
> > always being generated in the second call.
>
> Indeed, it probably has something to do with the test tick (which gives
> rise to the author/committer date of the commits that are generated, and
> hence with the SHA order of said commits).
>
> With this patch:
>
> -- snip --
> diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
> index 3b206adcee6..a340f005b89 100755
> --- a/t/t5326-multi-pack-bitmaps.sh
> +++ b/t/t5326-multi-pack-bitmaps.sh
> @@ -347,7 +347,11 @@ test_midx_bitmap_cases () {
>         '
>  }
>
> -test_midx_bitmap_cases
> +# test_midx_bitmap_cases
> +
> +GIT_COMMITTER_DATE='1112928553 -0700'
> +GIT_AUTHOR_DATE='1112928553 -0700'
> +test_tick='1112928553'
>
>  test_midx_bitmap_cases "pack.writeBitmapLookupTable"
>
> -- snap --
>
> I can reproduce it quicker via
>
>         GIT_TEST_DEFAULT_HASH=sha256 sh t5326-*.sh --run=1,71,91,92,93,124,145
>
> Without setting those variables, I cannot skip the first
> `test_midx_bitmap_cases` invocation _and_ reproduce the failure.

Yeah, this works for me also.

> > I am still not able to figure out the problem here. But let me further
> > investigate.
> >
> > If anyone has some idea about what could be the culprit, I will be
> > very happy to know.
>
> So I noticed that the test will pass every 4th to 5th time over here,
> which means that it is a racy condition that is the culprit.

I also encountered the same and it blew my mind at first (because it
is the first race condition that I faced in my life) :)

> I dug a bit deeper and reduced the reproducer even further, by running
> this command with a trash directory just after above test script
> invocation failed:
>
>         bin-wrappers/git -C t/trash\ directory.t5326-multi-pack-bitmaps/ \
>                 -c pack.threads=1 pack-objects --revs --thin --stdout \
>                 --progress --delta-base-offset </tmp/a5 |
>         bin-wrappers/git -C t/trash\ directory.t5326-multi-pack-bitmaps/ \
>                 -c pack.threads=1 index-pack --stdin -v --fix-thin \
>                 '--keep=fetch-pack 12345 on labtop' \
>                 --check-self-contained-and-connected
>
> where `/tmp/a5` contains these lines:
>
> -- snip --
> 0ae5a358dcea86d81c0903aaec1e21857688cdb36c7fd89b04bd293fb2cceaa6
> 67df8a01ac84cf5f028855c48384eac3336bb02a52603bac285c4b31d66b3ab5
> 098a57f7753320c8a37cf0cb84526a9e50439d9f70fb673c91436a5283a7efe8
> --not
> -- snap --
>
> This allowed me to instrument the code with _many_ debug printf statements
> (I actually use `error("%s:d: ...", __FILE__, __LINE__, ...)` calls) to
> dive deeper into the weeds.
>
> One relatively obvious difference I can see is that when the code reaches
> builtin/pack-objects.c:1198, in the passing case after writing the reused
> pack we're at offset 900 in the written pack file, but in the failing case
> we're at offset 269.
>
> Another difference I first saw was that the mtime of
> `.git/objects/pack/multi-pack-index` was identical to the mtime of
> `.git/objects/pack/multi-pack-index-2ec3c30357d2fff78db9b36cc749b393087e989bffdd278771d6f62089406061.bitmap`
> in the failing case, while the mtimes of the corresponding files were
> different in the passing case.
>
> But in another failing run, the mtimes were also non-identical. Meaning:
> the race cannot be caused by identical or non-identical timestamps there.
>
> One consistent difference, however, was the SHA-256 in that `.bitmap` file
> name: In the failing case it was always
> 2ec3c30357d2fff78db9b36cc749b393087e989bffdd278771d6f62089406061, while in
> the succeeding case it was always
> 0c275657a915eeff1f2a1c17e5ded43cc3b232b0e178923e44fc15c1970516fb.
>
> My suspicion is that this `.bitmap` file is written out in an earlier test
> case, and is already incorrect at that stage. Maybe it should have been
> updated, but isn't, and the result is an incorrectly-reused partial pack
> file.

I agree with you.

> I also noticed that deleting the `multi-pack-index-*.bitmap` file in the
> failing case will "fix" the `pack-objects | index-pack` command I showed
> above.
>
> Hopefully this will help you dig in further because even if the bug is not
> in your code, it needs to be fixed. And I suspect that it is a bug in the
> code we already have in the main branch, so that fix is really, really
> needed, now.

Yeah, definitely! Thanks for all the information! It will truly help
me to identify the problem.

> Since you are very familiar with the details of bitmaps now, I would like
> to encourage you to work on some kind of validator/inspector, e.g.
> something along the lines of a `test-tool midx-bitmap dump` (and later
> `... verify`) that would help future you (and future me) investigate
> similar breakages. Ideally, that tool will not only parse the `.bitmap`
> file but immediately print out everything in a human-readable form.
>
> The reason I suggest this: I got a bit tired of staring at the output of
> `hexdump -C` and comparing it to the documentation in
> https://git-scm.com/docs/pack-format, so I had to stop after looking too
> long at one broken pack file (i.e. the output of the `pack-objects`
> command I showed above, where already the first entry seems to have an
> infinite delta chain that pretends that
> 4d332072f161629ffe4652ecd3ce377ef88447bec73f05ab0f3515f98bd061cf has
> itself as delta base) before I even could analyze the MIDX bitmap files.
>
> The proposed tool would make analyzing MIDX bitmaps substantially more
> fun, and would also help stave off future breakages if it was taught some
> `verify` mode that would essentially automate what right now has to be
> done manually: to verify that the MIDX bitmap file contents are sound and
> consistent with the contents of the pack files.
>
> Obviously, this `verify` command should be called in strategic places of
> t5326.

Ok, sure!

Thanks :)



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux