Jeff King <peff@xxxxxxxx> writes: > 4b. But sometimes we hit a different error. If another object points > to X as a delta base, then trying to find the type of that object > requires walking the delta chain to the base entry (since only the > base has the concrete type; deltas themselves are either OFS_DELTA > or REF_DELTA). > > Normally this would not require separate offset lookups at all, as > deltas are usually stored as OFS_DELTA, specifying the relative > offset to the base. But the corrupt idx created in step 1 is done > directly with "git pack-objects" and does not pass the > --delta-base-offset option, meaning we have REF_DELTA entries! > Those do have to consult an index to find the location of the base > object, and they use the pack .idx to do this. The same pack .idx > that we know is corrupted from step 1! Tricky. > The set of objects created in the test is deterministic. But the delta > selection seems not to be (which is not too surprising, as it is > multi-threaded). So, the offsets of the objects are also not deterministic? > I have seen the failure in Windows CI but haven't > reproduced it locally (not even with --stress). Re-running a failed > Windows CI job tends to work. But when I download and examine the trash > directory from a failed run, it shows a different set of deltas than I > get locally. But the exact source of non-determinism isn't that > important; our test should be robust against any order. Yeah, thanks for digging this tricky situation through. > b. The "objects64" setup could use --delta-base-offset. This would fix > our problem, but earlier tests have many hard-coded offsets. Using > OFS_DELTA would change the locations of objects in the pack (this > might even be OK because I think most of the offsets are within the > .idx file, but it seems brittle and I'm afraid to touch it). I am not sure I follow, as it does not sound a solution to anything if the offsets are not deterministic (and "earlier tests" that have hard coded offsets are broken no matter what, which is not a problem this patch introduces). Puzzled, but not curious enough to think about it further, as you have already rejected this approach ;-) > d. We could ask directly about object X, rather than enumerating all > of them. But that requires further hard-coding of the oid (both > sha1 and sha256) of object X. I'd prefer not to introduce more > brittleness. Right. > e. We can use a --batch-check format that looks at the pack data, but > doesn't have to chase deltas. The problem in this case is > %(objecttype), which has to walk to the base. But %(objectsize) > does not; we can get the value directly from the delta itself. > Another option would be %(deltabase), where we report the REF_DELTA > name but don't look at its data. > > I've gone with option (e) here. It's kind of subtle, but it's simple and > has no side effects. Nice. > @@ -1129,8 +1129,10 @@ test_expect_success 'reader bounds-checks large offset table' ' > git multi-pack-index --object-dir=../objects64 write && > midx=../objects64/pack/multi-pack-index && > corrupt_chunk_file $midx LOFF clear && > - test_must_fail git cat-file \ > - --batch-check --batch-all-objects 2>err && > + # using only %(objectsize) is important here; see the commit > + # message for more details > + test_must_fail git cat-file --batch-all-objects \ > + --batch-check="%(objectsize)" 2>err && A rather unfriendly message to readers, as it is unclear which commit you are talking about, and a fun thing is that you cannot say which one. Will queue. Thanks.