On Fri, Nov 13, 2020 at 06:03:24PM -0500, Jeff King wrote: > But what's slightly disturbing is this output: > > > --- expect 2020-11-13 22:20:39.246355100 +0000 > > +++ actual 2020-11-13 22:20:39.254355294 +0000 > > @@ -1 +1 @@ > > -239 > > +236 > > error: last command exited with $?=1 > > We're actually producing the wrong answer here, which implies that > ewah_read_mmap() is not being careful enough. Or possibly we are feeding > it extra bytes (e.g., letting it run over into the name-hash cache or > into the trailer checksum). > > I think we'll have to dig further into this, probably running the sha256 > case in a debugger to see what offsets we actually end up reading. Yep, the problem is in the caller, which is not careful about size checks before reading the header before the actual ewah. The first hunk here fixes it (the second is just another possible corruption I noticed, but not triggered by the test): diff --git a/pack-bitmap.c b/pack-bitmap.c index dc811ebae8..785009b04e 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -229,11 +229,16 @@ static int load_bitmap_entries_v1(struct bitmap_index *index) uint32_t commit_idx_pos; struct object_id oid; + if (index->map_size - index->map_pos < 6) + return error("corrupt ewah bitmap: truncated header for entry %d", i); + commit_idx_pos = read_be32(index->map, &index->map_pos); xor_offset = read_u8(index->map, &index->map_pos); flags = read_u8(index->map, &index->map_pos); - nth_packed_object_id(&oid, index->pack, commit_idx_pos); + if (nth_packed_object_id(&oid, index->pack, commit_idx_pos) < 0) + return error("corrupt ewah bitmap: commit index %u out of range", + (unsigned)commit_idx_pos); bitmap = read_bitmap_1(index); if (!bitmap) We should definitely do something like this, but there are some possible further improvements: - I think that map_size includes the trailing hash, and almost certainly any post-index extensions. We could probably compute the correct boundary of the bitmaps themselves in the caller and make sure we don't read past it. I'm not sure if it's worth the effort, though. In a truncation situation, basically all bets are off (is the trailer still there and the bitmap entries malformed, or is the trailer truncated?). The best we can do is try to read what's there as if it's correct data (and protect ourselves when it's obviously bogus). - we could avoid the magic "6" if read_be32() and read_u8(), which are custom helpers for this function, checked sizes before advancing the pointers. - I'm hesitant to add more tests in this area. As you can see from the commit which "broke" the test, truncating at byte N is going to be sensitive to small variations in the bitmap generation. So unless we're actually parsing the bitmaps and doing targeted corruptions, the tests will be somewhat brittle. -Peff