Re: [PATCH 03/23] pack-bitmap: bounds-check size of cache extension

Jeff King <peff@xxxxxxxx> · Thu, 12 Nov 2020 23:57:00 -0500

On Thu, Nov 12, 2020 at 06:47:09PM +0100, Martin Ågren wrote:

> > A .bitmap file may have a "name hash cache" extension, which puts a
> > sequence of uint32_t bytes (one per object) at the end of the file. When
> 
> s/bytes/values/, perhaps?

Yeah, definitely.

> > we see a flag indicating this extension, we blindly subtract the
> > appropriate number of bytes from our available length. However, if the
> > .bitmap file is too short, we'll underflow our length variable and wrap
> > around, thinking we have a very large length. This can lead to reading
> > out-of-bounds bytes while loading individual ewah bitmaps.
> 
> > +               uint32_t cache_size = st_mult(index->pack->num_objects, sizeof(uint32_t));
> 
> Hmm. If `sizeof(size_t)` is 8, then this multiplication can't possibly
> overflow. A huge value of `num_objects` (say, 0xffffffff) would give a
> huge return value (0xffffffff<<2) which would be truncated (0xfffffffc).
> I think?

Yeah, `cache_size` should absolutely be a `size_t`. If you have more
than a billion objects, obviously your cache is going to be bigger than
that. But most importantly, somebody can _claim_ to have a huge number
of objects and foil the size checks by wrapping around.

> Do we want a `u32_mult()`?

Nah, we should be doing this as a size_t in the first place. There are
similar problems with the .idx format, btw. I have a series to deal with
that which I've been meaning to post.

> > +               unsigned char *index_end = index->map + index->map_size - the_hash_algo->rawsz;
> 
> The addition should be ok or mmap has failed on us. Do we know that we
> have room for the final hash there so that the subtraction is ok? Yes,
> from the previous commit, we know we have room for the header, which is
> even larger. But that's cheating a bit -- see below.

Yeah, I agree this ought to be checking the minimum size against the
header _plus_ the trailer.

I think the previous patch is actually where it goes wrong. The original
was checking for a minimum of:

  if (index->map_size < sizeof(*header) + the_hash_algo->rawsz)

which is the header plus the trailer. We want to readjust for the
MAX_RAWSZ part of the header, so it should be:

  size_t header_size = sizeof(*header) - GIT_MAX_RAWSZ + the_hash_algo->rawsz;
  if (index->map_size < sizeof(*header) + the_hash_algo->rawsz)

> > +                       if (index->map + header_size + cache_size > index_end)
> > +                               return error("corrupted bitmap index file (too short to fit hash cache)");
> > +                       index->hashes = (void *)(index_end - cache_size);
> > +                       index_end -= cache_size;
> 
> If the header size we're adding is indeed too large, the addition in the
> check would be undefined behavior, if I'm not mistaken. In practical
> terms, with 32-bit pointers and a huge size, we'd wrap around, decide
> that everything is ok and go on to do the same erroneous subtraction as
> before.
> 
> Maybe shuffle a few things over from the left to the right to only make
> subtractions that we know are ok:
> 
>   if (cache_size > index_end - index->map - header_size)

Yes, I agree this should be done as a subtraction as you showed to avoid
integer overflow.

> But I don't think we can fully trust those subtractions. We're
> subtracting the size of two hashes (one in the header, one in the
> footer), but after the previous patch, we only know that there's room
> for one. So probably the previous patch could go
> 
>   +       /*
>   +        * Verify that we have room for the header and the
>   +        * trailing checksum hash, so we can safely subtract
>   +        * their sizes from map_size. We can afford to be
>   +        * a bit imprecise with the error message.
>   +        */
>   -       if (index->map_size < sizeof(*header) + the_hash_algo->rawsz)
>   +       if (index->map_size < header_size + the_hash_algo->rawsz)
> 
> I *think* I've got most of my comments here somewhat right, but I could
> easily have missed something.

Right. I think that's right, and the previous patch is just buggy.

-Peff