Re: [PATCH 01/10] loose_object_info(): BUG() on inflating content with unknown type

Patrick Steinhardt <ps@xxxxxx> · Tue, 25 Feb 2025 12:42:22 +0100

On Tue, Feb 25, 2025 at 01:28:24AM -0500, Jeff King wrote:
> After unpack_loose_header() returns, it will have inflated not only the
> object header, but possibly some bytes of the object content. When we
> call unpack_loose_rest() to extract the actual content, it finds those
> extra bytes by skipping past the header's terminating NUL in the buffer.
> Like this:
> 
>   int bytes = strlen(buffer) + 1;
>   n = stream->total_out - bytes;
>   ...
>   memcpy(buf, (char *) buffer + bytes, n);
> 
> This won't work with the OBJECT_INFO_ALLOW_UNKNOWN_TYPE flag, as there
> we allow a header of arbitrary size. We put into a strbuf, but feed only

s/into/it &/

> the final 32-byte chunk we read to unpack_loose_rest(). In that case
> stream->total_out may unexpectedly large, and thus our "n" will be

s/may/& be/

> large, causing an out-of-bounds read (we do check it against our
> allocated buffer size, which prevents an out-of-bounds write).
> 
> Probably this could be made to work by feeding the strbuf to
> unpack_loose_rest(), along with adjusting some types (e.g., "bytes"
> would need to be a size_t, since it is no longer operating on a 32-byte
> buffer).

I was a bit confused initially as I was thinking in terms of `size_t`
and `uint32_t` as I misread 32-byte for 32-bit, which is an immediate
shortcut that my brain took because 32 bit is something you read all the
time. I don't really have a great idea for how to introduce the byte
chunk better though to avoid this.

> Signed-off-by: Jeff King <peff@xxxxxxxx>
> ---
> I found this because I was tracing the code path after
> unpack_loose_header() returns to verify some assumptions in the other
> patches.
> 
> It really makes me wonder if this "unknown type" stuff has any value
> at all. You can create an object with any type using "hash-object
> --literally -t". And you can ask about its type and size. But you can
> never retrieve the object content! Nor can you pack it or transfer it,
> since packs use a numeric type field.
> 
> This code was added ~2015, but I don't think anybody built more on top
> of it. I wonder if we should just consider it a failed experiment and
> rip out the support.

I certainly do not know and cannot think of any usecase for this. I also
expect that a repository with unknown object types is a recipe for weird
edge cases in case they are being read somehow.

I guess we'll know a bit more when this patch series lands? In case
nobody complains it is another indicator that unknown object types
aren't being used out there in the wild.

>  object-file.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/object-file.c b/object-file.c
> index 00c3a4b910..45b251ba04 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1580,6 +1580,8 @@ static int loose_object_info(struct repository *r,
>  
>  		if (!oi->contentp)
>  			break;
> +		if (hdrbuf.len)
> +			BUG("unpacking content with unknown types not yet supported");
>  		*oi->contentp = unpack_loose_rest(&stream, hdr, *oi->sizep, oid);
>  		if (*oi->contentp)
>  			goto cleanup;

Okay. I was wondering whether we still need `hdrbuf`, but we of course
do in order to continue reading the type and length itself. The only
thing we restrict is reading the contents of such objects.

Patrick