On Tue, Feb 25, 2025 at 01:28:24AM -0500, Jeff King wrote: > After unpack_loose_header() returns, it will have inflated not only the > object header, but possibly some bytes of the object content. When we > call unpack_loose_rest() to extract the actual content, it finds those > extra bytes by skipping past the header's terminating NUL in the buffer. > Like this: > > int bytes = strlen(buffer) + 1; > n = stream->total_out - bytes; > ... > memcpy(buf, (char *) buffer + bytes, n); > > This won't work with the OBJECT_INFO_ALLOW_UNKNOWN_TYPE flag, as there > we allow a header of arbitrary size. We put into a strbuf, but feed only s/into/it &/ > the final 32-byte chunk we read to unpack_loose_rest(). In that case > stream->total_out may unexpectedly large, and thus our "n" will be s/may/& be/ > large, causing an out-of-bounds read (we do check it against our > allocated buffer size, which prevents an out-of-bounds write). > > Probably this could be made to work by feeding the strbuf to > unpack_loose_rest(), along with adjusting some types (e.g., "bytes" > would need to be a size_t, since it is no longer operating on a 32-byte > buffer). I was a bit confused initially as I was thinking in terms of `size_t` and `uint32_t` as I misread 32-byte for 32-bit, which is an immediate shortcut that my brain took because 32 bit is something you read all the time. I don't really have a great idea for how to introduce the byte chunk better though to avoid this. > Signed-off-by: Jeff King <peff@xxxxxxxx> > --- > I found this because I was tracing the code path after > unpack_loose_header() returns to verify some assumptions in the other > patches. > > It really makes me wonder if this "unknown type" stuff has any value > at all. You can create an object with any type using "hash-object > --literally -t". And you can ask about its type and size. But you can > never retrieve the object content! Nor can you pack it or transfer it, > since packs use a numeric type field. > > This code was added ~2015, but I don't think anybody built more on top > of it. I wonder if we should just consider it a failed experiment and > rip out the support. I certainly do not know and cannot think of any usecase for this. I also expect that a repository with unknown object types is a recipe for weird edge cases in case they are being read somehow. I guess we'll know a bit more when this patch series lands? In case nobody complains it is another indicator that unknown object types aren't being used out there in the wild. > object-file.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/object-file.c b/object-file.c > index 00c3a4b910..45b251ba04 100644 > --- a/object-file.c > +++ b/object-file.c > @@ -1580,6 +1580,8 @@ static int loose_object_info(struct repository *r, > > if (!oi->contentp) > break; > + if (hdrbuf.len) > + BUG("unpacking content with unknown types not yet supported"); > *oi->contentp = unpack_loose_rest(&stream, hdr, *oi->sizep, oid); > if (*oi->contentp) > goto cleanup; Okay. I was wondering whether we still need `hdrbuf`, but we of course do in order to continue reading the type and length itself. The only thing we restrict is reading the contents of such objects. Patrick