> Start the libification of parse_loose_header() by making it return > error codes and data instead of invoking die() by itself. For now > we'll move the relevant die() call to loose_object_info() and > read_loose_object() to keep this change smaller, but in subsequent > commits we'll also libify those. > > The reason this makes sense is that with the refactoring of > parse_loose_header_extended() in an earlier commit the public > interface for parse_loose_header() no longer just accepts a "unsigned > long *sizep". Rather it accepts a "struct object_info *", that > structure will be populated with information about the object. > > It thus makes sense to further libify the interface so that it stops > calling die() when it encounters OBJ_BAD, and instead rely on its > callers to check the populated "oi->typep". > > This also allows us to simplify away the > unpack_loose_header_to_strbuf() function added in > 46f034483eb (sha1_file: support reading from a loose object of unknown > type, 2015-05-03). Its code was mostly copy/pasted between it and both > of unpack_loose_header() and unpack_loose_short_header(). We now have > a single unpack_loose_header() function which accepts an optional > "struct strbuf *" instead. > > I think the remaining unpack_loose_header() function could be further > simplified, we're carrying some complexity just to be able to emit a > garbage type longer than MAX_HEADER_LEN, we could alternatively just > say "we found a garbage type <first 32 bytes>..." instead, but let's > leave this in place for now. Looking at the patch itself, this patch: 1. Combines unpack_loose_header(), unpack_loose_short_header(), and unpack_loose_header_to_strbuf() into one function. It does different things depending on whether the struct strbuf * is provided. 2. Updates parse_loose_header() to: a. never die upon invalid object type b. not accept a flags argument (which was used solely to control whether it died or not upon invalid object type) 3. Updates the callers of these functions. I think 2b should have been mentioned in the commit message. (And overall I think that the commit message could be shorter, but what information to include and exclude is a subjective matter, so I won't concern myself so much about that.) Also, I think that 1 should be split into its own commit. As it is, I'm not even sure if it's a good idea - for me, it is confusing for a function to consume or not consume more of the stream depending on whether an argument is NULL or not. > /* > * Check if entire header is unpacked in the first iteration. > */ > if (memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer)) > return 0; > > + /* > + * We have a header longer than MAX_HEADER_LEN. We abort early > + * unless under we're running as e.g. "cat-file > + * --allow-unknown-type". > + */ > + if (!header) > + return -1; What do you mean by "unless under we're running as"? And how would we know at this point that we're running as "cat-file --allow-unknown-type" just by checking "header"? > - if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE)) { > - if (unpack_loose_header_to_strbuf(&stream, map, mapsize, hdr, sizeof(hdr), &hdrbuf) < 0) > - status = error(_("unable to unpack %s header with --allow-unknown-type"), > - oid_to_hex(oid)); > - } else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) { > + > + hdr_ret = unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr), > + allow_unknown ? &hdrbuf : NULL); > + if (hdr_ret < 0) { > status = error(_("unable to unpack %s header"), > oid_to_hex(oid)); > } This hunk would go into the split commit (for the unpack_loose_header() refactoring) I suggested above. > - > - if (status < 0) { > - /* Do nothing */ > - } else if (hdrbuf.len) { > - if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0) > - status = error(_("unable to parse %s header with --allow-unknown-type"), > - oid_to_hex(oid)); > - } else if ((status = parse_loose_header(hdr, oi, flags)) < 0) { > + if (!status && parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi) < 0) { > status = error(_("unable to parse %s header"), oid_to_hex(oid)); > } > + if (!allow_unknown && *oi->typep < 0) > + die(_("invalid object type")); I think this change doesn't need to be so big? The oi->typep check could just go in the "else if" branch that happens if --allow-unknown-type is not set. > diff --git a/object-store.h b/object-store.h > index d443964447..740edcac30 100644 > --- a/object-store.h > +++ b/object-store.h > @@ -477,11 +477,30 @@ int for_each_object_in_pack(struct packed_git *p, > int for_each_packed_object(each_packed_object_fn, void *, > enum for_each_object_flags flags); > > +/** > + * unpack_loose_header() initializes the data stream needed to unpack > + * a loose object header. > + * > + * Returns 0 on success. Returns negative values on error. > + * > + * It will only parse up to MAX_HEADER_LEN bytes unless an optional > + * "hdrbuf" argument is non-NULL. This is intended for use with > + * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error) > + * reporting. The full header will be extracted to "hdrbuf" for use > + * with parse_loose_header(). > + */ > int unpack_loose_header(git_zstream *stream, unsigned char *map, > unsigned long mapsize, void *buffer, > + unsigned long bufsiz, struct strbuf *hdrbuf); Parsing up to MAX_HEADER_LEN only occasionally is confusing. Could we always make it parse up to MAX_HEADER_LEN, and then put a truncated header in hdrbuf if it is too big (and therefore invalid)? > +/** > + * parse_loose_header() parses the starting "<type> <len>\0" of an > + * object. If it doesn't follow that format -1 is returned. To check > + * the validity of the <type> populate the "typep" in the "struct > + * object_info". It will be OBJ_BAD if the object type is unknown. > + */ > +int parse_loose_header(const char *hdr, struct object_info *oi); Also mention what happens if the size is invalid.