On Mon, Aug 26, 2019 at 10:20:20AM -0700, Junio C Hamano wrote:

> Stefan Sperling <stsp@xxxxxxxxx> writes:
>
> > The root cause of this bug seems to be that the valid assumption
> > that obj->parsed implies a successfully parsed object is broken by
> > parse_tag_buffer() because this function sets the 'parsed' flag even
> > if errors occur during parsing.
>
> I am mildly negative about that approach.  obj->parsed is about
> "we've done all we need to do to attempt parsing this object" (so
> that next person who gets hold of the object knows that fact---one
> of the reasons why may be that the caller who wants to ensure that
> the fields are ready to be accessed does not have to spend extra
> cycles, but that is not the only one).  Those that want to look at
> various fields in the object (e.g. the tagged object of a tag, the
> tagger identity of a tag, etc.) should be prepared to see and react
> to NULL in there so that they can gracefully handle "slightly"
> corrupt objects.

It seems like the right place to notice "we did not parse correctly" is
an error return from parse_tag_buffer(). We're not calling it ourselves
in this instance, but it looks like it does get propagated from
parse_object(), which would yield NULL. I wonder if some earlier caller
in checkout/archive is ignoring a parse failure, and continuing to work
with the object anyway.

Avoiding setting the parse flag is a cheap way to make sure that the
later calls re-attempt the parse and notice the error themselves. That
wastes some work in the case of a bogus tag, but callers who want to
view the corrupted state aren't really any worse off.

That said, the error condition touched by Stefan's updated patch is not
sufficient to guarantee that tag->tagged is non-NULL (whether we detect
the error case by return code or by lack of "parsed" flag).

The code does this:

	if (!strcmp(type, blob_type)) {
		item->tagged = (struct object *)lookup_blob(r, &oid);
	} else if (!strcmp(type, tree_type)) {
		item->tagged = (struct object *)lookup_tree(r, &oid);
	} else if (!strcmp(type, commit_type)) {
		item->tagged = (struct object *)lookup_commit(r, &oid);
	} else if (!strcmp(type, tag_type)) {
		item->tagged = (struct object *)lookup_tag(r, &oid);
	} else {
		error("Unknown type %s", type);
		item->tagged = NULL;
	}

Any of those lookup_* functions may also return NULL. It's relatively
rare, since we don't actually confirm the type against the object
database at that time. But it can happen if the same program already saw
that particular oid as another type.

This is tricky to trigger for checkout/archive because they generally
parse the tag first (but not impossible; e.g., some config like
mailmap.blob may read objects early). But anything using the revision
parser is happy to read multiple objects.

If we want to cover all cases, probably something like:

	if (!item->tagged)
		ret = -1;

would be simplest.

-Peff
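
For illustration only, here is a toy model of the lookup_* failure mode
described above. It is a sketch, not git's code: every name in it
(toy_object, toy_lookup, toy_parse_tag, and so on) is hypothetical, the
"object table" is a small fixed array, and it assumes that a lookup
returns NULL when the same id was already recorded as a different type.
It shows why checking the looked-up pointer catches both the unknown-type
case and the type-mismatch case.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for object types; not git's real definitions. */
enum toy_type { TOY_NONE, TOY_BLOB, TOY_TREE, TOY_COMMIT, TOY_TAG };

struct toy_object {
	char id[8];
	enum toy_type type;
	int used;
};

#define TOY_MAX 16
static struct toy_object toy_table[TOY_MAX];

/*
 * Return the in-memory object for `id`, recording it as `type` the first
 * time the id is seen. If the id was already recorded with a different
 * type, return NULL (the assumed mismatch behavior).
 */
static struct toy_object *toy_lookup(const char *id, enum toy_type type)
{
	int i;

	assert(strlen(id) < sizeof(toy_table[0].id));
	for (i = 0; i < TOY_MAX; i++) {
		struct toy_object *o = &toy_table[i];

		if (!o->used) {
			/* Slots fill in order, so an unused slot means "not seen yet". */
			snprintf(o->id, sizeof(o->id), "%s", id);
			o->type = type;
			o->used = 1;
			return o;
		}
		if (!strcmp(o->id, id))
			return o->type == type ? o : NULL;
	}
	return NULL; /* table full */
}

static enum toy_type toy_type_from_name(const char *name)
{
	if (!strcmp(name, "blob"))
		return TOY_BLOB;
	if (!strcmp(name, "tree"))
		return TOY_TREE;
	if (!strcmp(name, "commit"))
		return TOY_COMMIT;
	if (!strcmp(name, "tag"))
		return TOY_TAG;
	return TOY_NONE;
}

struct toy_tag {
	struct toy_object *tagged;
};

/* Look up the tagged object, then report failure if it is missing. */
static int toy_parse_tag(struct toy_tag *item, const char *id, const char *type)
{
	enum toy_type t = toy_type_from_name(type);
	int ret = 0;

	if (t == TOY_NONE) {
		fprintf(stderr, "Unknown type %s\n", type);
		item->tagged = NULL;
	} else {
		item->tagged = toy_lookup(id, t);
	}

	/* One check covers both the unknown-type and the mismatch case. */
	if (!item->tagged)
		ret = -1;
	return ret;
}
```

In this toy, parsing a tag that points at an id first seen as a commit
succeeds, while a second tag claiming the same id is a blob gets a NULL
target and a -1 return, as does a tag with an unrecognized type string.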