On Fri, Apr 09, 2021 at 10:07:27AM +0200, Ævar Arnfjörð Bjarmason wrote: > As noted in the comment introduced in 837d395a5c0 (Replace > parse_blob() with an explanatory comment, 2010-01-18) the old > parse_blob() function and the current parse_blob_buffer() exist merely > to provide consistency in the API. > > We're not going to parse blobs like we "parse" commits, trees or > tags. So let's not have the parse_blob_buffer() take arguments that > pretends that we do. Its only use is to set the "parsed" flag. > > See bd2c39f58f9 ([PATCH] don't load and decompress objects twice with > parse_object(), 2005-05-06) for the introduction of parse_blob_buffer(). OK. Calling it parse_blob_buffer() is a little silly since it doesn't even take a buffer anymore. But I guess parse_blob() might imply that it actually loads the contents from disk to check them (which the other parse_foo() functions do), so that's not a good name. So this might be the least bad thing. Given that there are only two callers, just setting blob->object.parsed might not be unreasonable, either. But I don't think it's worth spending too much time on. > @@ -266,7 +266,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid) > error(_("hash mismatch %s"), oid_to_hex(oid)); > return NULL; > } > - parse_blob_buffer(lookup_blob(r, oid), NULL, 0); > + parse_blob_buffer(lookup_blob(r, oid)); > return lookup_object(r, oid); Not new in your patch, but I wondered if this could cause a segfault when lookup_blob() returns NULL. I _think_ the answer is "no". We'd hit this code path when either: - lookup_object() returns an object with type OBJ_BLOB, in which case lookup_blob() would return that same object - lookup_object() returned NULL, in which case lookup_blob() will call it again, get NULL again, and then auto-create the blob and return it So I think it is OK. But there are a bunch of duplicate hash lookups in this code. It would be clearer and more efficient as: diff --git a/object.c b/object.c index 2c32691dc4..2dfa038f13 100644 --- a/object.c +++ b/object.c @@ -262,12 +262,14 @@ struct object *parse_object(struct repository *r, const struct object_id *oid) if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) || (!obj && repo_has_object_file(r, oid) && oid_object_info(r, oid, NULL) == OBJ_BLOB)) { + if (!obj) + obj = create_object(r, oid, alloc_blob_node(r)); if (check_object_signature(r, repl, NULL, 0, NULL) < 0) { error(_("hash mismatch %s"), oid_to_hex(oid)); return NULL; } - parse_blob_buffer(lookup_blob(r, oid), NULL, 0); - return lookup_object(r, oid); + parse_blob_buffer(obj, NULL, 0); + return obj; } buffer = repo_read_object_file(r, oid, &type, &size); but I doubt the efficiency matters much in practice. Those hash lookups will be lost in the noise of computing the hash of the blob contents. -Peff