Re: [PATCH v6 22/22] fsck: report invalid object type-path combinations

Taylor Blau <me@xxxxxxxxxxxx> · Fri, 17 Sep 2021 00:06:02 -0400

On Tue, Sep 07, 2021 at 12:58:17PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Improve the error that's emitted in cases where we find a loose object
> we parse, but which isn't at the location we expect it to be.
>
> Before this change we'd prefix the error with a not-a-OID derived from
> the path at which the object was found, due to an emergent behavior in
> how we'd end up with an "OID" in these codepaths.
>
> Now we'll instead say what object we hashed, and what path it was
> found at. Before this patch series e.g.:
>
>     $ git hash-object --stdin -w -t blob </dev/null
>     e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
>     $ mv objects/e6/ objects/e7
>
> Would emit ("[...]" used to abbreviate the OIDs):
>
>     git fsck
>     error: hash mismatch for ./objects/e7/9d[...] (expected e79d[...])
>     error: e79d[...]: object corrupt or missing: ./objects/e7/9d[...]
>
> Now we'll instead emit:
>
>     error: e69d[...]: hash-path mismatch, found at: ./objects/e7/9d[...]

Lovely!

> @@ -603,20 +603,25 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
>  	struct strbuf sb = STRBUF_INIT;
>  	unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
>  	struct object_info oi;
> +	struct object_id real_oid = *null_oid();
>  	int found = 0;
>  	oi.type_name = &sb;
>  	oi.sizep = &size;
>  	oi.typep = &type;
>
> -	if (read_loose_object(path, oid, &contents, &oi, oi_flags) < 0) {
> +	if (read_loose_object(path, oid, &real_oid, &contents, &oi, oi_flags) < 0) {
>  		found |= ERROR_OBJECT;
> -		error(_("%s: object corrupt or missing: %s"),
> -		      oid_to_hex(oid), path);
> +		if (!oideq(&real_oid, oid))
> +			error(_("%s: hash-path mismatch, found at: %s"),
> +			      oid_to_hex(&real_oid), path);
> +		else
> +			error(_("%s: object corrupt or missing: %s"),
> +			      oid_to_hex(oid), path);

Nice; this is the important part that this patch is changing, and the
logic is very nice. Before it read "anytime read_loose_object fails,
it's an error" to "it's still an error, but we can handle the case where
the real OID and the one we expected were different separately from
generic corruption".

>  	}
>  	if (type < 0) {
>  		found |= ERROR_OBJECT;
>  		error(_("%s: object is of unknown type '%s': %s"),
> -		      oid_to_hex(oid), sb.buf, path);
> +		      oid_to_hex(&real_oid), sb.buf, path);

Could go either way on this hunk, but I think that I err slightly on
your side now that we have access to the "real_oid".

The rest of the code and test changes in this patch look good to me.

Thanks,
Taylor