Re: [PATCH v10 17/17] fsck: report invalid object type-path combinations

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Sat, 02 Oct 2021 00:33:40 +0200

On Fri, Oct 01 2021, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@xxxxxxxxx> writes:
>
>> diff --git a/builtin/fsck.c b/builtin/fsck.c
>> index 260210bf8a1..30a516da29e 100644
>> --- a/builtin/fsck.c
>> +++ b/builtin/fsck.c
>> @@ -615,12 +616,18 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
>>  	oi.sizep = &size;
>>  	oi.typep = &type;
>>  
>> -	if (read_loose_object(path, oid, &contents, &oi) < 0)
>> -		err = error(_("%s: object corrupt or missing: %s"),
>> -			    oid_to_hex(oid), path);
>> +	if (read_loose_object(path, oid, &real_oid, &contents, &oi) < 0) {
>> +		if (contents && !oideq(&real_oid, oid))
>> +			err = error(_("%s: hash-path mismatch, found at: %s"),
>> +				    oid_to_hex(&real_oid), path);
>> +		else
>> +			err = error(_("%s: object corrupt or missing: %s"),
>> +				    oid_to_hex(oid), path);
>> +	}
>>  	if (type != OBJ_NONE && type < 0)
>>  		err = error(_("%s: object is of unknown type '%s': %s"),
>> -			    oid_to_hex(oid), cb_data->obj_type.buf, path);
>> +			    oid_to_hex(&real_oid), cb_data->obj_type.buf,
>> +			    path);
>>  	if (err < 0) {
>>  		errors_found |= ERROR_OBJECT;
>>  		return 0; /* keep checking other objects */
>
> When we say "hash-path mismatch", we would have non-null contents,
> presumably obtained from read_loose_object().  err is made negative
> when we give that messge, and we come here to return.  Did we forget
> to free "contents" in that case?

No, e.g. the "cat-file -t and -s on corrupt loose object" test added in
this series doesn't error with SANITIZE=leak.

This is because as we go through read_loose_object() we'll make our way
to unpack_loose_rest(), which will return that malloc'd buffer. So we
would leak it if we returned after that.

Except that in read_loose_object() we'll go on to call
check_object_signature() right afterwards. The expecte OID is whatever
we inferred from the FS path, and the OID we saw is what we get from
hashing. That call will return non-zero, and we'll free() the
contents. The buffer isn't NULL'd, but we can't use it.

This is all behavior that pre-dates this series. I think it's a bit
stupid, and we should arguably do better about data recovery here, as
alluded to at the end of the commit message.

I.e. ideally we can use the information that we know we wanted OID A,
who cares if we found it at path B? It hashes to A and completes the
graph! Let's just re-write it to A. Or maybe it's not worth it. Or we'd
want to optionally log the content we *do* find on such failures,
e.g. maybe the content is partial or whatever. I had some WIP work on
top of this that did that, e.g. to recover in cases where you append
garbage data at the end of an object (in which case we *do* have the
content and can recover, we just need to stop reading at that byte once
our OID matches, and re-write it out again).

But anyway, it doesn't work that way now, and this doesn't leak memory,
or as far as I can tell do the wrong thing in these various edge cases,
because "content is bad" is always synonymous with read_loose_object()
itself calling free().

Thanks a lot for the careful checking!