Re: [PATCH v6 21/22] fsck: report invalid types recorded in objects

Taylor Blau <me@xxxxxxxxxxxx> · Thu, 16 Sep 2021 23:57:42 -0400

On Tue, Sep 07, 2021 at 12:58:16PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Continue the work in the preceding commit and improve the error on:
>
>     $ git hash-object --stdin -w -t garbage --literally </dev/null
>     $ git fsck
>     error: hash mismatch for <OID_PATH> (expected <OID>)
>     error: <OID>: object corrupt or missing: <OID_PATH>
>     [ other fsck output ]
>
> To instead emit:
>
>     $ git fsck
>     error: <OID>: object is of unknown type 'garbage': <OID_PATH>
>     [ other fsck output ]
>
> The complaint about a "hash mismatch" was simply an emergent property
> of how we'd fall though from read_loose_object() into fsck_loose()
> when we didn't get the data we expected. Now we'll correctly note that
> the object type is invalid.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>
> ---
>  builtin/fsck.c  | 22 ++++++++++++++++++----
>  object-file.c   | 13 +++++--------
>  object-store.h  |  4 ++--
>  t/t1450-fsck.sh | 24 +++++++++++++++++++++---
>  4 files changed, 46 insertions(+), 17 deletions(-)
>
> diff --git a/builtin/fsck.c b/builtin/fsck.c
> index 082dadd5629..07af0434db6 100644
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -600,12 +600,26 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
>  	unsigned long size;
>  	void *contents;
>  	int eaten;
> -
> -	if (read_loose_object(path, oid, &type, &size, &contents,
> -			      OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
> -		errors_found |= ERROR_OBJECT;
> +	struct strbuf sb = STRBUF_INIT;
> +	unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
> +	struct object_info oi;
> +	int found = 0;
> +	oi.type_name = &sb;
> +	oi.sizep = &size;
> +	oi.typep = &type;
> +
> +	if (read_loose_object(path, oid, &contents, &oi, oi_flags) < 0) {

OK, now we pass a struct object_info instead of pointers to type and
size separately. Makes sense.

> +		found |= ERROR_OBJECT;

And found tracks the error we found when trying to read this loose
object, if any. Having a separate variable makes sense, since we only
want to avoid calling fsck_obj() if we found any errors for this object
while trying to call read_loose_object().

>  		error(_("%s: object corrupt or missing: %s"),
>  		      oid_to_hex(oid), path);
> +	}
> +	if (type < 0) {
> +		found |= ERROR_OBJECT;
> +		error(_("%s: object is of unknown type '%s': %s"),
> +		      oid_to_hex(oid), sb.buf, path);
> +	}
> +	if (found) {
> +		errors_found |= ERROR_OBJECT;

Perhaps errors_found |= found ?

>  		return 0; /* keep checking other objects */
>  	}
>
> diff --git a/object-file.c b/object-file.c
> index 0e6937fad73..f4850ba62b4 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -2560,9 +2560,8 @@ static int check_stream_oid(git_zstream *stream,
>
>  int read_loose_object(const char *path,
>  		      const struct object_id *expected_oid,
> -		      enum object_type *type,
> -		      unsigned long *size,
>  		      void **contents,
> +		      struct object_info *oi,
>  		      unsigned int oi_flags)

All of the changes in this function make perfect sense, except...
>  {
>  	int ret = -1;
> @@ -2570,10 +2569,9 @@ int read_loose_object(const char *path,
>  	unsigned long mapsize;
>  	git_zstream stream;
>  	char hdr[MAX_HEADER_LEN];
> -	struct object_info oi = OBJECT_INFO_INIT;
>  	int allow_unknown = oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
> -	oi.typep = type;
> -	oi.sizep = size;
> +	enum object_type *type = oi->typep;
> +	unsigned long *size = oi->sizep;

...I see that size is used in check_object_signature(), but I don't see
any uses for type. Am I missing it?

The tests look good to me.

Thanks,
Taylor