Re: [PATCH v3 3/4] object-file: emit corruption errors when detected

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Dec 08 2022, Jonathan Tan wrote:

> Instead of relying on errno being preserved across function calls, teach
> do_oid_object_info_extended() to itself report object corruption when
> it first detects it. There are 3 types of corruption being detected:
>  - when a replacement object is missing
>  - when a loose object is corrupt
>  - when a packed object is corrupt and the object cannot be read
>    in another way
>
> Note that in the RHS of this patch's diff, a check for ENOENT that was
> introduced in 3ba7a06552 (A loose object is not corrupt if it cannot
> be read due to EMFILE, 2010-10-28) is also removed. The purpose of this
> check is to avoid a false report of corruption if the errno contains
> something like EMFILE (or anything that is not ENOENT), in which case
> a more generic report is presented. Because, as of this patch, we no
> longer rely on such a heuristic to determine corruption, but surface
> the error message at the point when we read something that we did not
> expect, this check is no longer necessary.
>
> Besides being more resilient, this also prepares for a future patch in
> which an indirect caller of do_oid_object_info_extended() will need
> such functionality.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@xxxxxxxxxx>
> ---
>  object-file.c  | 48 ++++++++++++++++++++++--------------------------
>  object-store.h |  3 +++
>  2 files changed, 25 insertions(+), 26 deletions(-)
>
> diff --git a/object-file.c b/object-file.c
> index d99d05839f..f166065f32 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1433,6 +1433,7 @@ static int loose_object_info(struct repository *r,
>  {
>  	int status = 0;
>  	unsigned long mapsize;
> +	const char *path = NULL;

I think the NULL assignment here should either go, or it's incomplete.

Below you chechk the return value of "map", so let's either trust it to
populate "path" if it's returning success (which it does), *or* not
trust it, init this to NULL, and below add....

>  	void *map;
>  	git_zstream stream;
>  	char hdr[MAX_HEADER_LEN];
> @@ -1464,7 +1465,7 @@ static int loose_object_info(struct repository *r,
>  		return 0;
>  	}
>  
> -	map = map_loose_object(r, oid, &mapsize);
> +	map = map_loose_object_1(r, oid, &mapsize, &path);
>  	if (!map)
>  		return -1;

....

	if (!path)
		BUG("map_loose_object_1 should have given us a path");

But I don't think it's good to just hide the potential difference
between the two, especially as....

> @@ -1502,6 +1503,10 @@ static int loose_object_info(struct repository *r,
>  		break;
>  	}
>  
> +	if (status && (flags & OBJECT_INFO_DIE_IF_CORRUPT))
> +		die(_("loose object %s (stored in %s) is corrupt"),
> +		    oid_to_hex(oid), path);

...I think this is leftover from a previous round (or maybe not, I
didn't check) where we did a free(path), but here we'd end up
segfaulting (not on glibc, but some platforms) if we have a NULL path.

So init-ing it didn't help us, but just helps to hide that potential
(and much worse) bug.

I think this change should also remove the existing "const char *path"
in this function from the "if"'d scope omitted in this context.

The C compiler won't care, but to the human reader it's easier to reason
about not shadowing the variable now, for as it turns out no reason, as
they're effectively independent.

>  	git_inflate_end(&stream);
>  cleanup:
>  	munmap(map, mapsize);
> @@ -1611,6 +1616,15 @@ static int do_oid_object_info_extended(struct repository *r,
>  			continue;
>  		}
>  
> +		if (flags & OBJECT_INFO_DIE_IF_CORRUPT) {
> +			const struct packed_git *p;

Nit: add an extra \n here, between decls and code.

> -	if (oid_object_info_extended(r, oid, &oi, 0) < 0)
> +	if (oid_object_info_extended(r, oid, &oi,
> +				     die_if_corrupt ? OBJECT_INFO_DIE_IF_CORRUPT : 0)
> +	    < 0)
>  		return NULL;
>  	return content;
>  }

This is a very odd coding style/wrapping, to not even end up with a line
shorter than 79 characters. You can instead do:

	if (oid_object_info_extended(r, oid, &oi, die_if_corrupt
				     ? OBJECT_INFO_DIE_IF_CORRUPT : 0) < 0)

Which is in line with our usual style, and does wrap before 79 characters...

> diff --git a/object-store.h b/object-store.h
> index b1ec0bde82..98c1d67946 100644
> --- a/object-store.h
> +++ b/object-store.h
> @@ -445,6 +445,9 @@ struct object_info {
>   */
>  #define OBJECT_INFO_FOR_PREFETCH (OBJECT_INFO_SKIP_FETCH_OBJECT | OBJECT_INFO_QUICK)
>  
> +/* Die if object corruption (not just an object being missing) was detected. */
> +#define OBJECT_INFO_DIE_IF_CORRUPT 32

Personally I wouldn't mind a short cleanup step in this series to change
these to 1<<0, 1<<1 etc., as we do for almost everything els.

I.e. in an earlier step you removed the "16", and changed that "32" to
"16", now we're adding a "32" again.

I also notice that you didn't just add a "4" here, which is an existing
gap, which turns out to be a leftover bit from your 9c8a294a1ae
(sha1-file: remove OBJECT_INFO_SKIP_CACHED, 2020-01-02) ~2 years ago :)

Maybe it's too much, and we could do it later, but something like this
as a first step:

diff --git a/object-file.c b/object-file.c
index 26290554bb4..48eff3850f5 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1534,7 +1534,8 @@ int fetch_if_missing = 1;
 
 static int do_oid_object_info_extended(struct repository *r,
 				       const struct object_id *oid,
-				       struct object_info *oi, unsigned flags)
+				       struct object_info *oi,
+				       enum object_info_flags flags)
 {
 	static struct object_info blank_oi = OBJECT_INFO_INIT;
 	struct cached_object *co;
@@ -1633,7 +1634,7 @@ static int do_oid_object_info_extended(struct repository *r,
 }
 
 int oid_object_info_extended(struct repository *r, const struct object_id *oid,
-			     struct object_info *oi, unsigned flags)
+			     struct object_info *oi, enum object_info_flags flags)
 {
 	int ret;
 	obj_read_lock();
diff --git a/object-store.h b/object-store.h
index 1be57abaf10..a20e00395b9 100644
--- a/object-store.h
+++ b/object-store.h
@@ -428,28 +428,31 @@ struct object_info {
  */
 #define OBJECT_INFO_INIT { 0 }
 
-/* Invoke lookup_replace_object() on the given hash */
-#define OBJECT_INFO_LOOKUP_REPLACE 1
-/* Allow reading from a loose object file of unknown/bogus type */
-#define OBJECT_INFO_ALLOW_UNKNOWN_TYPE 2
-/* Do not retry packed storage after checking packed and loose storage */
-#define OBJECT_INFO_QUICK 8
-/* Do not check loose object */
-#define OBJECT_INFO_IGNORE_LOOSE 16
-/*
- * Do not attempt to fetch the object if missing (even if fetch_is_missing is
- * nonzero).
- */
-#define OBJECT_INFO_SKIP_FETCH_OBJECT 32
-/*
- * This is meant for bulk prefetching of missing blobs in a partial
- * clone. Implies OBJECT_INFO_SKIP_FETCH_OBJECT and OBJECT_INFO_QUICK
- */
-#define OBJECT_INFO_FOR_PREFETCH (OBJECT_INFO_SKIP_FETCH_OBJECT | OBJECT_INFO_QUICK)
+enum object_info_flags {
+	/* Invoke lookup_replace_object() on the given hash */
+	OBJECT_INFO_LOOKUP_REPLACE = 1<<0,
+	/* Allow reading from a loose object file of unknown/bogus type */
+	OBJECT_INFO_ALLOW_UNKNOWN_TYPE = 1<<1,
+	/* Do not retry packed storage after checking packed and loose storage */
+	OBJECT_INFO_QUICK = 1<<2,
+	/* Do not check loose object */
+	OBJECT_INFO_IGNORE_LOOSE = 1<<3,
+	/*
+	 * Do not attempt to fetch the object if missing (even if fetch_is_missing is
+	 * nonzero).
+	 */
+	OBJECT_INFO_SKIP_FETCH_OBJECT = 1<<4,
+	/*
+	 * This is meant for bulk prefetching of missing blobs in a partial
+	 * clone. Implies OBJECT_INFO_SKIP_FETCH_OBJECT and OBJECT_INFO_QUICK
+	 */
+	OBJECT_INFO_FOR_PREFETCH = (OBJECT_INFO_SKIP_FETCH_OBJECT | OBJECT_INFO_QUICK),
+};
 
 int oid_object_info_extended(struct repository *r,
 			     const struct object_id *,
-			     struct object_info *, unsigned flags);
+			     struct object_info *,
+			     enum object_info_flags flags);
 
 /*
  * Iterate over the files in the loose-object parts of the object




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux