Re: [PATCH 4/4] midx: report checksum mismatches during 'verify'

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jun 23, 2021 at 02:39:15PM -0400, Taylor Blau wrote:

> 'git multi-pack-index verify' inspects the data in an existing MIDX for
> correctness by checking that the recorded object offsets are correct,
> and so on.
> 
> But it does not check that the file's trailing checksum matches the data
> that it records. So, if an on-disk corruption happened to occur in the
> final few bytes (and all other data was recorded correctly), we would:
> 
>   - get a clean result from 'git multi-pack-index verify', but
>   - be unable to reuse the existing MIDX when writing a new one (since
>     we now check for checksum mismatches before reusing a MIDX)
> 
> Teach the 'verify' sub-command to recognize corruption in the checksum
> by calling midx_checksum_valid().

Makes sense. I was a little surprised we didn't do this already, but I
guess it does not do the same "regenerate and make sure hashfile
produces the same checksum" trick that the pack idx verifier does (as an
aside, I think what the midx code is doing here is much _better_,
because it is looking at semantic problems in the file, and is more
robust against irrelevant changes in the format).

> @@ -1228,6 +1228,9 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag
>  		return result;
>  	}
>  
> +	if (!midx_checksum_valid(m))
> +		midx_report(_("incorrect checksum"));

This "midx_report()" function doesn't provide much context on stderr. I
get:

  $ echo foo >>.git/objects/pack/multi-pack-index
  $ git multi-pack-index verify
  incorrect checksum
  Verifying OID order in multi-pack-index: 100% (282/282), done.
  Sorting objects by packfile: 100% (283/283), done.
  Verifying object offsets: 100% (283/283), done.

I think we should at least say "error:", but something along the lines
of "midx file at %s does not match its trailing checksum (possibly
corruption?)". Or something like that.

I think all of the existing calls to midx_report() share this issue,
though. We probably want to at least say "error:" here, but maybe even
something like:

diff --git a/midx.c b/midx.c
index 9a35b0255d..e464907a7c 100644
--- a/midx.c
+++ b/midx.c
@@ -1172,10 +1172,12 @@ void clear_midx_file(struct repository *r)
 
 static int verify_midx_error;
 
-static void midx_report(const char *fmt, ...)
+static void midx_report(struct multi_pack_index *m, const char *fmt, ...)
 {
 	va_list ap;
 	verify_midx_error = 1;
+	/* do we need to care about the "next" pointer here? */
+	fprintf(stderr, ("error: %s/multi-pack-index: "), m->object_dir);
 	va_start(ap, fmt);
 	vfprintf(stderr, fmt, ap);
 	fprintf(stderr, "\n");

Also, a side note: we should use __attribute__((format)) on this
function to get compile-time checks of our format strings.

-Peff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux