On Wed, Jun 23, 2021 at 02:39:15PM -0400, Taylor Blau wrote: > 'git multi-pack-index verify' inspects the data in an existing MIDX for > correctness by checking that the recorded object offsets are correct, > and so on. > > But it does not check that the file's trailing checksum matches the data > that it records. So, if an on-disk corruption happened to occur in the > final few bytes (and all other data was recorded correctly), we would: > > - get a clean result from 'git multi-pack-index verify', but > - be unable to reuse the existing MIDX when writing a new one (since > we now check for checksum mismatches before reusing a MIDX) > > Teach the 'verify' sub-command to recognize corruption in the checksum > by calling midx_checksum_valid(). Makes sense. I was a little surprised we didn't do this already, but I guess it does not do the same "regenerate and make sure hashfile produces the same checksum" trick that the pack idx verifier does (as an aside, I think what the midx code is doing here is much _better_, because it is looking at semantic problems in the file, and is more robust against irrelevant changes in the format). > @@ -1228,6 +1228,9 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag > return result; > } > > + if (!midx_checksum_valid(m)) > + midx_report(_("incorrect checksum")); This "midx_report()" function doesn't provide much context on stderr. I get: $ echo foo >>.git/objects/pack/multi-pack-index $ git multi-pack-index verify incorrect checksum Verifying OID order in multi-pack-index: 100% (282/282), done. Sorting objects by packfile: 100% (283/283), done. Verifying object offsets: 100% (283/283), done. I think we should at least say "error:", but something along the lines of "midx file at %s does not match its trailing checksum (possibly corruption?)". Or something like that. I think all of the existing calls to midx_report() share this issue, though. We probably want to at least say "error:" here, but maybe even something like: diff --git a/midx.c b/midx.c index 9a35b0255d..e464907a7c 100644 --- a/midx.c +++ b/midx.c @@ -1172,10 +1172,12 @@ void clear_midx_file(struct repository *r) static int verify_midx_error; -static void midx_report(const char *fmt, ...) +static void midx_report(struct multi_pack_index *m, const char *fmt, ...) { va_list ap; verify_midx_error = 1; + /* do we need to care about the "next" pointer here? */ + fprintf(stderr, ("error: %s/multi-pack-index: "), m->object_dir); va_start(ap, fmt); vfprintf(stderr, fmt, ap); fprintf(stderr, "\n"); Also, a side note: we should use __attribute__((format)) on this function to get compile-time checks of our format strings. -Peff