[PATCH v4 00/21] fsck: better "invalid object" error reporting

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



A late re-roll of the v3[1] that's in "seen" and causing a couple of
CI errors, both are solved with this version. For a recap of what this
is about see v3's summary[1].

One was a "mv -v" issue on OSX, it asks interactively, and defaults to
no, now moved to "mv -f" (as in other tests that move .git/objects/*
files directly).

The other happened on one of the docker32 Linux boxes, and turned out
to be an uninitialized variable bug, which happened to be initialized
to a negative value there, and zero (or positive) everywhere else.

I also went through all the commentary on the v3 (particularly good
feedback from Jonathan Tan) and hopefully addressed everything in one
way or another. In particular the 12/21 is new here, split off from
what's now 14/21. That change was the most complex one in the previous
series.

1. https://lore.kernel.org/git/cover-00.17-0000000000-20210520T111610Z-avarab@xxxxxxxxx/

Ævar Arnfjörð Bjarmason (21):
  fsck tests: refactor one test to use a sub-repo
  fsck tests: add test for fsck-ing an unknown type
  cat-file tests: test for missing object with -t and -s
  cat-file tests: test that --allow-unknown-type isn't on by default
  rev-list tests: test for behavior with invalid object types
  cat-file tests: add corrupt loose object test
  cat-file tests: test for current --allow-unknown-type behavior
  cache.h: move object functions to object-store.h
  object-file.c: don't set "typep" when returning non-zero
  object-file.c: make parse_loose_header_extended() public
  object-file.c: add missing braces to loose_object_info()
  object-file.c: simplify unpack_loose_short_header()
  object-file.c: split up ternary in parse_loose_header()
  object-file.c: stop dying in parse_loose_header()
  object-file.c: guard against future bugs in loose_object_info()
  object-file.c: return -1, not "status" from unpack_loose_header()
  object-file.c: return -2 on "header too long" in unpack_loose_header()
  fsck: don't hard die on invalid object types
  object-store.h: move read_loose_object() below 'struct object_info'
  fsck: report invalid types recorded in objects
  fsck: report invalid object type-path combinations

 builtin/fast-export.c  |   2 +-
 builtin/fsck.c         |  28 ++++++-
 builtin/index-pack.c   |   2 +-
 builtin/mktag.c        |   3 +-
 cache.h                |  10 ---
 object-file.c          | 178 +++++++++++++++++++++--------------------
 object-store.h         |  62 +++++++++++---
 object.c               |   4 +-
 pack-check.c           |   3 +-
 streaming.c            |  10 ++-
 t/t1006-cat-file.sh    | 169 ++++++++++++++++++++++++++++++++++++++
 t/t1450-fsck.sh        |  64 +++++++++++----
 t/t6115-rev-list-du.sh |  11 +++
 13 files changed, 407 insertions(+), 139 deletions(-)

Range-diff against v3:
 1:  aa38b2bf9e7 =  1:  2e37971c016 fsck tests: refactor one test to use a sub-repo
 2:  82b64abd250 =  2:  79630a99433 fsck tests: add test for fsck-ing an unknown type
 3:  7c3c2fe25d9 =  3:  2b5366bfb9d cat-file tests: test for missing object with -t and -s
 4:  871b8200035 !  4:  ea9a5ef0920 cat-file tests: test that --allow-unknown-type isn't on by default
    @@ Metadata
      ## Commit message ##
         cat-file tests: test that --allow-unknown-type isn't on by default
     
    -    Fix a blindspot in the tests added in the tests for the
    -    --allow-unknown-type feature, added in 39e4ae38804 (cat-file: teach
    -    cat-file a '--allow-unknown-type' option, 2015-05-03).
    +    Fix a blindspot in the tests for the --allow-unknown-type feature
    +    added in 39e4ae38804 (cat-file: teach cat-file a
    +    '--allow-unknown-type' option, 2015-05-03). We should check that
    +    --allow-unknown-type isn't on by default.
     
         Before this change all the tests would succeed if --allow-unknown-type
         was on by default, let's fix that by asserting that -t and -s die on a
 5:  b98da9cc89e =  5:  8eaf0e6ddda rev-list tests: test for behavior with invalid object types
 6:  04cc1d20f62 !  6:  f0e9d92414e cat-file tests: add corrupt loose object test
    @@ t/t1006-cat-file.sh: test_expect_success "Size of large broken object is correct
     +		test_cmp out.expect out.actual &&
     +
     +		# Swap the two to corrupt the repository
    -+		mv -v "$other_path" "$empty_path" &&
    ++		mv -f "$other_path" "$empty_path" &&
     +		test_must_fail git fsck 2>err.fsck &&
     +		grep "hash mismatch" err.fsck &&
     +
    @@ t/t1006-cat-file.sh: test_expect_success "Size of large broken object is correct
     +
     +		# So far "cat-file" has been happy to spew the found
     +		# content out as-is. Try to make it zlib-invalid.
    -+		mv -v other.blob "$empty_path" &&
    ++		mv -f other.blob "$empty_path" &&
     +		test_must_fail git fsck 2>err.fsck &&
     +		grep "^error: inflate: data stream error (" err.fsck
     +	)
 7:  9217320888f =  7:  d797d2e8e9d cat-file tests: test for current --allow-unknown-type behavior
 8:  12dd4538794 =  8:  96310a0bb59 cache.h: move object functions to object-store.h
 -:  ----------- >  9:  54fb9189408 object-file.c: don't set "typep" when returning non-zero
 9:  6a5b78dcad8 = 10:  9d36fcbc44a object-file.c: make parse_loose_header_extended() public
10:  5d31d7e1a54 ! 11:  74c308adc19 object-file.c: add missing braces to loose_object_info()
    @@ object-file.c: static int loose_object_info(struct repository *r,
     +	}
      
      	munmap(map, mapsize);
    - 	if (status && oi->typep)
    + 	if (oi->sizep == &size_scratch)
11:  ee28089219f ! 12:  3f52149bfde object-file.c: stop dying in parse_loose_header()
    @@ Metadata
     Author: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>
     
      ## Commit message ##
    -    object-file.c: stop dying in parse_loose_header()
    +    object-file.c: simplify unpack_loose_short_header()
     
    -    Start the libification of parse_loose_header() by making it return
    -    error codes and data instead of invoking die() by itself. For now
    -    we'll move the relevant die() call to loose_object_info() and
    -    read_loose_object() to keep this change smaller, but in subsequent
    -    commits we'll also libify those.
    +    Combine the unpack_loose_short_header(),
    +    unpack_loose_header_to_strbuf() and unpack_loose_header() functions
    +    into one.
     
    -    The reason this makes sense is that with the refactoring of
    -    parse_loose_header_extended() in an earlier commit the public
    -    interface for parse_loose_header() no longer just accepts a "unsigned
    -    long *sizep". Rather it accepts a "struct object_info *", that
    -    structure will be populated with information about the object.
    -
    -    It thus makes sense to further libify the interface so that it stops
    -    calling die() when it encounters OBJ_BAD, and instead rely on its
    -    callers to check the populated "oi->typep".
    -
    -    This also allows us to simplify away the
    -    unpack_loose_header_to_strbuf() function added in
    +    The unpack_loose_header_to_strbuf() function was added in
         46f034483eb (sha1_file: support reading from a loose object of unknown
    -    type, 2015-05-03). Its code was mostly copy/pasted between it and both
    -    of unpack_loose_header() and unpack_loose_short_header(). We now have
    -    a single unpack_loose_header() function which accepts an optional
    +    type, 2015-05-03).
    +
    +    Its code was mostly copy/pasted between it and both of
    +    unpack_loose_header() and unpack_loose_short_header(). We now have a
    +    single unpack_loose_header() function which accepts an optional
         "struct strbuf *" instead.
     
         I think the remaining unpack_loose_header() function could be further
         simplified, we're carrying some complexity just to be able to emit a
         garbage type longer than MAX_HEADER_LEN, we could alternatively just
    -    say "we found a garbage type <first 32 bytes>..." instead, but let's
    -    leave this in place for now.
    +    say "we found a garbage type <first 32 bytes>..." instead. But let's
    +    leave the current behavior in place for now.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>
     
    @@ object-file.c: static int unpack_loose_short_header(git_zstream *stream,
      		return 0;
      
     +	/*
    -+	 * We have a header longer than MAX_HEADER_LEN. We abort early
    -+	 * unless under we're running as e.g. "cat-file
    ++	 * We have a header longer than MAX_HEADER_LEN. The "header"
    ++	 * here is only non-NULL when we run "cat-file
     +	 * --allow-unknown-type".
     +	 */
     +	if (!header)
    @@ object-file.c: static int unpack_loose_short_header(git_zstream *stream,
      	/*
      	 * buffer[0..bufsiz] was not large enough.  Copy the partial
      	 * result out to header, and then append the result of further
    -@@ object-file.c: static void *unpack_loose_rest(git_zstream *stream,
    -  * too permissive for what we want to check. So do an anal
    -  * object header parse by hand.
    -  */
    --int parse_loose_header(const char *hdr,
    --		       struct object_info *oi,
    --		       unsigned int flags)
    -+int parse_loose_header(const char *hdr, struct object_info *oi)
    - {
    - 	const char *type_buf = hdr;
    - 	unsigned long size;
    -@@ object-file.c: int parse_loose_header(const char *hdr,
    - 	type = type_from_string_gently(type_buf, type_len, 1);
    - 	if (oi->type_name)
    - 		strbuf_add(oi->type_name, type_buf, type_len);
    --	/*
    --	 * Set type to 0 if its an unknown object and
    --	 * we're obtaining the type using '--allow-unknown-type'
    --	 * option.
    --	 */
    --	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE) && (type < 0))
    --		type = 0;
    --	else if (type < 0)
    --		die(_("invalid object type"));
    - 	if (oi->typep)
    - 		*oi->typep = type;
    - 
    -@@ object-file.c: int parse_loose_header(const char *hdr,
    - 	/*
    - 	 * The length must be followed by a zero byte
    - 	 */
    --	return *hdr ? -1 : type;
    -+	if (*hdr)
    -+		return -1;
    -+
    -+	/*
    -+	 * The format is valid, but the type may still be bogus. The
    -+	 * Caller needs to check its oi->typep.
    -+	 */
    -+	return 0;
    - }
    - 
    - static int loose_object_info(struct repository *r,
     @@ object-file.c: static int loose_object_info(struct repository *r,
      	unsigned long mapsize;
      	void *map;
    @@ object-file.c: static int loose_object_info(struct repository *r,
      	char hdr[MAX_HEADER_LEN];
      	struct strbuf hdrbuf = STRBUF_INIT;
      	unsigned long size_scratch;
    -+	enum object_type type_scratch;
     +	int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
      
      	if (oi->delta_base_oid)
      		oidclr(oi->delta_base_oid);
     @@ object-file.c: static int loose_object_info(struct repository *r,
      
    - 	if (!oi->sizep)
    - 		oi->sizep = &size_scratch;
    -+	if (!oi->typep)
    -+		oi->typep = &type_scratch;
    - 
      	if (oi->disk_sizep)
      		*oi->disk_sizep = mapsize;
     -	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE)) {
    @@ object-file.c: static int loose_object_info(struct repository *r,
      		status = error(_("unable to unpack %s header"),
      			       oid_to_hex(oid));
      	}
    --
    --	if (status < 0) {
    --		/* Do nothing */
    --	} else if (hdrbuf.len) {
    --		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
    --			status = error(_("unable to parse %s header with --allow-unknown-type"),
    --				       oid_to_hex(oid));
    --	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0) {
    -+	if (!status && parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi) < 0) {
    - 		status = error(_("unable to parse %s header"), oid_to_hex(oid));
    - 	}
    -+	if (!allow_unknown && *oi->typep < 0)
    -+		die(_("invalid object type"));
    - 
    - 	if (status >= 0 && oi->contentp) {
    - 		*oi->contentp = unpack_loose_rest(&stream, hdr,
    -@@ object-file.c: static int loose_object_info(struct repository *r,
    - 		*oi->typep = status;
    - 	if (oi->sizep == &size_scratch)
    - 		oi->sizep = NULL;
    --	strbuf_release(&hdrbuf);
    -+	if (oi->typep == &type_scratch)
    -+		oi->typep = NULL;
    - 	oi->whence = OI_LOOSE;
    - 	return (status < 0) ? status : 0;
    - }
    -@@ object-file.c: int read_loose_object(const char *path,
    - 	git_zstream stream;
    - 	char hdr[MAX_HEADER_LEN];
    - 	struct object_info oi = OBJECT_INFO_INIT;
    -+	oi.typep = type;
    - 	oi.sizep = size;
    - 
    - 	*contents = NULL;
     @@ object-file.c: int read_loose_object(const char *path,
      		goto out;
      	}
    @@ object-file.c: int read_loose_object(const char *path,
      		error(_("unable to unpack header of %s"), path);
      		goto out;
      	}
    - 
    --	*type = parse_loose_header(hdr, &oi, 0);
    --	if (*type < 0) {
    -+	if (parse_loose_header(hdr, &oi) < 0) {
    - 		error(_("unable to parse header of %s"), path);
    - 		git_inflate_end(&stream);
    - 		goto out;
    - 	}
    -+	if (*type < 0)
    -+		die(_("invalid object type"));
    - 
    - 	if (*type == OBJ_BLOB && *size > big_file_threshold) {
    - 		if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
     
      ## object-store.h ##
     @@ object-store.h: int for_each_object_in_pack(struct packed_git *p,
    @@ object-store.h: int for_each_object_in_pack(struct packed_git *p,
      int unpack_loose_header(git_zstream *stream, unsigned char *map,
      			unsigned long mapsize, void *buffer,
     -			unsigned long bufsiz);
    --int parse_loose_header(const char *hdr, struct object_info *oi,
    --		       unsigned int flags);
     +			unsigned long bufsiz, struct strbuf *hdrbuf);
    -+
    -+/**
    -+ * parse_loose_header() parses the starting "<type> <len>\0" of an
    -+ * object. If it doesn't follow that format -1 is returned. To check
    -+ * the validity of the <type> populate the "typep" in the "struct
    -+ * object_info". It will be OBJ_BAD if the object type is unknown.
    -+ */
    -+int parse_loose_header(const char *hdr, struct object_info *oi);
    -+
    + int parse_loose_header(const char *hdr, struct object_info *oi,
    + 		       unsigned int flags);
      int check_object_signature(struct repository *r, const struct object_id *oid,
    - 			   void *buf, unsigned long size, const char *type);
    - int finalize_object_file(const char *tmpfile, const char *filename);
     
      ## streaming.c ##
    -@@ streaming.c: static int open_istream_loose(struct git_istream *st, struct repository *r,
    - {
    - 	struct object_info oi = OBJECT_INFO_INIT;
    - 	oi.sizep = &st->size;
    -+	oi.typep = type;
    - 
    - 	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
    - 	if (!st->u.loose.mapped)
     @@ streaming.c: static int open_istream_loose(struct git_istream *st, struct repository *r,
      				 st->u.loose.mapped,
      				 st->u.loose.mapsize,
      				 st->u.loose.hdr,
     -				 sizeof(st->u.loose.hdr)) < 0) ||
    --	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
     +				 sizeof(st->u.loose.hdr),
     +				 NULL) < 0) ||
    -+	    (parse_loose_header(st->u.loose.hdr, &oi) < 0) ||
    -+	    *type < 0) {
    + 	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
      		git_inflate_end(&st->z);
      		munmap(st->u.loose.mapped, st->u.loose.mapsize);
    - 		return -1;
 -:  ----------- > 13:  ba632be1520 object-file.c: split up ternary in parse_loose_header()
 -:  ----------- > 14:  ea4f446f5b1 object-file.c: stop dying in parse_loose_header()
 -:  ----------- > 15:  aacef784eab object-file.c: guard against future bugs in loose_object_info()
13:  d22d5b8b85e = 16:  050cfc7808c object-file.c: return -1, not "status" from unpack_loose_header()
12:  77f2cd439c6 ! 17:  78e3152fd94 object-file.c: return -2 on "header too long" in unpack_loose_header()
    @@ Commit message
         MAX_HEADER_LEN limit, or other negative values for "unable to unpack
         <OID> header".
     
    +    I tried setting up an enum just for these three return values, but I
    +    think the result was less readable. Let's consider doing that if we
    +    gain even more return values. For now let's do the next best thing and
    +    enumerate our known return values, and BUG() if we encounter one we
    +    don't know about.
    +
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>
     
      ## object-file.c ##
     @@ object-file.c: int unpack_loose_header(git_zstream *stream,
    - 	/*
    - 	 * We have a header longer than MAX_HEADER_LEN. We abort early
    - 	 * unless under we're running as e.g. "cat-file
    --	 * --allow-unknown-type".
    -+	 * --allow-unknown-type". A -2 is "header too long"
    + 	 * --allow-unknown-type".
      	 */
      	if (!header)
     -		return -1;
    @@ object-file.c: int unpack_loose_header(git_zstream *stream,
      
      static void *unpack_loose_rest(git_zstream *stream,
     @@ object-file.c: static int loose_object_info(struct repository *r,
    + 
      	hdr_ret = unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
      				      allow_unknown ? &hdrbuf : NULL);
    - 	if (hdr_ret < 0) {
    --		status = error(_("unable to unpack %s header"),
    --			       oid_to_hex(oid));
    -+		if (hdr_ret == -2)
    -+			status = error(_("header for %s too long, exceeds %d bytes"),
    -+				       oid_to_hex(oid), MAX_HEADER_LEN);
    -+		else
    -+			status = error(_("unable to unpack %s header"),
    -+				       oid_to_hex(oid));
    - 	}
    -+
    - 	if (!status && parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi) < 0) {
    - 		status = error(_("unable to parse %s header"), oid_to_hex(oid));
    +-	if (hdr_ret < 0) {
    ++	switch (hdr_ret) {
    ++	case 0:
    ++		break;
    ++	case -1:
    + 		status = error(_("unable to unpack %s header"),
    + 			       oid_to_hex(oid));
    ++		break;
    ++	case -2:
    ++		status = error(_("header for %s too long, exceeds %d bytes"),
    ++			       oid_to_hex(oid), MAX_HEADER_LEN);
    ++		break;
    ++	default:
    ++		BUG("unknown hdr_ret value %d", hdr_ret);
      	}
    + 	if (!status) {
    + 		if (!parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi))
     
      ## object-store.h ##
     @@ object-store.h: int for_each_packed_object(each_packed_object_fn, void *,
14:  260e9888a3e = 18:  f9bb1b799ac fsck: don't hard die on invalid object types
15:  e2afb813b28 = 19:  acbea7e2a2a object-store.h: move read_loose_object() below 'struct object_info'
16:  328f05c51b3 = 20:  edc28de229d fsck: report invalid types recorded in objects
17:  c5e6686765d ! 21:  e588c05f461 fsck: report invalid object type-path combinations
    @@ pack-check.c: static int verify_packfile(struct repository *r,
      ## t/t1006-cat-file.sh ##
     @@ t/t1006-cat-file.sh: test_expect_success 'cat-file -t and -s on corrupt loose object' '
      		# Swap the two to corrupt the repository
    - 		mv -v "$other_path" "$empty_path" &&
    + 		mv -f "$other_path" "$empty_path" &&
      		test_must_fail git fsck 2>err.fsck &&
     -		grep "hash mismatch" err.fsck &&
     +		grep "hash-path mismatch" err.fsck &&
-- 
2.32.0.606.g2e440ee2c94




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux