[PATCH v3 00/17] fsck: better "invalid object" error reporting

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



A re-roll of v2's 6 patch series[1], has turned into 17 now. Less
scary than one might think though, it's mostly added test coverage +
splitting existing commits into more incremental chunks, but see the
range-diff. This should address all the feedback on v2 + more.

A brief recap summary of what this is about: We now gracefully recover
instead of dying when fsck encounters types that aren't
blob/commit/tree/tag. Those types don't exist in the wild, but you can
manually create them with "git hash-object -t garbage --literally".

So in some senes this matters to nobody, but I'm doing this as part of
general changes I've been pushing to make fsck/gc error reporting more
graceful, and errors more recoverable. We now have a few more places
in object-file.c where we don't just die(), but properly return
API-like return codes/data to the caller instead.

This does not contain any changes to how --allow-unknown-type
hash-object's --literally etc. work, as I suggested we could do in
[2]. Any such changes will need the API changes here, but these are
just the narrow fsck fixes.

1. https://lore.kernel.org/git/cover-0.6-00000000000-20210413T093734Z-avarab@xxxxxxxxx
2. https://lore.kernel.org/git/87r1i4qf4h.fsf@xxxxxxxxxxxxxxxxxxx/

Ævar Arnfjörð Bjarmason (17):
  fsck tests: refactor one test to use a sub-repo
  fsck tests: add test for fsck-ing an unknown type
  cat-file tests: test for missing object with -t and -s
  cat-file tests: test that --allow-unknown-type isn't on by default
  rev-list tests: test for behavior with invalid object types
  cat-file tests: add corrupt loose object test
  cat-file tests: test for current --allow-unknown-type behavior
  cache.h: move object functions to object-store.h
  object-file.c: make parse_loose_header_extended() public
  object-file.c: add missing braces to loose_object_info()
  object-file.c: stop dying in parse_loose_header()
  object-file.c: return -2 on "header too long" in unpack_loose_header()
  object-file.c: return -1, not "status" from unpack_loose_header()
  fsck: don't hard die on invalid object types
  object-store.h: move read_loose_object() below 'struct object_info'
  fsck: report invalid types recorded in objects
  fsck: report invalid object type-path combinations

 builtin/fast-export.c  |   2 +-
 builtin/fsck.c         |  28 ++++++-
 builtin/index-pack.c   |   2 +-
 builtin/mktag.c        |   3 +-
 cache.h                |  10 ---
 object-file.c          | 156 ++++++++++++++++++-------------------
 object-store.h         |  60 +++++++++++----
 object.c               |   4 +-
 pack-check.c           |   3 +-
 streaming.c            |  10 ++-
 t/t1006-cat-file.sh    | 169 +++++++++++++++++++++++++++++++++++++++++
 t/t1450-fsck.sh        |  64 +++++++++++-----
 t/t6115-rev-list-du.sh |  11 +++
 13 files changed, 387 insertions(+), 135 deletions(-)

Range-diff against v2:
 2:  5a2cd6cca9 =  1:  aa38b2bf9e fsck tests: refactor one test to use a sub-repo
 -:  ---------- >  2:  82b64abd25 fsck tests: add test for fsck-ing an unknown type
 -:  ---------- >  3:  7c3c2fe25d cat-file tests: test for missing object with -t and -s
 -:  ---------- >  4:  871b820003 cat-file tests: test that --allow-unknown-type isn't on by default
 -:  ---------- >  5:  b98da9cc89 rev-list tests: test for behavior with invalid object types
 -:  ---------- >  6:  04cc1d20f6 cat-file tests: add corrupt loose object test
 -:  ---------- >  7:  9217320888 cat-file tests: test for current --allow-unknown-type behavior
 1:  37c323a241 =  8:  12dd453879 cache.h: move object functions to object-store.h
 3:  d0d9cb3331 !  9:  6a5b78dcad fsck: don't hard die on invalid object types
    @@ Metadata
     Author: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>
     
      ## Commit message ##
    -    fsck: don't hard die on invalid object types
    +    object-file.c: make parse_loose_header_extended() public
     
    -    Change builtin/fsck.c to pass down a
    -    OBJECT_INFO_ALLOW_UNKNOWN_TYPE. This changes this very ungraceful
    -    error:
    +    Make the parse_loose_header_extended() function public and remove the
    +    parse_loose_header() wrapper. The only direct user of it outside of
    +    object-file.c itself was in streaming.c, that caller can simply pass
    +    the required "struct object-info *" instead.
     
    -        $ git hash-object --stdin -w -t garbage --literally </dev/null
    -        <OID>
    -        $ git fsck
    -        fatal: invalid object type
    -        $
    -
    -    Into:
    -
    -        $ git fsck
    -        error: hash mismatch for <OID_PATH> (expected <OID>)
    -        error: <OID>: object corrupt or missing: <OID_PATH>
    -        [ the rest of the fsck output here, i.e. it didn't hard die ]
    -
    -    We'll still exit with non-zero, but now we'll finish the rest of the
    -    traversal. The tests that's being added here asserts that we'll still
    -    complain about other fsck issues (e.g. an unrelated dangling blob).
    -
    -    But why are we complaining about a "hash mismatch" for an object of a
    -    type we don't know about? We shouldn't. This is the bare minimal
    -    change needed to not make fsck hard die on a repository that's been
    -    corrupted in this manner. In subsequent commits we'll teach fsck to
    -    recognize this particular type of corruption and emit a better error
    -    message.
    -
    -    The parse_loose_header() function being changed here is only used in
    -    builtin/fsck.c, see f6371f92104 (sha1_file: add read_loose_object()
    -    function, 2017-01-13) for its introduction.
    +    This change is being done in preparation for teaching
    +    read_loose_object() to accept a flag to pass to
    +    parse_loose_header(). It isn't strictly necessary for that change, we
    +    could simply use parse_loose_header_extended() there, but will leave
    +    the API in a better end state.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>
     
    - ## builtin/fsck.c ##
    -@@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *path, void *data)
    - 	void *contents;
    - 	int eaten;
    - 
    --	if (read_loose_object(path, oid, &type, &size, &contents) < 0) {
    -+	if (read_loose_object(path, oid, &type, &size, &contents,
    -+			      OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
    - 		errors_found |= ERROR_OBJECT;
    - 		error(_("%s: object corrupt or missing: %s"),
    - 		      oid_to_hex(oid), path);
    -
      ## object-file.c ##
     @@ object-file.c: static void *unpack_loose_rest(git_zstream *stream,
       * too permissive for what we want to check. So do an anal
    @@ object-file.c: static int loose_object_info(struct repository *r,
      
      	if (status >= 0 && oi->contentp) {
     @@ object-file.c: int read_loose_object(const char *path,
    - 		      const struct object_id *expected_oid,
    - 		      enum object_type *type,
    - 		      unsigned long *size,
    --		      void **contents)
    -+		      void **contents,
    -+		      unsigned int oi_flags)
    - {
    - 	int ret = -1;
    - 	void *map = NULL;
      	unsigned long mapsize;
      	git_zstream stream;
      	char hdr[MAX_HEADER_LEN];
    @@ object-file.c: int read_loose_object(const char *path,
      	}
      
     -	*type = parse_loose_header(hdr, size);
    -+	*type = parse_loose_header(hdr, &oi, oi_flags);
    ++	*type = parse_loose_header(hdr, &oi, 0);
      	if (*type < 0) {
      		error(_("unable to parse header of %s"), path);
      		git_inflate_end(&stream);
     
      ## object-store.h ##
    -@@ object-store.h: int read_loose_object(const char *path,
    - 		      const struct object_id *expected_oid,
    - 		      enum object_type *type,
    - 		      unsigned long *size,
    --		      void **contents);
    -+		      void **contents,
    -+		      unsigned int oi_flags);
    - 
    - /* Retry packed storage after checking packed and loose storage */
    - #define HAS_OBJECT_RECHECK_PACKED 1
     @@ object-store.h: int for_each_packed_object(each_packed_object_fn, void *,
      int unpack_loose_header(git_zstream *stream, unsigned char *map,
      			unsigned long mapsize, void *buffer,
    @@ object-store.h: int for_each_packed_object(each_packed_object_fn, void *,
      int finalize_object_file(const char *tmpfile, const char *filename);
     
      ## streaming.c ##
    -@@ streaming.c: static struct stream_vtbl loose_vtbl = {
    - 
    - static open_method_decl(loose)
    +@@ streaming.c: static int open_istream_loose(struct git_istream *st, struct repository *r,
    + 			      const struct object_id *oid,
    + 			      enum object_type *type)
      {
    -+	struct object_info oi2 = OBJECT_INFO_INIT;
    -+	oi2.sizep = &st->size;
    ++	struct object_info oi = OBJECT_INFO_INIT;
    ++	oi.sizep = &st->size;
     +
      	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
      	if (!st->u.loose.mapped)
      		return -1;
    -@@ streaming.c: static open_method_decl(loose)
    +@@ streaming.c: static int open_istream_loose(struct git_istream *st, struct repository *r,
      				 st->u.loose.mapsize,
      				 st->u.loose.hdr,
      				 sizeof(st->u.loose.hdr)) < 0) ||
     -	    (parse_loose_header(st->u.loose.hdr, &st->size) < 0)) {
    -+	    (parse_loose_header(st->u.loose.hdr, &oi2, 0) < 0)) {
    ++	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
      		git_inflate_end(&st->z);
      		munmap(st->u.loose.mapped, st->u.loose.mapsize);
      		return -1;
    -
    - ## t/t1450-fsck.sh ##
    -@@ t/t1450-fsck.sh: test_expect_success 'detect corrupt index file in fsck' '
    - 	test_i18ngrep "bad index file" errors
    - '
    - 
    -+test_expect_success 'fsck error and recovery on invalid object type' '
    -+	test_create_repo garbage-type &&
    -+	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
    -+	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
    -+	test_must_fail git -C garbage-type fsck >out 2>err &&
    -+	grep "$garbage_blob: object corrupt or missing:" err &&
    -+	grep "dangling blob $empty_blob" out
    -+'
    -+
    - test_done
 -:  ---------- > 10:  5d31d7e1a5 object-file.c: add missing braces to loose_object_info()
 -:  ---------- > 11:  ee28089219 object-file.c: stop dying in parse_loose_header()
 -:  ---------- > 12:  77f2cd439c object-file.c: return -2 on "header too long" in unpack_loose_header()
 -:  ---------- > 13:  d22d5b8b85 object-file.c: return -1, not "status" from unpack_loose_header()
 -:  ---------- > 14:  260e9888a3 fsck: don't hard die on invalid object types
 4:  81fffefcf9 ! 15:  e2afb813b2 object-store.h: move read_loose_object() below 'struct object_info'
    @@ Metadata
      ## Commit message ##
         object-store.h: move read_loose_object() below 'struct object_info'
     
    -    Move the definition of read_loose_object() below "struct
    +    Move the declaration of read_loose_object() below "struct
         object_info". In the next commit we'll add a "struct object_info *"
         parameter to it, moving it will avoid a forward declaration of the
         struct.
 5:  5fb6ac4fae ! 16:  328f05c51b fsck: report invalid types recorded in objects
    @@ Commit message
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>
     
    - ## builtin/cat-file.c ##
    -@@ builtin/cat-file.c: static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
    - 	struct strbuf sb = STRBUF_INIT;
    - 	unsigned flags = OBJECT_INFO_LOOKUP_REPLACE;
    - 	const char *path = force_path;
    -+	int ret;
    - 
    - 	if (unknown_type)
    - 		flags |= OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
    -@@ builtin/cat-file.c: static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
    - 	switch (opt) {
    - 	case 't':
    - 		oi.type_name = &sb;
    --		if (oid_object_info_extended(the_repository, &oid, &oi, flags) < 0)
    -+		ret = oid_object_info_extended(the_repository, &oid, &oi, flags);
    -+		if (!unknown_type && ret < 0)
    - 			die("git cat-file: could not get object info");
    - 		if (sb.len) {
    - 			printf("%s\n", sb.buf);
    -@@ builtin/cat-file.c: static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
    - 
    - 	case 's':
    - 		oi.sizep = &size;
    --		if (oid_object_info_extended(the_repository, &oid, &oi, flags) < 0)
    -+		ret = oid_object_info_extended(the_repository, &oid, &oi, flags);
    -+		if (!unknown_type && ret < 0)
    - 			die("git cat-file: could not get object info");
    - 		printf("%"PRIuMAX"\n", (uintmax_t)size);
    - 		return 0;
    -
      ## builtin/fsck.c ##
     @@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *path, void *data)
      	unsigned long size;
    @@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *p
      
     
      ## object-file.c ##
    -@@ object-file.c: int parse_loose_header(const char *hdr,
    - 	 * we're obtaining the type using '--allow-unknown-type'
    - 	 * option.
    - 	 */
    --	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE) && (type < 0))
    --		type = 0;
    --	else if (type < 0)
    -+	if (type < 0 && !(flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE))
    - 		die(_("invalid object type"));
    - 	if (oi->typep)
    - 		*oi->typep = type;
    -@@ object-file.c: static int loose_object_info(struct repository *r,
    - 	} else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
    - 		status = error(_("unable to unpack %s header"),
    - 			       oid_to_hex(oid));
    --	if (status < 0)
    -+	if (status < 0) {
    - 		; /* Do nothing */
    --	else if (hdrbuf.len) {
    -+	} else if (hdrbuf.len) {
    - 		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
    - 			status = error(_("unable to parse %s header with --allow-unknown-type"),
    - 				       oid_to_hex(oid));
    --	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
    --		status = error(_("unable to parse %s header"), oid_to_hex(oid));
    -+	} else {
    -+		status = parse_loose_header(hdr, oi, flags);
    -+		if (status < 0 && !(flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE))
    -+			error(_("unable to parse %s header"), oid_to_hex(oid));
    -+	}
    - 
    - 	if (status >= 0 && oi->contentp) {
    - 		*oi->contentp = unpack_loose_rest(&stream, hdr,
     @@ object-file.c: static int check_stream_oid(git_zstream *stream,
      
      int read_loose_object(const char *path,
    @@ object-file.c: int read_loose_object(const char *path,
      	git_zstream stream;
      	char hdr[MAX_HEADER_LEN];
     -	struct object_info oi = OBJECT_INFO_INIT;
    + 	int allow_unknown = oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
    +-	oi.typep = type;
     -	oi.sizep = size;
     +	enum object_type *type = oi->typep;
     +	unsigned long *size = oi->sizep;
    @@ object-file.c: int read_loose_object(const char *path,
      		goto out;
      	}
      
    --	*type = parse_loose_header(hdr, &oi, oi_flags);
    --	if (*type < 0) {
    --		error(_("unable to parse header of %s"), path);
    -+	*type = parse_loose_header(hdr, oi, oi_flags);
    -+	if (*type < 0 && !(oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE)) {
    -+		error(_("unable to parse header %s"), path);
    +-	if (parse_loose_header(hdr, &oi) < 0) {
    ++	if (parse_loose_header(hdr, oi) < 0) {
    + 		error(_("unable to parse header of %s"), path);
      		git_inflate_end(&stream);
      		goto out;
    - 	}
     @@ object-file.c: int read_loose_object(const char *path,
      			goto out;
      		}
    @@ t/t1450-fsck.sh: test_expect_success 'object with hash mismatch' '
      	git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
      	test_when_finished "git update-ref -d refs/heads/invalid" &&
     @@ t/t1450-fsck.sh: test_expect_success 'fsck error and recovery on invalid object type' '
    - 	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
      	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
      	test_must_fail git -C garbage-type fsck >out 2>err &&
    + 	grep -e "^error" -e "^fatal" err >errors &&
    +-	test_line_count = 2 errors &&
    +-	grep "error: hash mismatch for" err &&
     -	grep "$garbage_blob: object corrupt or missing:" err &&
    ++	test_line_count = 1 errors &&
     +	grep "$garbage_blob: object is of unknown type '"'"'garbage'"'"':" err &&
    -+	grep error: err >err.errors &&
    -+	test_line_count = 1 err.errors &&
      	grep "dangling blob $empty_blob" out
      '
      
 6:  226d2031bc ! 17:  c5e6686765 fsck: report invalid object type-path combinations
    @@ Metadata
      ## Commit message ##
         fsck: report invalid object type-path combinations
     
    -    fsck: improve error on loose object hash mismatch
    -
         Improve the error that's emitted in cases where we find a loose object
         we parse, but which isn't at the location we expect it to be.
     
    @@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *p
      	struct strbuf sb = STRBUF_INIT;
      	unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
      	struct object_info oi;
    -+	struct object_id real_oid = null_oid;
    ++	struct object_id real_oid = *null_oid();
      	int found = 0;
      	oi.type_name = &sb;
      	oi.sizep = &size;
    @@ object-file.c: int check_object_signature(struct repository *r, const struct obj
      			break;
      		r->hash_algo->update_fn(&c, buf, readlen);
      	}
    --	r->hash_algo->final_fn(real_oid.hash, &c);
    -+	r->hash_algo->final_fn(real_oid->hash, &c);
    +-	r->hash_algo->final_oid_fn(&real_oid, &c);
    ++	r->hash_algo->final_oid_fn(real_oid, &c);
      	close_istream(st);
     -	return !oideq(oid, &real_oid) ? -1 : 0;
     +	return !oideq(oid, real_oid) ? -1 : 0;
    @@ object-file.c: int read_loose_object(const char *path,
     -			error(_("hash mismatch for %s (expected %s)"), path,
     -			      oid_to_hex(expected_oid));
     +					   *contents, *size, oi->type_name->buf, real_oid)) {
    -+			if (oideq(real_oid, &null_oid))
    ++			if (oideq(real_oid, null_oid()))
     +				BUG("should only get OID mismatch errors with mapped contents");
      			free(*contents);
      			goto out;
    @@ object-store.h: int oid_object_info_extended(struct repository *r,
      		      struct object_info *oi,
      		      unsigned int oi_flags);
     @@ object-store.h: int unpack_loose_header(git_zstream *stream, unsigned char *map,
    - int parse_loose_header(const char *hdr, struct object_info *oi,
    - 		       unsigned int flags);
    + int parse_loose_header(const char *hdr, struct object_info *oi);
    + 
      int check_object_signature(struct repository *r, const struct object_id *oid,
     -			   void *buf, unsigned long size, const char *type);
     +			   void *buf, unsigned long size, const char *type,
    @@ pack-check.c: static int verify_packfile(struct repository *r,
      				    oid_to_hex(&oid), p->pack_name);
      		else if (fn) {
     
    + ## t/t1006-cat-file.sh ##
    +@@ t/t1006-cat-file.sh: test_expect_success 'cat-file -t and -s on corrupt loose object' '
    + 		# Swap the two to corrupt the repository
    + 		mv -v "$other_path" "$empty_path" &&
    + 		test_must_fail git fsck 2>err.fsck &&
    +-		grep "hash mismatch" err.fsck &&
    ++		grep "hash-path mismatch" err.fsck &&
    + 
    + 		# confirm that cat-file is reading the new swapped-in
    + 		# blob...
    +
      ## t/t1450-fsck.sh ##
     @@ t/t1450-fsck.sh: test_expect_success 'object with hash mismatch' '
      	(
-- 
2.32.0.rc0.406.g73369325f8d




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux