This improves fsck error reporting, see the examples in the commit messages of 16/17 and 17/17. To get there I've lib-ified more things in object-file.c and the general object APIs, i.e. now we'll return error codes instead of calling die() in these cases. This should fix the issues noted about v9[1]. I.e.: A. Junio's right in [2] that the "type" can't be trusted after a failed read_loose_object(). I.e. if we'll fail before we can parse it out it'll be uninitialized. It's now initialized to OBJ_NONE, we can trust it if it's set to something else. B. I re-arranged much of 16/17 and 17/17 to make the diff (but not the range-diff) smaller. C. We now share a single "strbuf" across the whole fsck_loose() walk for saving away the type name, instead of allocating a new one each time. This is both a better memory usage pattern, and makes fsck_loose() itself simpler. It also allows for using much of the pre-image as-is, i.e. the whole "goto cleanup" is gone. Likewise instead of "ret" and "err" we just have the "err" variable now. D. I fixed the redundant/left-over test setup noted by Andrei Rybak[3]. 1. http://lore.kernel.org/git/cover-v9-00.17-00000000000-20210930T133300Z-avarab@xxxxxxxxx 2. https://lore.kernel.org/git/xmqqsfxlaicg.fsf@gitster.g/ 3. https://lore.kernel.org/git/78bab348-ba3a-7a27-e32e-6b75f91178db@xxxxxxxxx/ Ævar Arnfjörð Bjarmason (17): fsck tests: add test for fsck-ing an unknown type fsck tests: refactor one test to use a sub-repo fsck tests: test current hash/type mismatch behavior fsck tests: test for garbage appended to a loose object cat-file tests: move bogus_* variable declarations earlier cat-file tests: test for missing/bogus object with -t, -s and -p cat-file tests: add corrupt loose object test cat-file tests: test for current --allow-unknown-type behavior object-file.c: don't set "typep" when returning non-zero object-file.c: return -1, not "status" from unpack_loose_header() object-file.c: make parse_loose_header_extended() public object-file.c: simplify unpack_loose_short_header() object-file.c: use "enum" return type for unpack_loose_header() object-file.c: return ULHR_TOO_LONG on "header too long" object-file.c: stop dying in parse_loose_header() fsck: don't hard die on invalid object types fsck: report invalid object type-path combinations builtin/fast-export.c | 2 +- builtin/fsck.c | 44 +++++++-- builtin/index-pack.c | 2 +- builtin/mktag.c | 3 +- cache.h | 45 ++++++++- object-file.c | 176 +++++++++++++++------------------ object-store.h | 7 +- object.c | 4 +- pack-check.c | 3 +- streaming.c | 27 +++-- t/oid-info/oid | 2 + t/t1006-cat-file.sh | 223 +++++++++++++++++++++++++++++++++++++++--- t/t1450-fsck.sh | 97 ++++++++++++++---- 13 files changed, 476 insertions(+), 159 deletions(-) Range-diff against v9: 1: 520732612f7 ! 1: 00936435423 fsck tests: add test for fsck-ing an unknown type @@ t/t1450-fsck.sh: test_expect_success 'detect corrupt index file in fsck' ' + ( + cd garbage-type && + -+ empty=$(git hash-object --stdin -w -t blob </dev/null) && -+ garbage=$(git hash-object --stdin -w -t garbage --literally </dev/null) && ++ git hash-object --stdin -w -t garbage --literally </dev/null && + + cat >err.expect <<-\EOF && + fatal: invalid object type 2: af7086623fe = 2: 32a2f9cc0c9 fsck tests: refactor one test to use a sub-repo 3: 102bc4f0176 = 3: 00d661a6032 fsck tests: test current hash/type mismatch behavior 4: ff7fc09d5a1 = 4: a527e3b262c fsck tests: test for garbage appended to a loose object 5: 278df093239 = 5: 7a63d30aef3 cat-file tests: move bogus_* variable declarations earlier 6: 290bf983590 = 6: a563c7efe1c cat-file tests: test for missing/bogus object with -t, -s and -p 7: a41b2c571e5 = 7: c5affb65b7e cat-file tests: add corrupt loose object test 8: cedeb117330 = 8: 76f9888a6f7 cat-file tests: test for current --allow-unknown-type behavior 9: 6f0673d38c8 = 9: 85a91f43634 object-file.c: don't set "typep" when returning non-zero 10: 6637e8fd2ca = 10: 51eaa2e8479 object-file.c: return -1, not "status" from unpack_loose_header() 11: 51db08ebbae = 11: 5cd2ba830e9 object-file.c: make parse_loose_header_extended() public 12: dffe5581f6f = 12: 6899c6ec17a object-file.c: simplify unpack_loose_short_header() 13: eb7c949c8b7 = 13: a3bdd53d296 object-file.c: use "enum" return type for unpack_loose_header() 14: f4cc7271df7 = 14: 5a7c2855b50 object-file.c: return ULHR_TOO_LONG on "header too long" 15: 25d6ec668d4 = 15: 3ec9fee7ee9 object-file.c: stop dying in parse_loose_header() 16: 6ce0414b2b7 ! 16: 9b75ac7c8ed fsck: don't hard die on invalid object types @@ Commit message f6371f92104 (sha1_file: add read_loose_object() function, 2017-01-13) for the introduction of read_loose_object(). - Since we're now passing in a "oi.type_name" we'll have to clean up the - allocated "strbuf sb". That we're doing it right is asserted by - e.g. the "fsck notices broken commit" test added in 03818a4a94c - (split_ident: parse timestamp from end of line, 2013-10-14). To do - that switch to a "goto cleanup" pattern, and while we're at it factor - out the already duplicated free(content) to use that pattern. + Since we'll need a "struct strbuf" to hold the "type_name" let's pass + it to the for_each_loose_file_in_objdir() callback to avoid allocating + a new one for each loose object in the iteration. It also makes the + memory management simpler than sticking it in fsck_loose() itself, as + we'll only need to strbuf_reset() it, with no need to do a + strbuf_release() before each "return". + + Before this commit we'd never check the "type" if read_loose_object() + failed, but now we do. We therefore need to initialize it to OBJ_NONE + to be able to tell the difference between e.g. its + unpack_loose_header() having failed, and us getting past that and into + parse_loose_header(). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> ## builtin/fsck.c ## -@@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *path, void *data) +@@ builtin/fsck.c: static void get_default_heads(void) + } + } + ++struct for_each_loose_cb ++{ ++ struct progress *progress; ++ struct strbuf obj_type; ++}; ++ + static int fsck_loose(const struct object_id *oid, const char *path, void *data) + { ++ struct for_each_loose_cb *cb_data = data; + struct object *obj; +- enum object_type type; ++ enum object_type type = OBJ_NONE; unsigned long size; void *contents; int eaten; -+ struct strbuf sb = STRBUF_INIT; + struct object_info oi = OBJECT_INFO_INIT; + int err = 0; - if (read_loose_object(path, oid, &type, &size, &contents) < 0) { -+ oi.type_name = &sb; ++ strbuf_reset(&cb_data->obj_type); ++ oi.type_name = &cb_data->obj_type; + oi.sizep = &size; + oi.typep = &type; + + if (read_loose_object(path, oid, &contents, &oi) < 0) + err = error(_("%s: object corrupt or missing: %s"), + oid_to_hex(oid), path); -+ if (type < 0) ++ if (type != OBJ_NONE && type < 0) + err = error(_("%s: object is of unknown type '%s': %s"), -+ oid_to_hex(oid), sb.buf, path); -+ if (err) { ++ oid_to_hex(oid), cb_data->obj_type.buf, path); ++ if (err < 0) { errors_found |= ERROR_OBJECT; - error(_("%s: object corrupt or missing: %s"), - oid_to_hex(oid), path); -- return 0; /* keep checking other objects */ -+ goto cleanup; - } - - if (!contents && type != OBJ_BLOB) -@@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *path, void *data) - errors_found |= ERROR_OBJECT; - error(_("%s: object could not be parsed: %s"), - oid_to_hex(oid), path); -- if (!eaten) -- free(contents); -- return 0; /* keep checking other objects */ -+ goto cleanup_eaten; + return 0; /* keep checking other objects */ } - obj->flags &= ~(REACHABLE | SEEN); -@@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *path, void *data) - if (fsck_obj(obj, contents, size)) - errors_found |= ERROR_OBJECT; +@@ builtin/fsck.c: static int fsck_cruft(const char *basename, const char *path, void *data) + return 0; + } -+cleanup_eaten: - if (!eaten) - free(contents); -+cleanup: -+ strbuf_release(&sb); - return 0; /* keep checking other objects, even if we saw an error */ +-static int fsck_subdir(unsigned int nr, const char *path, void *progress) ++static int fsck_subdir(unsigned int nr, const char *path, void *data) + { ++ struct for_each_loose_cb *cb_data = data; ++ struct progress *progress = cb_data->progress; + display_progress(progress, nr + 1); + return 0; + } +@@ builtin/fsck.c: static int fsck_subdir(unsigned int nr, const char *path, void *progress) + static void fsck_object_dir(const char *path) + { + struct progress *progress = NULL; ++ struct for_each_loose_cb cb_data = { ++ .obj_type = STRBUF_INIT, ++ .progress = progress, ++ }; + + if (verbose) + fprintf_ln(stderr, _("Checking object directory")); +@@ builtin/fsck.c: static void fsck_object_dir(const char *path) + progress = start_progress(_("Checking object directories"), 256); + + for_each_loose_file_in_objdir(path, fsck_loose, fsck_cruft, fsck_subdir, +- progress); ++ &cb_data); + display_progress(progress, 256); + stop_progress(&progress); ++ strbuf_release(&cb_data.obj_type); } + static int fsck_head_link(const char *head_ref_name, ## object-file.c ## @@ object-file.c: static int check_stream_oid(git_zstream *stream, @@ t/t1450-fsck.sh: test_expect_success 'detect corrupt index file in fsck' ' git init --bare garbage-type && ( cd garbage-type && -@@ t/t1450-fsck.sh: test_expect_success 'fsck hard errors on an invalid object type' ' + +- git hash-object --stdin -w -t garbage --literally </dev/null && ++ garbage_blob=$(git hash-object --stdin -w -t garbage --literally </dev/null) && + + cat >err.expect <<-\EOF && fatal: invalid object type EOF test_must_fail git fsck >out 2>err && @@ t/t1450-fsck.sh: test_expect_success 'fsck hard errors on an invalid object type - test_must_be_empty out + grep -e "^error" -e "^fatal" err >errors && + test_line_count = 1 errors && -+ grep "$garbage_blob: object is of unknown type '"'"'garbage'"'"':" err && -+ grep "dangling blob $empty_blob" out ++ grep "$garbage_blob: object is of unknown type '"'"'garbage'"'"':" err ) ' 17: 8d926e41fc3 ! 17: 838df0a979b fsck: report invalid object type-path combinations @@ builtin/fast-export.c: static void export_blob(const struct object_id *oid) ## builtin/fsck.c ## @@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *path, void *data) - struct object *obj; - enum object_type type; - unsigned long size; -- void *contents; -+ unsigned char *contents = NULL; + void *contents; int eaten; - struct strbuf sb = STRBUF_INIT; struct object_info oi = OBJECT_INFO_INIT; -- int err = 0; + struct object_id real_oid = *null_oid(); -+ int ret; + int err = 0; - oi.type_name = &sb; + strbuf_reset(&cb_data->obj_type); +@@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *path, void *data) oi.sizep = &size; oi.typep = &type; - if (read_loose_object(path, oid, &contents, &oi) < 0) - err = error(_("%s: object corrupt or missing: %s"), - oid_to_hex(oid), path); -+ ret = read_loose_object(path, oid, &real_oid, (void **)&contents, &oi); -+ if (ret < 0) { ++ if (read_loose_object(path, oid, &real_oid, &contents, &oi) < 0) { + if (contents && !oideq(&real_oid, oid)) -+ error(_("%s: hash-path mismatch, found at: %s"), -+ oid_to_hex(&real_oid), path); ++ err = error(_("%s: hash-path mismatch, found at: %s"), ++ oid_to_hex(&real_oid), path); + else -+ error(_("%s: object corrupt or missing: %s"), -+ oid_to_hex(oid), path); ++ err = error(_("%s: object corrupt or missing: %s"), ++ oid_to_hex(oid), path); + } - if (type < 0) -- err = error(_("%s: object is of unknown type '%s': %s"), -- oid_to_hex(oid), sb.buf, path); -- if (err) { -+ ret = error(_("%s: object is of unknown type '%s': %s"), -+ oid_to_hex(&real_oid), sb.buf, path); -+ if (ret < 0) { + if (type != OBJ_NONE && type < 0) + err = error(_("%s: object is of unknown type '%s': %s"), +- oid_to_hex(oid), cb_data->obj_type.buf, path); ++ oid_to_hex(&real_oid), cb_data->obj_type.buf, ++ path); + if (err < 0) { errors_found |= ERROR_OBJECT; - goto cleanup; - } + return 0; /* keep checking other objects */ ## builtin/index-pack.c ## @@ builtin/index-pack.c: static void fix_unresolved_deltas(struct hashfile *f) -- 2.33.0.1375.g5eed55aa1b5