On Wed, Jan 18, 2023 at 03:44:12PM -0500, Jeff King wrote: > This is obviously going to be a user-visible behavior change, and the > test changes earlier in this series show the scope of the impact. But > I'd argue that this is OK: > > - the documentation for hash-object is already vague about which > checks we might do, saying that --literally will allow "any > garbage[...] which might not otherwise pass standard object parsing > or git-fsck checks". So we are already covered under the documented > behavior. > > - users don't generally run hash-object anyway. There are a lot of > spots in the tests that needed to be updated because creating > garbage objects is something that Git's tests disproportionately do. > > - it's hard to imagine anyone thinking the new behavior is worse. Any > object we reject would be a potential problem down the road for the > user. And if they really want to create garbage, --literally is > already the escape hatch they need. This is the discussion I was pointing out earlier in the series as evidence for making this behavior the new default without "--literally". That being said, let me play devil's advocate for a second. Do the new fsck checks slow anything in hash-object down significantly? If so, then it's plausible to imagine a hash-object caller who (a) doesn't use `--literally`, but (b) does care about throughput if they're writing a large number of objects at once. I don't know if such a situation exists, or if these new fsck checks even slow hash-object down enough to care. But I didn't catch a discussion of this case in your series, so I figured I'd bring it up here just in case. > - the resulting messages are much better. For example: > > [before] > $ echo 'tree 123' | git hash-object -t commit --stdin > error: bogus commit object 0000000000000000000000000000000000000000 > fatal: corrupt commit > > [after] > $ echo 'tree 123' | git.compile hash-object -t commit --stdin > error: object fails fsck: badTreeSha1: invalid 'tree' line format - bad sha1 > fatal: refusing to create malformed object Much nicer, well done. > Signed-off-by: Jeff King <peff@xxxxxxxx> > --- > object-file.c | 55 ++++++++++++++++++------------------------ > t/t1007-hash-object.sh | 11 +++++++++ > 2 files changed, 34 insertions(+), 32 deletions(-) > > diff --git a/object-file.c b/object-file.c > index 80a0cd3b35..5c96384803 100644 > --- a/object-file.c > +++ b/object-file.c > @@ -33,6 +33,7 @@ > #include "object-store.h" > #include "promisor-remote.h" > #include "submodule.h" > +#include "fsck.h" > > /* The maximum size for an object header. */ > #define MAX_HEADER_LEN 32 > @@ -2298,32 +2299,21 @@ int repo_has_object_file(struct repository *r, > return repo_has_object_file_with_flags(r, oid, 0); > } > > -static void check_tree(const void *buf, size_t size) > -{ > - struct tree_desc desc; > - struct name_entry entry; > - > - init_tree_desc(&desc, buf, size); > - while (tree_entry(&desc, &entry)) > - /* do nothing > - * tree_entry() will die() on malformed entries */ > - ; > -} > - > -static void check_commit(const void *buf, size_t size) > -{ > - struct commit c; > - memset(&c, 0, sizeof(c)); > - if (parse_commit_buffer(the_repository, &c, buf, size, 0)) > - die(_("corrupt commit")); > -} > - > -static void check_tag(const void *buf, size_t size) > -{ > - struct tag t; > - memset(&t, 0, sizeof(t)); > - if (parse_tag_buffer(the_repository, &t, buf, size)) > - die(_("corrupt tag")); OK, here we're getting rid of all of the lightweight checks that hash-object used to implement on its own. > +/* > + * We can't use the normal fsck_error_function() for index_mem(), > + * because we don't yet have a valid oid for it to report. Instead, > + * report the minimal fsck error here, and rely on the caller to > + * give more context. > + */ > +static int hash_format_check_report(struct fsck_options *opts, > + const struct object_id *oid, > + enum object_type object_type, > + enum fsck_msg_type msg_type, > + enum fsck_msg_id msg_id, > + const char *message) > +{ > + error(_("object fails fsck: %s"), message); > + return 1; > } > > static int index_mem(struct index_state *istate, > @@ -2350,12 +2340,13 @@ static int index_mem(struct index_state *istate, > } > } > if (flags & HASH_FORMAT_CHECK) { > - if (type == OBJ_TREE) > - check_tree(buf, size); > - if (type == OBJ_COMMIT) > - check_commit(buf, size); > - if (type == OBJ_TAG) > - check_tag(buf, size); > + struct fsck_options opts = FSCK_OPTIONS_DEFAULT; > + > + opts.strict = 1; > + opts.error_func = hash_format_check_report; > + if (fsck_buffer(null_oid(), type, buf, size, &opts)) > + die(_("refusing to create malformed object")); > + fsck_finish(&opts); > } And here's the main part of the change, which is delightfully simple and appears correct to me. > diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh > index 2d2148d8fa..ac3d173767 100755 > --- a/t/t1007-hash-object.sh > +++ b/t/t1007-hash-object.sh > @@ -222,6 +222,17 @@ test_expect_success 'empty filename in tree' ' > grep "empty filename in tree entry" err > ' > > +test_expect_success 'duplicate filename in tree' ' > + hex_oid=$(echo foo | git hash-object --stdin -w) && > + bin_oid=$(echo $hex_oid | hex2oct) && > + { > + printf "100644 file\0$bin_oid" && > + printf "100644 file\0$bin_oid" > + } >tree-with-duplicate-filename && > + test_must_fail git hash-object -t tree tree-with-duplicate-filename 2>err && > + grep "duplicateEntries" err > +' > + For what it's worth, I think that this is sufficient coverage for the new fsck checks. Thanks, Taylor