On Mon, Apr 17, 2023 at 04:21:38PM +0000, Derrick Stolee via GitGitGadget wrote: > From: Derrick Stolee <derrickstolee@xxxxxxxxxx> > > The 'fsck' builtin checks many of Git's on-disk data structures, but > does not currently validate the pack rev-index files (a .rev file to > pair with a .pack and .idx file). > > Before doing a more-involved check process, create the scaffolding > within builtin/fsck.c to have a new error type and add that error type > when the API method verify_pack_revindex() returns an error. That method > does nothing currently, but we will add checks to it in later changes. > > For now, check that 'git fsck' succeeds without any errors in the normal > case. Future checks will be paired with tests that corrupt the .rev file > appropriately. > > Signed-off-by: Derrick Stolee <derrickstolee@xxxxxxxxxx> > --- > builtin/fsck.c | 30 ++++++++++++++++++++++++++++++ > pack-revindex.c | 11 +++++++++++ > pack-revindex.h | 8 ++++++++ > t/t5325-reverse-index.sh | 14 ++++++++++++++ > 4 files changed, 63 insertions(+) > > diff --git a/builtin/fsck.c b/builtin/fsck.c > index 095b39d3980..2ab78129bde 100644 > --- a/builtin/fsck.c > +++ b/builtin/fsck.c > @@ -24,6 +24,7 @@ > #include "resolve-undo.h" > #include "run-command.h" > #include "worktree.h" > +#include "pack-revindex.h" > > #define REACHABLE 0x0001 > #define SEEN 0x0002 > @@ -53,6 +54,7 @@ static int name_objects; > #define ERROR_REFS 010 > #define ERROR_COMMIT_GRAPH 020 > #define ERROR_MULTI_PACK_INDEX 040 > +#define ERROR_PACK_REV_INDEX 0100 > > static const char *describe_object(const struct object_id *oid) > { > @@ -856,6 +858,32 @@ static int mark_packed_for_connectivity(const struct object_id *oid, > return 0; > } > > +static int check_pack_rev_indexes(struct repository *r, int show_progress) > +{ > + struct progress *progress = NULL; > + uint32_t pack_count = 0; > + int res = 0; > + > + if (show_progress) { > + for (struct packed_git *p = get_all_packs(the_repository); p; p = p->next) It's going to take me a while to get used to these declarations inside of for-loops! > + pack_count++; > + progress = start_delayed_progress("Verifying reverse pack-indexes", pack_count); I wonder if we want to count over the sum of objects in packs rather than the number of packs themselves. My worry would be that a rather large pack would make it appear as if nothing is happening when in reality we're just churning through a lot of objects. > + pack_count = 0; > + } > + > + for (struct packed_git *p = get_all_packs(the_repository); p; p = p->next) { > + if (!load_pack_revindex(the_repository, p) && I was going to comment that I wasn't sure if `load_pack_revindex()` was the right thing here, since we don't care about validating the on-the-fly reverse indexes that we generate. But I see in your 3/4 that you are comparing the values on disk to those in memory, which is very nice. > + verify_pack_revindex(p)) { Inside of verify_pack_revindex(), it says that a negative number is returned on error. Do we care about disambiguating >= 0 here? IOW, should this be: if (!load_pack_revindex(the_repository, p) || verify_pack_revindex(p) < 0) ? All looking good otherwise. Thanks, Taylor