On Sat, Sep 01, 2018 at 02:53:28PM +0200, Ævar Arnfjörð Bjarmason wrote:

> With this we'll get output like:
>
>     $ ~/g/git/git -C ~/g/2015-04-03-1M-git/ --exec-path=$PWD fsck
>     Checking object directories: 100% (256/256), done.
>     Hashing: 100% (452634108/452634108), done.
>     Hashing: 100% (1073741824/1073741824), done.
>     Hashing: 100% (1073741824/1073741824), done.
>     Hashing: 100% (1008001572/1008001572), done.
>     Checking objects:   2% (262144/13064614)
>     ^C
>
> All tests pass with this. Isn't it awesome? Except it's of course a
> massive hack, we wouldn't want to just hook into SHA1DC like this.

I still consider that output so-so; the byte counts are big and there's
no indication how many "Hashing" lines we're going to see. It's also
broken up in a weird way: it's not one line per file, but one per giant
chunk we fed to sha1.

> The problem comes down to us needing to call git_hash_sha1_update() with
> some really huge input, that function is going to take a *long* time,
> and the only way we're getting incremental progress is:
>
>  1) If we ourselves split the input into N chunks
>  2) If we hack into the SHA1 library itself
>
> This patch does #2, but for this to be acceptable we'd need to do
> something like #1.

I think we could just do the chunking in verify_packfile(), couldn't we?
(And for the .idx hash, if we really want to cover that case, but IMHO
that's way less interesting.)

Something like the patch below, which chunks it there, uses a
per-packfile meter (though it still does not give any clue how many
packfiles there are), and shows a throughput meter.
diff --git a/pack-check.c b/pack-check.c
index d3a57df34f..c94223664f 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -62,10 +62,25 @@ static int verify_packfile(struct packed_git *p,
 	uint32_t nr_objects, i;
 	int err = 0;
 	struct idx_entry *entries;
+	struct progress *hashing_progress;
+	char *title;
+	off_t total_hashed = 0;
 
 	if (!is_pack_valid(p))
 		return error("packfile %s cannot be accessed", p->pack_name);
 
+	if (progress) {
+		/* Probably too long... */
+		title = xstrfmt("Hashing %s", p->pack_name);
+
+		/*
+		 * I don't think it actually works to have two progresses going
+		 * at the same time, because when the first one ends, we'll
+		 * cancel the alarm. But hey, this is a hacky proof of concept.
+		 */
+		hashing_progress = start_progress(title, 0);
+	}
+
 	the_hash_algo->init_fn(&ctx);
 	do {
 		unsigned long remaining;
@@ -75,9 +90,25 @@ static int verify_packfile(struct packed_git *p,
 		pack_sig_ofs = p->pack_size - the_hash_algo->rawsz;
 		if (offset > pack_sig_ofs)
 			remaining -= (unsigned int)(offset - pack_sig_ofs);
-		the_hash_algo->update_fn(&ctx, in, remaining);
+		while (remaining) {
+			int chunk = remaining < 4096 ? remaining : 4096;
+			the_hash_algo->update_fn(&ctx, in, chunk);
+			in += chunk;
+			remaining -= chunk;
+			total_hashed += chunk;
+			/*
+			 * The progress code needs tweaking to show throughputs
+			 * better for open-ended meters.
+			 */
+			display_throughput(hashing_progress, total_hashed);
+			display_progress(hashing_progress, 0);
+		}
 	} while (offset < pack_sig_ofs);
+
 	the_hash_algo->final_fn(hash, &ctx);
+	stop_progress(&hashing_progress);
+	free(title);
+
 	pack_sig = use_pack(p, w_curs, pack_sig_ofs, NULL);
 	if (hashcmp(hash, pack_sig))
 		err = error("%s pack checksum mismatch",