Re: non-smooth progress indication for git fsck and git gc

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Sep 01, 2018 at 02:53:28PM +0200, Ævar Arnfjörð Bjarmason wrote:

> With this we'll get output like:
> 
>     $ ~/g/git/git  -C ~/g/2015-04-03-1M-git/  --exec-path=$PWD fsck
>     Checking object directories: 100% (256/256), done.
>     Hashing: 100% (452634108/452634108), done.
>     Hashing: 100% (1073741824/1073741824), done.
>     Hashing: 100% (1073741824/1073741824), done.
>     Hashing: 100% (1008001572/1008001572), done.
>     Checking objects:   2% (262144/13064614)
>     ^C
> 
> All tests pass with this. Isn't it awesome? Except it's of course a
> massive hack, we wouldn't want to just hook into SHA1DC like this.

I still consider that output so-so; the byte counts are big and there's
no indication how many "hashing" lines we're going to see. It's also
broken up in a weird way (it's not one per file; it's one per giant
chunk we fed to sha1).

> The problem comes down to us needing to call git_hash_sha1_update() with
> some really huge input, that function is going to take a *long* time,
> and the only way we're getting incremental progress is:
> 
>  1) If we ourselves split the input into N chunks
>  2) If we hack into the SHA1 library itself
> 
> This patch does #2, but for this to be acceptable we'd need to do
> something like #1.

I think we could just do the chunking in verify_packfile(), couldn't we?
(And the .idx hash, if we really want to cover that case, but IMHO
that's way less interesting).

Something like this, which chunks it there, uses a per-packfile meter
(though still does not give any clue how many packfiles there are), and
shows a throughput meter.

diff --git a/pack-check.c b/pack-check.c
index d3a57df34f..c94223664f 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -62,10 +62,25 @@ static int verify_packfile(struct packed_git *p,
 	uint32_t nr_objects, i;
 	int err = 0;
 	struct idx_entry *entries;
+	struct progress *hashing_progress;
+	char *title;
+	off_t total_hashed = 0;
 
 	if (!is_pack_valid(p))
 		return error("packfile %s cannot be accessed", p->pack_name);
 
+	if (progress) {
+		/* Probably too long... */
+		title = xstrfmt("Hashing %s", p->pack_name);
+
+		/*
+		 * I don't think it actually works to have two progresses going
+		 * at the same time, because when the first one ends, we'll
+		 * cancel the alarm. But hey, this is a hacky proof of concept.
+		 */
+		hashing_progress = start_progress(title, 0);
+	}
+
 	the_hash_algo->init_fn(&ctx);
 	do {
 		unsigned long remaining;
@@ -75,9 +90,25 @@ static int verify_packfile(struct packed_git *p,
 			pack_sig_ofs = p->pack_size - the_hash_algo->rawsz;
 		if (offset > pack_sig_ofs)
 			remaining -= (unsigned int)(offset - pack_sig_ofs);
-		the_hash_algo->update_fn(&ctx, in, remaining);
+		while (remaining) {
+			int chunk = remaining < 4096 ? remaining : 4096;
+			the_hash_algo->update_fn(&ctx, in, chunk);
+			in += chunk;
+			remaining -= chunk;
+			total_hashed += chunk;
+			/*
+			 * The progress code needs tweaking to show throughputs
+			 * better for open-ended meters.
+			 */
+			display_throughput(hashing_progress, total_hashed);
+			display_progress(hashing_progress, 0);
+		}
 	} while (offset < pack_sig_ofs);
+
 	the_hash_algo->final_fn(hash, &ctx);
+	stop_progress(&hashing_progress);
+	free(title);
+
 	pack_sig = use_pack(p, w_curs, pack_sig_ofs, NULL);
 	if (hashcmp(hash, pack_sig))
 		err = error("%s pack checksum mismatch",



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux