Antw: Re: non-smooth progress indication for git fsck and git gc

"Ulrich Windl" <Ulrich.Windl@xxxxxxxxxxxxxxxxxxxx> · Mon, 20 Aug 2018 10:33:32 +0200

>>> Jeff King <peff@xxxxxxxx> wrote on 16.08.2018 at 22:55 in message
<20180816205556.GA8257@xxxxxxxxxxxxxxxxxxxxx>:
> On Thu, Aug 16, 2018 at 10:35:53PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
>> This is all interesting, but I think unrelated to what Ulrich is talking
>> about. Quote:
>> 
>>     Between the two phases of "git fsck" (checking directories and
>>     checking objects) there was a break of several seconds where no
>>     progress was indicated
>> 
>> I.e. it's not about the pause you get with your testcase (which is
>> certainly another issue) but the break between the two progress bars.
> 
> I think he's talking about both. What I said responds to this:

Hi guys!

Yes, I was wondering what git does between the two visible phases, and, reading
between the lines, I was suggesting another progress message between those
phases. At the very least a maximally unspecific "Thinking..." could be
displayed ;-) Of course anything more appropriate would be welcome.
Also, that message should only be displayed if it's foreseeable that the
operation will take significant time. In my case (I just repeated it a few
minutes ago) the delay is significant (at least 10 seconds). As noted earlier, I
was hoping to capture the timing in a screencast, but it seems all the delays
were just optimized away in the recording.
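
To make the "only if it takes long" idea a bit more concrete, here is a minimal
sketch in plain C. It is not git's progress code; the names delayed_notice,
delayed_notice_init and delayed_notice_poll are made up for illustration. The
work loop polls the helper, and the message is printed once, and only after the
threshold has passed:

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/*
 * Hypothetical helper (not part of git): prints a notice once a phase
 * has been running for at least threshold_sec seconds.
 */
struct delayed_notice {
	time_t start;         /* when the phase began */
	double threshold_sec; /* stay silent below this duration */
	int shown;            /* nonzero once the notice has been printed */
};

static void delayed_notice_init(struct delayed_notice *n, time_t start,
				double threshold_sec)
{
	assert(threshold_sec >= 0);
	n->start = start;
	n->threshold_sec = threshold_sec;
	n->shown = 0;
}

/*
 * Call this periodically from the work loop with the current time.
 * Returns 1 if this call printed the notice, 0 otherwise.
 */
static int delayed_notice_poll(struct delayed_notice *n, time_t now,
			       const char *msg)
{
	if (n->shown || difftime(now, n->start) < n->threshold_sec)
		return 0;
	fprintf(stderr, "%s\n", msg);
	n->shown = 1;
	return 1;
}
```

In a real loop one would pass time(NULL) as "now"; taking the time as a
parameter just keeps the sketch easy to test. A fast run never prints anything,
which is the behaviour I was asking for.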

> 
>> >> During "git gc" the writing objects phase did not update for some
>> >> seconds, but then the percentage counter jumped like from 15% to 42%.
> 
> But yeah, I missed that the fsck thing was specifically about a break
> between two meters. That's a separate problem, but also worth
> discussing (and hopefully much easier to address).
> 
>> If you fsck this repository it'll take around (on my spinning rust
>> server) 30 seconds between 100% of "Checking object directories" before
>> you get any output from "Checking objects".
>> 
>> The breakdown of that is (this is from approximate eyeballing):
>> 
>>  * We spend 1-3 seconds just on this:
>>    https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L181
> 
> OK, so that's checking the sha1 over the .idx file. We could put a meter
> on that. I wouldn't expect it to generally be all that slow outside of
> pathological cases, since it scales with the number of objects (and 1s
> is our minimum update anyway, so that might be OK as-is). Your case has
> 13M objects, which is quite large.

Sometimes an oldish CPU can bring performance surprises, maybe. Anyway, the
CPU in question is an AMD Phenom II quad-core at 3.2 GHz nominal, and there is a
classic 5400 RPM spinning disk built in...

> 
>>  * We spend the majority of the ~30s on this:
>>    https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L70-L79
> 
> This is hashing the actual packfile. This is potentially quite long,
> especially if you have a ton of big objects.

That seems to apply. BTW: Is there a way to get some repository statistics,
like a histogram of object sizes (or whatever else might be useful to help
make decisions)?
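
A rough sketch of the kind of statistic I have in mind, in plain C. It assumes
the object sizes arrive as one decimal number per line, which (if I read the
cat-file documentation correctly) is what
"git cat-file --batch-all-objects --batch-check='%(objectsize)'" prints. The
names size_bucket, size_histogram and print_histogram are made up for
illustration; nothing here is an existing git facility:

```c
#include <assert.h>
#include <stdio.h>

/* Bucket 0 holds size 0; bucket k (k >= 1) holds sizes in [2^(k-1), 2^k). */
#define NBUCKETS 65

/* Returns the number of significant bits in size, used as bucket index. */
static int size_bucket(unsigned long long size)
{
	int k = 0;

	while (size) {
		size >>= 1;
		k++;
	}
	return k;
}

/*
 * Reads one decimal size per line from "in" and fills counts[].
 * Returns the number of sizes read.
 */
static unsigned long long size_histogram(FILE *in,
					 unsigned long long counts[NBUCKETS])
{
	unsigned long long size, total = 0;
	int i;

	assert(in != NULL);
	for (i = 0; i < NBUCKETS; i++)
		counts[i] = 0;
	while (fscanf(in, "%llu", &size) == 1) {
		counts[size_bucket(size)]++;
		total++;
	}
	return total;
}

/* Prints only the non-empty buckets, labelled by their upper bound. */
static void print_histogram(FILE *out,
			    const unsigned long long counts[NBUCKETS])
{
	int k;

	for (k = 0; k < NBUCKETS; k++)
		if (counts[k])
			fprintf(out, "< 2^%d bytes: %llu\n", k, counts[k]);
}
```

Power-of-two buckets are an arbitrary choice; they just keep the output short
even when sizes range from a few bytes to gigabytes.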

> 
> I wonder if we need to do this as a separate step anyway, though. Our
> verification is based on index-pack these days, which means it's going
> to walk over the whole content as part of the "Indexing objects" step to
> expand base objects and mark deltas for later. Could we feed this hash
> as part of that walk over the data? It's not going to save us 30s, but
> it's likely to be more efficient. And it would fold the effort naturally
> into the existing progress meter.
> 
>>  * We spend another 3-5 seconds on this QSORT:
>>    https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L105
> 
> That's a tough one. I'm not sure how we'd count it (how many compares we
> do?). And each item is doing so little work that hitting the progress
> code may make things noticeably slower.

If it's sorting, maybe add some code like (wild guess):

if (objects_to_sort > magic_number)
   message("Sorting something...");
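
Spelled out as compilable C, the same guess could look like the sketch below.
It is not taken from pack-check.c; sort_with_notice, cmp_ull and the threshold
parameter are hypothetical names, and the right "magic number" would have to be
found by measuring. It prints a generic before/after notice only for large
inputs, so the sort itself never touches any progress code per item:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Three-way comparison for qsort() that cannot overflow. */
static int cmp_ull(const void *a, const void *b)
{
	unsigned long long x = *(const unsigned long long *)a;
	unsigned long long y = *(const unsigned long long *)b;

	return (x > y) - (x < y);
}

/*
 * Sorts items[0..nr-1] in ascending order. A before/after notice goes
 * to stderr only if nr exceeds threshold (the "magic number").
 * Returns 1 if the notice was printed, 0 otherwise.
 */
static int sort_with_notice(unsigned long long *items, size_t nr,
			    size_t threshold)
{
	int announce = nr > threshold;

	assert(items != NULL || nr == 0);
	if (announce)
		fprintf(stderr, "Sorting %zu items...", nr);
	if (nr)
		qsort(items, nr, sizeof(*items), cmp_ull);
	if (announce)
		fprintf(stderr, " done.\n");
	return announce;
}
```

Small repositories stay silent, and large ones pay for just two fprintf()
calls, whatever the number of comparisons.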

> 
> Again, your case is pretty big. Just based on the number of objects,
> linux.git should be 1.5-2.5 seconds on your machine for the same
> operation. Which I think may be small enough to ignore (or even just
> print a generic before/after). It's really the 30s packfile hash that's
> making the whole thing so terrible.
> 
> -Peff