Re: [PATCH] index: add trace2 region for clear skip worktree

Jeff Hostetler <git@xxxxxxxxxxxxxxxxx> · Wed, 26 Oct 2022 14:29:09 -0400

On 10/26/22 12:01 PM, Junio C Hamano wrote:
Jeff Hostetler <git@xxxxxxxxxxxxxxxxx> writes:

In the worst case, we walk the entire index and lstat() for a
significant number of skipped-and-not-present files, then near
the end of the loop, we find a skipped-but-present directory
and have to restart the loop.  The second pass will still run
the full loop again.  Will the second pass actually see any
skipped cache-entries?  Will it re-lstat() them?  Could the
`goto restart` just be a `break` or `return`?

I haven't had time to look under the hood here, but I was
hoping that these two counters would help the series author
collect telemetry over many runs and gain more insight into
the perf problem.

Without being able to answer these questions, would we be able to
interpret the numbers reported from these counters?

Continuing the example from above, if we've already paid the
costs to lstat() the 95% sparse files AND THEN near the bottom
of the loop we have to do a restart, then we should expect
this loop to be doubly slow.  It was my hope that this combination
of counters would help us understand the variations in perf.

Yes, I understand that double-counting may give some clue to detect
that, but it just looked roundabout way to do that.  Perhaps
counting the path inspected during the first iteration and the path
inspected during the second iteration, separately, without the "how
many times did we repeat?", would give you a better picture?  "After
inspecting 2400 paths, we need to go back and then ended up scanning
3000 paths in the flattened index in the second round" would be
easier to interpret than "We needed flattening, and scanned 5400
paths in total in the two iterations".

Good point and I was wondering about whether we might see "2 x 95%"
values for path_count in really slow cases.  And yes, it would be
better to have `int path_count[2]` and tally each loop pass
independently.

I guess I was looking for a first-order "where is the pain?" knowing
that we could always dig deeper later.  That is, is the lstat's or
is it the ensure-full and index rewrite?  Or both?

We have other places that do lstat() over the cache-entries and have
added code to spread the loop across n threads.  I doubt that is needed
here and I didn't want to lead with it.

WRT the `intmax_t` vs just `int`: either is fine.

I thought "int" was supposed to be natural machine word, while
incrementing "intmax_t" is allowed to be much slower than "int".
Do we expect more than 2 billion paths?

You're right.  An `int` would be fine here.

Thanks,
Jeff