Jeff Hostetler <git@xxxxxxxxxxxxxxxxx> writes:

> In the worst case, we walk the entire index and lstat() for a
> significant number of skipped-and-not-present files, then near
> the end of the loop, we find a skipped-but-present directory
> and have to restart the loop. The second pass will still run
> the full loop again. Will the second pass actually see any
> skipped cache-entries? Will it re-lstat() them? Could the
> `goto restart` just be a `break` or `return`?
>
> I haven't had time to look under the hood here, but I was
> hoping that these two counters would help the series author
> collect telemetry over many runs and gain more insight into
> the perf problem.

Without being able to answer these questions, would we be able to
interpret the numbers reported from these counters?

> Continuing the example from above, if we've already paid the
> costs to lstat() the 95% sparse files AND THEN near the bottom
> of the loop we have to do a restart, then we should expect
> this loop to be doubly slow. It was my hope that this combination
> of counters would help us understand the variations in perf.

Yes, I understand that double-counting may give some clue to detect
that, but it just looked like a roundabout way to do that. Perhaps
counting the paths inspected during the first iteration and the
paths inspected during the second iteration, separately, without
the "how many times did we repeat?" count, would give you a better
picture? Something like the sketch at the end of this message is
what I have in mind.

"After inspecting 2400 paths, we needed to go back, and then ended
up scanning 3000 paths in the flattened index in the second round"
would be easier to interpret than "We needed flattening, and scanned
5400 paths in total in the two iterations".

> WRT the `intmax_t` vs just `int`: either is fine.

I thought "int" was supposed to be the natural machine word, while
incrementing "intmax_t" is allowed to be much slower than "int".
Do we expect more than 2 billion paths?
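
Here is a minimal, self-contained sketch of the two-counter idea I am
alluding to above; the array, the counter names, and the restart
condition are all made up for illustration and do not come from the
series under discussion:

#include <stdio.h>

/* stand-in for the index; entry 6 forces a restart (hypothetical) */
#define NR_ENTRIES 8
static int needs_flattening[NR_ENTRIES] = { 0, 0, 0, 0, 0, 0, 1, 0 };

int main(void)
{
	long first_pass = 0, second_pass = 0; /* per-pass counters */
	int restarted = 0;
	int i;

restart:
	for (i = 0; i < NR_ENTRIES; i++) {
		if (restarted)
			second_pass++;
		else
			first_pass++;

		/* a skipped-but-present directory forces a full restart */
		if (!restarted && needs_flattening[i]) {
			restarted = 1;
			goto restart;
		}
	}

	printf("inspected %ld paths, restarted, then scanned %ld paths\n",
	       first_pass, second_pass);
	return 0;
}

With the above, the report reads "inspected 7 paths, restarted, then
scanned 8 paths", which tells you both that a restart happened and how
much work each pass did, without having to decompose a single total.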