On Tue, Dec 21, 2021 at 11:51 AM Derrick Stolee <stolee@xxxxxxxxx> wrote:
>
> On 12/20/2021 10:01 AM, Jeff Hostetler via GitGitGadget wrote:
> >
> > 3. Rationale
> >
> > Timers and counters are an alternative to the existing "region" and
> > "data" events. The latter are intended to trace the major flow (or
> > phases) of the program and possibly capture the amount of work
> > performed within a loop, for example. The former are offered as a way
> > to measure activity that is not localized, such as the time spent in
> > zlib or lstat, which may be called from many different parts of the
> > program.
>
> I'm excited for these API features.

Me too! This would have been very useful on some experiments I had to
run in the past. Thanks for working on it, Jeff :)

> I also like your attention to thread contexts. I think these timers
> would be very interesting to use in parallel checkout. CC'ing Matheus
> for his thoughts on where he would want timer summaries for that
> feature.

For parallel checkout, I think it would be interesting to have timer
summaries for open/close, fstat/lstat, write, and
inflation/delta-reconstruction. Perhaps pkt-line routines too, so that
we can see how much time we spend in inter-process communication.

It would be nice to have timer information for disk reading as well
(more on that below), but I don't think it is possible since we read
the objects through mmap() and thus, we cannot easily isolate the
actual reading time from the decompression time :(

> I would probably want the per-thread summary to know if we
> are blocked on one really long thread while the others finish quickly.

That would be interesting. Parallel checkout actually uses
subprocesses, but I can see the per-thread summary being useful in
grep, for example. (Nevertheless, the use case you mentioned for the
timers -- evaluating the work balance on parallel checkout -- seems
very interesting.)

> Within that: what are the things causing us to be slow? Is it zlib?
> Is it lstat()?
In my tests, the checkout bottleneck depended heavily on the underlying
storage type. On HDDs, the bottleneck was object reading (i.e. page
faults on mmap()-ed files), at about 70% to 80% of the checkout
runtime. On SSDs, reading was much faster, so CPU (i.e. inflation)
became the bottleneck, at about 50% of the runtime. (Inflation only
lost to reading when checking out from *many* loose objects.) Finally,
on NFS, file creation with open(O_CREAT | O_EXCL) and fstat() (which
makes the NFS client flush previously cached writes to the server) were
the bottlenecks, at about 40% of the total runtime each.

These numbers come from a (sequential) `git checkout .` execution on an
empty working tree of the Linux kernel (v5.12), and they were gathered
using eBPF-based profilers. For other operations, especially ones that
require many file removals or more laborious tree merging in
unpack_trees(), I suspect the bottlenecks may change.

If anyone is interested in seeing the flamegraphs and other plots for
these profiling numbers, I have them at:
https://matheustavares.gitlab.io/annexes/parallel-checkout/profiling

And there is a bit more context at:
https://matheustavares.gitlab.io/posts/parallel-checkout