Hello,

On Wed, May 15, 2024 at 9:56 PM Ian Rogers <irogers@xxxxxxxxxx> wrote:
>
> On Wed, May 15, 2024 at 9:24 PM Howard Chu <howardchu95@xxxxxxxxx> wrote:
> >
> > Hello,
> >
> > Here is a little update on --off-cpu.
> >
> > > > It would be nice to start landing this work so I'm wondering what the
> > > > minimal way to do that is. It seems putting behavior behind a flag is
> > > > a first step.
> >
> > The flag to determine output threshold of off-cpu has been implemented.
> > If the accumulated off-cpu time exceeds this threshold, output the sample
> > directly; otherwise, save it later for off_cpu_write.
> >
> > But adding an extra pass to handle off-cpu samples introduces performance
> > issues, here's the processing rate of --off-cpu sampling(with the extra
> > pass to extract raw sample data) and without. The --off-cpu-threshold is
> > in nanoseconds.
> >
> > +-----------------------------------------------------+---------------------------------------+----------------------+
> > | comm                                                | type                                  | process rate         |
> > +-----------------------------------------------------+---------------------------------------+----------------------+
> > | -F 4999 -a                                          | regular samples (w/o extra pass)      | 13128.675 samples/ms |
> > +-----------------------------------------------------+---------------------------------------+----------------------+
> > | -F 1 -a --off-cpu --off-cpu-threshold 100           | offcpu samples (extra pass)           | 2843.247 samples/ms  |
> > +-----------------------------------------------------+---------------------------------------+----------------------+
> > | -F 4999 -a --off-cpu --off-cpu-threshold 100        | offcpu & regular samples (extra pass) | 3910.686 samples/ms  |
> > +-----------------------------------------------------+---------------------------------------+----------------------+
> > | -F 4999 -a --off-cpu --off-cpu-threshold 1000000000 | few offcpu & regular (extra pass)     | 4661.229 samples/ms  |
> > +-----------------------------------------------------+---------------------------------------+----------------------+

What does the process rate mean?  Does it count samples for the off-cpu
event or for other events (cpu-cycles)?  Is it from a single CPU,
system-wide, or per-task?

> >
> > It's not ideal. I will find a way to reduce overhead. For example
> > process them samples at save time as Ian mentioned.
> >
> > > > To turn the bpf-output samples into off-cpu events there is a pass
> > > > added to the saving. I wonder if that can be more generic, like a save
> > > > time perf inject.
> >
> > And I will find a default value for such a threshold based on performance
> > and common use cases.
> >
> > > Sounds good. We might add an option to specify the threshold to
> > > determine whether to dump the data or to save it for later. But ideally
> > > it should be able to find a good default.
> >
> > These will be done before the GSoC kick-off on May 27.
>
> This all sounds good. 100ns seems like quite a low threshold and 1s
> extremely high, shame such a high threshold is marginal for the
> context switch performance change. I wonder 100 microseconds may be a
> more sensible threshold. It's 100 times larger than the cost of 1
> context switch but considerably less than a frame redraw at 60FPS (16
> milliseconds).

I don't know what a sensible default would be.  But 1 msec could be
another candidate for a similar reason. :)

Thanks,
Namhyung
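
For reference, a minimal sketch of the threshold rule discussed above, in Python for brevity (perf itself is C and BPF). All function and constant names here are hypothetical and are not taken from the perf sources. It only illustrates "exceeds the threshold: output directly, otherwise save for later", and puts the candidate defaults from this thread, in nanoseconds, next to the 60 FPS frame time:

```python
# Hypothetical sketch; none of these names come from the perf sources.
NSEC_PER_USEC = 1_000
NSEC_PER_MSEC = 1_000_000
NSEC_PER_SEC = 1_000_000_000

# Threshold values mentioned in the thread, expressed in nanoseconds
# (the unit --off-cpu-threshold is said to use).
CANDIDATES_NS = {
    "100ns": 100,
    "100us": 100 * NSEC_PER_USEC,
    "1ms": 1 * NSEC_PER_MSEC,
    "1s": 1 * NSEC_PER_SEC,
}

# Time budget of one frame at 60 FPS, for comparison (about 16.7 ms).
FRAME_60FPS_NS = NSEC_PER_SEC / 60


def route_sample(accumulated_ns, threshold_ns):
    """Decide how an off-cpu period is handled.

    "direct"   -> accumulated time exceeds the threshold, emit a sample now
    "deferred" -> keep accumulating and write it out at the end
    """
    return "direct" if accumulated_ns > threshold_ns else "deferred"


if __name__ == "__main__":
    off_cpu_ns = 500 * NSEC_PER_USEC  # an example 500 us off-cpu period
    for name, threshold_ns in CANDIDATES_NS.items():
        below_frame = threshold_ns < FRAME_60FPS_NS
        print(name, route_sample(off_cpu_ns, threshold_ns), below_frame)
```

With a 500 us off-cpu period, the 100 us candidate emits the sample directly while the 1 ms candidate defers it; both candidates stay well below a single 60 FPS frame.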