On Mon, Oct 30, 2023 at 12:04 PM Mingwei Zhang <mizhang@xxxxxxxxxx> wrote:
>
> On Fri, Sep 15, 2023 at 9:10 PM Ian Rogers <irogers@xxxxxxxxxx> wrote:
> >
> > Dummy events are created with an attribute where the period and freq
> > are zero. evsel__config will then see the uninitialized values and
> > initialize them in evsel__default_freq_period. As frequency mode is
> > used by default, the dummy event would be set to use frequency
> > mode. However, this has no effect on the dummy event but does cause
> > unnecessary timers/interrupts. Avoid this overhead by setting the
> > period to 1 for dummy events.
> >
> > evlist__add_aux_dummy calls evlist__add_dummy then sets freq=0 and
> > period=1. This isn't necessary after this change and so the setting is
> > removed.
> >
> > From Stephane:
> >
> > The dummy event is not counting anything. It is used to collect mmap
> > records and avoid a race condition during the synthesize mmap phase of
> > perf record. As such, it should not cause any overhead during active
> > profiling. Yet, it did. Because of a bug, the dummy event was
> > programmed as a sampling event in frequency mode. Events in that mode
> > incur more kernel overhead because, on each timer tick, the kernel has
> > to look at the number of samples for each event and potentially adjust
> > the sampling period to achieve the desired frequency. The dummy event
> > was therefore adding a frequency event to task and ctx contexts we may
> > otherwise not have any, e.g., perf record -a -e
> > cpu/event=0x3c,period=10000000/. On each timer tick,
> > perf_adjust_freq_unthr_context() is invoked, and if ctx->nr_freq is
> > non-zero, the kernel will loop over ALL the events of the context
> > looking for frequency mode ones. In doing so, it locks the context
> > and enables/disables the PMU of each hw event. If all the events of the
> > context are in period mode, the kernel will have to traverse the list
> > for nothing, incurring overhead.
> > The overhead is multiplied by a very large
> > factor when this happens in a guest kernel. There is no need for the
> > dummy event to be in frequency mode; it does not count anything and
> > therefore should not cause extra overhead for no reason.
> >
> > Fixes: 5bae0250237f ("perf evlist: Introduce perf_evlist__new_dummy constructor")
> > Reported-by: Stephane Eranian <eranian@xxxxxxxxxx>
> > Signed-off-by: Ian Rogers <irogers@xxxxxxxxxx>
> > ---
> >  tools/perf/util/evlist.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> > index 25c3ebe2c2f5..e36da58522ef 100644
> > --- a/tools/perf/util/evlist.c
> > +++ b/tools/perf/util/evlist.c
> > @@ -251,6 +251,9 @@ static struct evsel *evlist__dummy_event(struct evlist *evlist)
> >  		.type = PERF_TYPE_SOFTWARE,
> >  		.config = PERF_COUNT_SW_DUMMY,
> >  		.size = sizeof(attr), /* to capture ABI version */
> > +		/* Avoid frequency mode for dummy events to avoid associated timers. */
> > +		.freq = 0,
> > +		.sample_period = 1,
> >  	};
> >
> >  	return evsel__new_idx(&attr, evlist->core.nr_entries);
> > @@ -277,8 +280,6 @@ struct evsel *evlist__add_aux_dummy(struct evlist *evlist, bool system_wide)
> >  	evsel->core.attr.exclude_kernel = 1;
> >  	evsel->core.attr.exclude_guest = 1;
> >  	evsel->core.attr.exclude_hv = 1;
> > -	evsel->core.attr.freq = 0;
> > -	evsel->core.attr.sample_period = 1;
> >  	evsel->core.system_wide = system_wide;
> >  	evsel->no_aux_samples = true;
> >  	evsel->name = strdup("dummy:u");
> > --
> > 2.42.0.459.ge4e396fd5e-goog
> >
>
> Hi Greg,
>
> This patch is a critical performance fix for perf and vPMU. Can you
> help us dispatch the commit to all stable kernel versions?
>
> Appreciate your help. Thanks.
> -Mingwei

Oops... Update target email to: stable@xxxxxxxxxxxxxxx