On Sat, Aug 31, 2024 at 07:43:16AM +0000, Luo Gengkun wrote:
> Perf events has the notion of sampling frequency which is implemented in
> software by dynamically adjusting the counter period so that samples occur
> at approximately the target frequency. Period adjustment is done in 2
> places:
>  - when the counter overflows (and a sample is recorded)
>  - each timer tick, when the event is active
> The latter case is slightly flawed because it assumes that the time since
> the last timer-tick period adjustment is 1 tick, whereas the event may not
> have been active (e.g. for a task that is sleeping).
>
> Fix by using jiffies to determine the elapsed time in that case.
>
> Signed-off-by: Luo Gengkun <luogengkun@xxxxxxxxxxxxxxx>
> Reviewed-by: Adrian Hunter <adrian.hunter@xxxxxxxxx>
> ---
>  include/linux/perf_event.h |  1 +
>  kernel/events/core.c       | 12 +++++++++---
>  2 files changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 1a8942277dda..d29b7cf971a1 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -265,6 +265,7 @@ struct hw_perf_event {
>  	 * State for freq target events, see __perf_event_overflow() and
>  	 * perf_adjust_freq_unthr_context().
>  	 */
> +	u64				freq_tick_stamp;
>  	u64				freq_time_stamp;
>  	u64				freq_count_stamp;
>  #endif
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index a9395bbfd4aa..183291e0d070 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -55,6 +55,7 @@
>  #include <linux/pgtable.h>
>  #include <linux/buildid.h>
>  #include <linux/task_work.h>
> +#include <linux/jiffies.h>
>  
>  #include "internal.h"
>  
> @@ -4120,9 +4121,11 @@ static void perf_adjust_freq_unthr_events(struct list_head *event_list)
>  {
>  	struct perf_event *event;
>  	struct hw_perf_event *hwc;
> -	u64 now, period = TICK_NSEC;
> +	u64 now, period, tick_stamp;
>  	s64 delta;
>  
> +	tick_stamp = jiffies64_to_nsecs(get_jiffies_64());
> +
>  	list_for_each_entry(event, event_list, active_list) {
>  		if (event->state != PERF_EVENT_STATE_ACTIVE)
>  			continue;
> @@ -4148,6 +4151,9 @@ static void perf_adjust_freq_unthr_events(struct list_head *event_list)
>  		 */
>  		event->pmu->stop(event, PERF_EF_UPDATE);
>  
> +		period = tick_stamp - hwc->freq_tick_stamp;
> +		hwc->freq_tick_stamp = tick_stamp;
> +
>  		now = local64_read(&event->count);
>  		delta = now - hwc->freq_count_stamp;
>  		hwc->freq_count_stamp = now;
> @@ -4157,9 +4163,9 @@ static void perf_adjust_freq_unthr_events(struct list_head *event_list)
>  		 * reload only if value has changed
>  		 * we have stopped the event so tell that
>  		 * to perf_adjust_period() to avoid stopping it
> -		 * twice.
> +		 * twice. And skip if it is the first tick adjust period.
>  		 */
> -		if (delta > 0)
> +		if (delta > 0 && likely(period != tick_stamp))
>  			perf_adjust_period(event, period, delta, false);
>  
>  		event->pmu->start(event, delta > 0 ? PERF_EF_RELOAD : 0);

This one I'm less happy with.. that condition 'period != tick_stamp'
doesn't make sense to me. It's only false when hwc->freq_tick_stamp == 0,
which happens exactly once, right after event creation. Even though the
Changelog babbles about event scheduling.

Also, that all should then be written something like:

	if (delta > 0 && ...)
	{
		perf_adjust_period(...);
		adjusted = true;
	}

	event->pmu->start(event, adjusted ? PERF_EF_RELOAD : 0);