On 2/08/24 14:02, Adrian Hunter wrote: > On 2/08/24 12:25, Peter Zijlstra wrote: >> On Thu, Aug 01, 2024 at 02:13:41PM -0700, Andrii Nakryiko wrote: >> >>> Ok, this bisected to: >>> >>> 675ad74989c2 ("perf/core: Add aux_pause, aux_resume, aux_start_paused") >> >> Adrian, there are at least two obvious bugs there: >> >> - aux_action was key's off of PERF_PMU_CAP_AUX_OUTPUT, which is not >> right, that's the capability where events can output to AUX -- aka. >> PEBS-to-PT. It should be PERF_PMU_CAP_ITRACE, which is the >> PT/CoreSight thing. Not sure about that. In perf_event_alloc(), there is: if (event->attr.aux_output && (!(pmu->capabilities & PERF_PMU_CAP_AUX_OUTPUT) || event->attr.aux_pause || event->attr.aux_resume)) { err = -EOPNOTSUPP; goto err_pmu; } which is to prevent aux_output with aux_pause or aux_resume. That is because aux_output (i.e. PEBS-via-PT) has no interrupt and so does not overflow. (Instead the PEBS record is written by hardware to the Intel PT trace) No overflow => no (software) aux_pause/aux_resume, so aux_output with aux_pause/aux_resume does not make sense. The PMU capability for aux_pause/aux_resume or aux_start_paused is PERF_PMU_CAP_AUX_PAUSE. aux_pause/aux_resume are valid for non-AUX events (member of the group), whereas aux_start_paused is valid for the AUX event itself (group leader). For aux_pause/aux_resume the group leader's PMU capability is checked. For aux_start_paused the event's PMU capability is checked. >> >> - it sets aux_paused unconditionally, which is scribbling in the giant >> union which is overwriting state set by perf_init_event(). That definitely needs fixing, but the fix is just the diff from my previous reply: diff --git a/kernel/events/core.c b/kernel/events/core.c index e4cb6e5a5f40..2072aaa4d449 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -12151,7 +12151,8 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu, err = -EOPNOTSUPP; goto err_pmu; } - event->hw.aux_paused = event->attr.aux_start_paused; + if (event->attr.aux_start_paused) + event->hw.aux_paused = 1; if (cgroup_fd != -1) { err = perf_cgroup_connect(cgroup_fd, event, attr, group_leader); I tested that with: # perf probe -x /root/main -a main Added new event: probe_main:main (on main in /root/main) # perf record -e probe_main:main -- ./main and it made the problem go away. >> >> But I think there's more problems, we need to do the aux_action >> validation after perf_get_aux_event(), we can't know if having those >> bits set makes sense before that. This means the perf_event_alloc() site >> is wrong in the first place. As above, aux_start_paused is used on the AUX event itself, so the PMU capability is checked in perf_event_alloc: if (event->attr.aux_start_paused && !(pmu->capabilities & PERF_PMU_CAP_AUX_PAUSE)) { err = -EOPNOTSUPP; goto err_pmu; } Whereas aux_pause/aux_resume are checked in perf_get_aux_event(): if ((event->attr.aux_pause || event->attr.aux_resume) && !(group_leader->pmu->capabilities & PERF_PMU_CAP_AUX_PAUSE)) return 0; That all seems OK, so please let me know if there is something else to change.