[tip: perf/core] perf/core: Use POLLHUP for pinned events in error

"tip-bot2 for Namhyung Kim" <tip-bot2@xxxxxxxxxxxxx> · Mon, 17 Mar 2025 07:47:29 -0000

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     f4b07fd62d4d11d57a15cb4ae01b3833282eb8f6
Gitweb:        https://git.kernel.org/tip/f4b07fd62d4d11d57a15cb4ae01b3833282eb8f6
Author:        Namhyung Kim <namhyung@xxxxxxxxxx>
AuthorDate:    Sun, 16 Mar 2025 23:17:45 -07:00
Committer:     Ingo Molnar <mingo@xxxxxxxxxx>
CommitterDate: Mon, 17 Mar 2025 08:31:03 +01:00

perf/core: Use POLLHUP for pinned events in error

Pinned performance events can enter an error state when they fail to be
scheduled in the context due to a failed constraint or some other conflict
or condition.

In error state these events won't generate any samples anymore and are
silently ignored until they are recovered by PERF_EVENT_IOC_ENABLE,
or the condition can also change so that they can be scheduled in.

Tooling should be allowed to know about the state change, but
currently there's no mechanism to notify tooling when events enter
an error state.

One way to do this is to issue a POLLHUP event to poll(2) to handle this.
Reading events in an error state would return 0 (EOF) and it matches to
the behavior of POLLHUP according to the man page.

Tooling should remove the fd of the event from pollfd after getting
POLLHUP, otherwise it'll be returned repeatedly.

[ mingo: Clarified the changelog ]

Signed-off-by: Namhyung Kim <namhyung@xxxxxxxxxx>
Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Link: https://lore.kernel.org/r/20250317061745.1777584-1-namhyung@xxxxxxxxxx
---
 kernel/events/core.c |  9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 2533fc3..ace1bcc 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3984,6 +3984,11 @@ static int merge_sched_in(struct perf_event *event, void *data)
 		if (event->attr.pinned) {
 			perf_cgroup_event_disable(event, ctx);
 			perf_event_set_state(event, PERF_EVENT_STATE_ERROR);
+
+			if (*perf_event_fasync(event))
+				event->pending_kill = POLL_HUP;
+
+			perf_event_wakeup(event);
 		} else {
 			struct perf_cpu_pmu_context *cpc = this_cpc(event->pmu_ctx->pmu);
 
@@ -5925,6 +5930,10 @@ static __poll_t perf_poll(struct file *file, poll_table *wait)
 	if (is_event_hup(event))
 		return events;
 
+	if (unlikely(READ_ONCE(event->state) == PERF_EVENT_STATE_ERROR &&
+		     event->attr.pinned))
+		return events;
+
 	/*
 	 * Pin the event->rb by taking event->mmap_mutex; otherwise
 	 * perf_event_set_output() can swizzle our rb and make us miss wakeups.