On Wed, Jun 28, 2017 at 3:56 AM, Mark Rutland <mark.rutland@xxxxxxx> wrote:
> On Wed, Jun 28, 2017 at 11:12:48AM +0100, Mark Rutland wrote:
>> On Tue, Jun 27, 2017 at 09:51:00PM -0700, Kyle Huey wrote:
>
>> As we're trying to avoid sampling state, I think we can move the check
>> into perf_prepare_sample() or __perf_event_output(), where that state is
>> actually sampled. I'll take a look at that momentarily.
>>
>> Just to clarify, you don't care about the sample state at all? i.e. you
>> don't need the user program counter?
>>
>> Is that signal delivered to the tracee, or to a different process that
>> traces it? If the latter, what ensures that the task is stopped
>> sufficiently quickly?
>>
>> > It seems to me that it might be reasonable to ignore the interrupt if
>> > the purpose of the interrupt is to trigger sampling of the CPU's
>> > register state. But if the interrupt will trigger some other
>> > operation, such as a signal on an fd, then there's no reason to drop
>> > it.
>>
>> Agreed. I'll try to have a patch for this soon.
>>
>> I just need to figure out exactly where that overflow signal is
>> generated by the perf core.
>
> I've figured that out now. That's handled by perf_pending_event(), which
> is the irq_work we schedule in __perf_event_overflow().
>
> Does the below patch work for you?
>
> ---->8----
> From bb1f99dace508ce34ab0818f91d59e73450fa142 Mon Sep 17 00:00:00 2001
> From: Mark Rutland <mark.rutland@xxxxxxx>
> Date: Wed, 28 Jun 2017 11:39:25 +0100
> Subject: [PATCH] perf/core: generate overflow signal when samples are dropped
>
> We recently tried to kill an information leak where users could receive
> kernel samples due to skid on the PMU interrupt. To block this, we
> bailed out early in perf_event_overflow(), as we do for non-sampling
> events.
>
> This broke rr, which uses sampling events to receive a signal on
> overflow (but does not care about the contents of the sample). These
> signals are critical to the correct operation of rr.
>
> Instead of bailing out early in perf_event_overflow(), we can bail prior
> to performing the actual sampling in __perf_event_output(). This avoids
> the information leak, but preserves the generation of the signal.
>
> Since we don't place any sample data into the ring buffer, the signal is
> arguably spurious. However, a userspace ringbuffer consumer can already
> consume data prior to taking the associated signals, and therefore must
> handle spurious signals to operate correctly. Thus, this signal
> shouldn't be harmful.
>
> Fixes: cc1582c231ea041f ("perf/core: Drop kernel samples even though :u is specified")
> Reported-by: Kyle Huey <me@xxxxxxxxxxxx>
> Signed-off-by: Mark Rutland <mark.rutland@xxxxxxx>
> Cc: Alexander Shishkin <alexander.shishkin@xxxxxxxxxxxxxxx>
> Cc: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Jin Yao <yao.jin@xxxxxxxxxxxxxxx>
> Cc: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> ---
>  kernel/events/core.c | 46 +++++++++++++++++++++++++---------------------
>  1 file changed, 25 insertions(+), 21 deletions(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 6c4e523..6b263f3 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -6090,6 +6090,21 @@ void perf_prepare_sample(struct perf_event_header *header,
>  	}
>  }
>
> +static bool sample_is_allowed(struct perf_event *event, struct pt_regs *regs)
> +{
> +	/*
> +	 * Due to interrupt latency (AKA "skid"), we may enter the
> +	 * kernel before taking an overflow, even if the PMU is only
> +	 * counting user events.
> +	 * To avoid leaking information to userspace, we must always
> +	 * reject kernel samples when exclude_kernel is set.
> +	 */
> +	if (event->attr.exclude_kernel && !user_mode(regs))
> +		return false;
> +
> +	return true;
> +}
> +
>  static void __always_inline
>  __perf_event_output(struct perf_event *event,
>  		    struct perf_sample_data *data,
> @@ -6101,6 +6116,12 @@ void perf_prepare_sample(struct perf_event_header *header,
>  	struct perf_output_handle handle;
>  	struct perf_event_header header;
>
> +	/*
> +	 * For security, drop the skid kernel samples if necessary.
> +	 */
> +	if (!sample_is_allowed(event, regs))
> +		return ret;

Just a bare return here; 'ret' doesn't exist in this function, which returns void.

> +
>  	/* protect the callchain buffers */
>  	rcu_read_lock();
>
> @@ -7316,21 +7337,6 @@ int perf_event_account_interrupt(struct perf_event *event)
>  	return __perf_event_account_interrupt(event, 1);
>  }
>
> -static bool sample_is_allowed(struct perf_event *event, struct pt_regs *regs)
> -{
> -	/*
> -	 * Due to interrupt latency (AKA "skid"), we may enter the
> -	 * kernel before taking an overflow, even if the PMU is only
> -	 * counting user events.
> -	 * To avoid leaking information to userspace, we must always
> -	 * reject kernel samples when exclude_kernel is set.
> -	 */
> -	if (event->attr.exclude_kernel && !user_mode(regs))
> -		return false;
> -
> -	return true;
> -}
> -
>  /*
>   * Generic event overflow handling, sampling.
>   */
> @@ -7352,12 +7358,6 @@ static int __perf_event_overflow(struct perf_event *event,
>  	ret = __perf_event_account_interrupt(event, throttle);
>
>  	/*
> -	 * For security, drop the skid kernel samples if necessary.
> -	 */
> -	if (!sample_is_allowed(event, regs))
> -		return ret;
> -
> -	/*
>  	 * XXX event_limit might not quite work as expected on inherited
>  	 * events
>  	 */
> @@ -7372,6 +7372,10 @@ static int __perf_event_overflow(struct perf_event *event,
>
>  	READ_ONCE(event->overflow_handler)(event, data, regs);
>
> +	/*
> +	 * We must generate a wakeup regardless of whether we actually
> +	 * generated a sample. This is relied upon by rr.
> +	 */
>  	if (*perf_event_fasync(event) && event->pending_kill) {
>  		event->pending_wakeup = 1;
>  		irq_work_queue(&event->pending);
> --
> 1.9.1
>

I can confirm that with that fixed to compile, this patch fixes rr.

Thanks!

- Kyle
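
For readers unfamiliar with the usage rr depends on, here is a minimal userspace
sketch of the pattern the commit message describes: a sampling event armed purely
for its overflow signal, with the sample contents never read. This is not rr's
actual code; the event type, sample period, and signal routing below are
illustrative assumptions. It relies only on documented perf_event_open(2) fasync
behavior (O_ASYNC plus F_SETOWN on the event fd), i.e. the same
perf_event_fasync() path the patch above preserves.

/*
 * Sketch: take a SIGIO per counter overflow; never read a sample.
 * Build: cc -o overflow_signal overflow_signal.c
 */
#define _GNU_SOURCE		/* for O_ASYNC and syscall() */
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t overflows;

static void on_overflow(int sig, siginfo_t *info, void *uc)
{
	/*
	 * The signal itself is the payload; no sample data is read.
	 * With the patch above, a signal may arrive even though the
	 * kernel dropped the sample, so the handler must not assume
	 * the ring buffer holds a matching record.
	 */
	(void)sig; (void)info; (void)uc;
	overflows++;
}

int main(void)
{
	struct perf_event_attr attr;
	struct sigaction sa;
	volatile unsigned long i;
	int fd;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = on_overflow;
	sa.sa_flags = SA_SIGINFO;
	sigaction(SIGIO, &sa, NULL);	/* F_SETSIG could pick an RT signal instead */

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_INSTRUCTIONS;	/* illustrative; rr counts branches */
	attr.sample_period = 1000000;			/* illustrative period */
	attr.exclude_kernel = 1;	/* the attribute the fix is about */
	attr.exclude_hv = 1;
	attr.disabled = 1;

	fd = syscall(SYS_perf_event_open, &attr, 0 /* this task */,
		     -1 /* any cpu */, -1 /* no group */, 0 /* no flags */);
	if (fd < 0) {
		perror("perf_event_open");
		return 1;
	}

	/* Ask for fasync (SIGIO) delivery to this task on each overflow. */
	fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_ASYNC);
	fcntl(fd, F_SETOWN, getpid());

	ioctl(fd, PERF_EVENT_IOC_RESET, 0);
	ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

	/* Burn userspace instructions so the counter overflows a few times. */
	for (i = 0; i < 100000000UL; i++)
		;

	ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
	printf("overflow signals received: %d\n", (int)overflows);
	close(fd);
	return 0;
}

With exclude_kernel set, some overflows will still land while the task is in the
kernel due to skid; before this patch those overflows were dropped entirely (no
signal), and with it the signal still arrives, which is the behavior rr needs. A
real tracer would typically route the signal to the traced task (e.g. with
F_SETOWN_EX) and observe it via ptrace, but that is beyond this sketch.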