Hi, On Mon, Aug 29, 2016 at 6:02 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote: > > On Mon, Aug 29, 2016 at 12:03:09PM +0200, Peter Zijlstra wrote: > > @@ -1802,8 +1802,18 @@ event_sched_out(struct perf_event *event, > > > > event->tstamp_stopped = tstamp; > > event->pmu->del(event, 0); > > - event->oncpu = -1; > > - event->state = PERF_EVENT_STATE_INACTIVE; > > + > > + WRITE_ONCE(event->state, PERF_EVENT_STATE_INACTIVE); > > + /* > > + * pmu::del() will have updated the event count. Now mark it inactive, > > + * but take care to clear ->oncpu after the INACTIVE store, such that > > + * while ->state == ACTIVE, ->oncpu must be valid. > > + * > > + * See event_sched_in(), perf_event_restart() and perf_event_read(). > > + */ > > + smp_wmb(); > > + WRITE_ONCE(event->oncpu, -1); > > + > > if (event->pending_disable) { > > event->pending_disable = 0; > > event->state = PERF_EVENT_STATE_OFF; > > @@ -2015,8 +2025,10 @@ event_sched_in(struct perf_event *event, > > > > WRITE_ONCE(event->oncpu, smp_processor_id()); > > /* > > - * Order event::oncpu write to happen before the ACTIVE state > > - * is visible. > > + * Order event::oncpu write to happen before the ACTIVE state is > > + * visible, such that when we observe ACTIVE, oncpu must be correct. > > + * > > + * Matches the smp_rmb() in perf_event_restart(). > > */ > > smp_wmb(); > > WRITE_ONCE(event->state, PERF_EVENT_STATE_ACTIVE); > > Urgh.. that cannot work either, because now perf_event_read() can race > against event_sched_in(). Since that's no longer crossed. > > > @@ -3561,28 +3576,36 @@ u64 perf_event_read_local(struct perf_event *event) > > > > static int perf_event_read(struct perf_event *event, bool group) > > { > > - int ret = 0, cpu_to_read, local_cpu; > > + int ret = 0, cpu_to_read, local_cpu, state; > > + > > + local_cpu = get_cpu(); /* disable preemption to hold off hotplut */ > > + cpu_to_read = READ_ONCE(event->oncpu); > > + /* > > + * Matches smp_wmb() from event_sched_out(), ->oncpu must be valid > > + * IFF we observe ACTIVE. > > + */ > > + smp_rmb(); > > + state = READ_ONCE(event->state); > > The best I can come up with is something like: > > > do { > state = READ_ONCE(event->state); > if (state != ACTIVE) > break; > smp_rmb(); > cpu = READ_ONCE(event->cpu); > smp_rmb(); > } while (READ_ONCE(event->state) != state); > > > And I suppose perf_event_restart() should do the same thing... Let me > ponder this a wee bit more. I am trying to understand this better. There is a race between oncpu/active and the smp_call. By the time you actually do the smp_call the oncpu may be wrong and smp_call now returns an error given David's change. I suspect the race was always there. It boils down to what is the guarantee of the API in terms of the "freshness" of the value returned on read(). I am guessing that if you thought you had to do the smp_call, it is because the event was still active and oncpu != -1. If it is no longer active, it happened very recently and, in that case, one can use the saved count in the perf_event struct as a valid value because it was necessarily updated when the event was scheduled out. -- To unsubscribe from this list: send the line "unsubscribe linux-tip-commits" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html
![]() |