Peter, On Wed, Feb 16, 2011 at 5:57 PM, Peter Zijlstra <a.p.zijlstra@xxxxxxxxx> wrote: > On Wed, 2011-02-16 at 13:46 +0000, tip-bot for Stephane Eranian wrote: >> +static inline struct perf_cgroup * >> +perf_cgroup_from_task(struct task_struct *task) >> +{ >> + Â Â Â return container_of(task_subsys_state(task, perf_subsys_id), >> + Â Â Â Â Â Â Â Â Â Â Â struct perf_cgroup, css); >> +} > > =================================================== > [ INFO: suspicious rcu_dereference_check() usage. ] > --------------------------------------------------- > include/linux/cgroup.h:547 invoked rcu_dereference_check() without protection! > other info that might help us debug this: > rcu_scheduler_active = 1, debug_locks = 1 > 1 lock held by perf/1774: > Â#0: Â(&ctx->lock){......}, at: [<ffffffff810afb91>] ctx_sched_in+0x2a/0x37b > stack backtrace: > Pid: 1774, comm: perf Not tainted 2.6.38-rc5-tip+ #94017 > Call Trace: > Â[<ffffffff81070932>] ? lockdep_rcu_dereference+0x9d/0xa5 > Â[<ffffffff810afc4e>] ? ctx_sched_in+0xe7/0x37b > Â[<ffffffff810aff37>] ? perf_event_context_sched_in+0x55/0xa3 > Â[<ffffffff810b0203>] ? __perf_event_task_sched_in+0x20/0x5b > Â[<ffffffff81035714>] ? finish_task_switch+0x49/0xf4 > Â[<ffffffff81340d60>] ? schedule+0x9cc/0xa85 > Â[<ffffffff8110a84c>] ? vfsmount_lock_global_unlock_online+0x9e/0xb0 > Â[<ffffffff8110b556>] ? mntput_no_expire+0x4e/0xc1 > Â[<ffffffff8110b5ef>] ? mntput+0x26/0x28 > Â[<ffffffff810f2add>] ? fput+0x1a0/0x1af > Â[<ffffffff81002eb9>] ? int_careful+0xb/0x2c > Â[<ffffffff813432bf>] ? trace_hardirqs_on_thunk+0x3a/0x3f > Â[<ffffffff81002ec7>] ? int_careful+0x19/0x2c > > I have lockedp enabled in my kernel and during all my tests I never saw this warning. How did you trigger this? > The simple fix seemed to be to add: > > diff --git a/kernel/perf_event.c b/kernel/perf_event.c > index a0a6987..e739e6f 100644 > --- a/kernel/perf_event.c > +++ b/kernel/perf_event.c > @@ -204,7 +204,8 @@ __get_cpu_context(struct perf_event_context *ctx) > Âstatic inline struct perf_cgroup * > Âperf_cgroup_from_task(struct task_struct *task) > Â{ > - Â Â Â return container_of(task_subsys_state(task, perf_subsys_id), > + Â Â Â return container_of(task_subsys_state_check(task, perf_subsys_id, > + Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â lockdep_is_held(&ctx->lock)), > Â Â Â Â Â Â Â Â Â Â Â Âstruct perf_cgroup, css); > Â} > > For all callers _should_ hold ctx->lock and ctx->lock is acquired during > ->attach/->exit so holding that lock will pin the cgroup. > I am not sure I follow you here. Are you talking about cgroup_attach() and cgroup_exit()? perf_cgroup_switch() does eventually grab ctx->lock when it gets to the actual save and restore functions. But perf_cgroup_from_task() is called outside of those sections in perf_cgroup_switch(). > However, not all update_context_time()/update_cgrp_time_from_event() > callers actually hold ctx->lock, which is a bug because that lock also > serializes the timestamps. > > Most notably, task_clock_event_read(), which leads us to: > If the warning comes from invoking perf_cgroup_from_task(), then there is also perf_cgroup_switch(). that one is not grabbing any ctx->lock either, but maybe not on all paths. > @@ -5794,9 +5795,14 @@ static void task_clock_event_read(struct perf_event *event) > Â Â Â Âu64 time; > > Â Â Â Âif (!in_nmi()) { > - Â Â Â Â Â Â Â update_context_time(event->ctx); > + Â Â Â Â Â Â Â struct perf_event_context *ctx = event->ctx; > + Â Â Â Â Â Â Â unsigned long flags; > + > + Â Â Â Â Â Â Â spin_lock_irqsave(&ctx->lock, flags); > + Â Â Â Â Â Â Â update_context_time(ctx); > Â Â Â Â Â Â Â Âupdate_cgrp_time_from_event(event); > - Â Â Â Â Â Â Â time = event->ctx->time; > + Â Â Â Â Â Â Â time = ctx->time; > + Â Â Â Â Â Â Â spin_unlock_irqrestore(&ctx->lock, flags); > Â Â Â Â} else { > Â Â Â Â Â Â Â Âu64 now = perf_clock(); > Â Â Â Â Â Â Â Âu64 delta = now - event->ctx->timestamp; > > > I then realized that the events themselves pin the cgroup, so its all > cosmetic at best, but then I already had the below patch... > I assume by 'pin the group' you mean the cgroup cannot disappear while there is at least one event pointing to it. That's is indeed true thanks to refcounting (css_get()). > Thoughts? > > --- > Âkernel/perf_event.c | Â 30 ++++++++++++++++++------------ > Â1 files changed, 18 insertions(+), 12 deletions(-) > > diff --git a/kernel/perf_event.c b/kernel/perf_event.c > index a0a6987..810ee49 100644 > --- a/kernel/perf_event.c > +++ b/kernel/perf_event.c > @@ -202,9 +202,10 @@ __get_cpu_context(struct perf_event_context *ctx) > Â#ifdef CONFIG_CGROUP_PERF > > Âstatic inline struct perf_cgroup * > -perf_cgroup_from_task(struct task_struct *task) > +perf_cgroup_from_task(struct task_struct *task, struct perf_event_context *ctx) > Â{ > - Â Â Â return container_of(task_subsys_state(task, perf_subsys_id), > + Â Â Â return container_of(task_subsys_state_check(task, perf_subsys_id, > + Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â lockdep_is_held(&ctx->lock)), > Â Â Â Â Â Â Â Â Â Â Â Âstruct perf_cgroup, css); > Â} > > @@ -268,7 +269,7 @@ static inline void update_cgrp_time_from_cpuctx(struct perf_cpu_context *cpuctx) > > Âstatic inline void update_cgrp_time_from_event(struct perf_event *event) > Â{ > - Â Â Â struct perf_cgroup *cgrp = perf_cgroup_from_task(current); > + Â Â Â struct perf_cgroup *cgrp = perf_cgroup_from_task(current, event->ctx); > Â Â Â Â/* > Â Â Â Â * do not update time when cgroup is not active > Â Â Â Â */ > @@ -279,7 +280,7 @@ static inline void update_cgrp_time_from_event(struct perf_event *event) > Â} > > Âstatic inline void > -perf_cgroup_set_timestamp(struct task_struct *task, u64 now) > +perf_cgroup_set_timestamp(struct task_struct *task, struct perf_event_context *ctx) > Â{ > Â Â Â Âstruct perf_cgroup *cgrp; > Â Â Â Âstruct perf_cgroup_info *info; > @@ -287,9 +288,9 @@ perf_cgroup_set_timestamp(struct task_struct *task, u64 now) > Â Â Â Âif (!task) > Â Â Â Â Â Â Â Âreturn; > > - Â Â Â cgrp = perf_cgroup_from_task(task); > + Â Â Â cgrp = perf_cgroup_from_task(task, ctx); > Â Â Â Âinfo = this_cpu_ptr(cgrp->info); > - Â Â Â info->timestamp = now; > + Â Â Â info->timestamp = ctx->timestamp; > Â} > > Â#define PERF_CGROUP_SWOUT Â Â Â0x1 /* cgroup switch out every event */ > @@ -349,7 +350,7 @@ void perf_cgroup_switch(struct task_struct *task, int mode) > Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â * allow event_filter_match() to not > Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â * have to pass task around > Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â */ > - Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â cpuctx->cgrp = perf_cgroup_from_task(task); > + Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â cpuctx->cgrp = perf_cgroup_from_task(task, &cpuctx->ctx); > Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Âcpu_ctx_sched_in(cpuctx, EVENT_ALL, task); > Â Â Â Â Â Â Â Â Â Â Â Â} > Â Â Â Â Â Â Â Â} > @@ -494,7 +495,7 @@ static inline int perf_cgroup_connect(pid_t pid, struct perf_event *event, > Â} > > Âstatic inline void > -perf_cgroup_set_timestamp(struct task_struct *task, u64 now) > +perf_cgroup_set_timestamp(struct task_struct *task, struct perf_event_context *ctx) > Â{ > Â} > > @@ -1613,7 +1614,7 @@ static int __perf_event_enable(void *info) > Â Â Â Â/* > Â Â Â Â * set current task's cgroup time reference point > Â Â Â Â */ > - Â Â Â perf_cgroup_set_timestamp(current, perf_clock()); > + Â Â Â perf_cgroup_set_timestamp(current, ctx); > > Â Â Â Â__perf_event_mark_enabled(event, ctx); > > @@ -2048,7 +2049,7 @@ ctx_sched_in(struct perf_event_context *ctx, > > Â Â Â Ânow = perf_clock(); > Â Â Â Âctx->timestamp = now; > - Â Â Â perf_cgroup_set_timestamp(task, now); > + Â Â Â perf_cgroup_set_timestamp(task, ctx); > Â Â Â Â/* > Â Â Â Â * First go through the list and put on any pinned groups > Â Â Â Â * in order to give them the best chance of going on. > @@ -5794,9 +5795,14 @@ static void task_clock_event_read(struct perf_event *event) > Â Â Â Âu64 time; > > Â Â Â Âif (!in_nmi()) { > - Â Â Â Â Â Â Â update_context_time(event->ctx); > + Â Â Â Â Â Â Â struct perf_event_context *ctx = event->ctx; > + Â Â Â Â Â Â Â unsigned long flags; > + > + Â Â Â Â Â Â Â spin_lock_irqsave(&ctx->lock, flags); > + Â Â Â Â Â Â Â update_context_time(ctx); > Â Â Â Â Â Â Â Âupdate_cgrp_time_from_event(event); > - Â Â Â Â Â Â Â time = event->ctx->time; > + Â Â Â Â Â Â Â time = ctx->time; > + Â Â Â Â Â Â Â spin_unlock_irqrestore(&ctx->lock, flags); > Â Â Â Â} else { > Â Â Â Â Â Â Â Âu64 now = perf_clock(); > Â Â Â Â Â Â Â Âu64 delta = now - event->ctx->timestamp; > > -- To unsubscribe from this list: send the line "unsubscribe linux-tip-commits" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html
![]() |