On Wed, Mar 10, 2021 at 11:41:34AM +0100, Marco Elver wrote: > Adds bit perf_event_attr::remove_on_exec, to support removing an event > from a task on exec. > > This option supports the case where an event is supposed to be > process-wide only, and should not propagate beyond exec, to limit > monitoring to the original process image only. > > Signed-off-by: Marco Elver <elver@xxxxxxxxxx> > +/* > + * Removes all events from the current task that have been marked > + * remove-on-exec, and feeds their values back to parent events. > + */ > +static void perf_event_remove_on_exec(void) > +{ > + int ctxn; > + > + for_each_task_context_nr(ctxn) { > + struct perf_event_context *ctx; > + struct perf_event *event, *next; > + > + ctx = perf_pin_task_context(current, ctxn); > + if (!ctx) > + continue; > + mutex_lock(&ctx->mutex); > + > + list_for_each_entry_safe(event, next, &ctx->event_list, event_entry) { > + if (!event->attr.remove_on_exec) > + continue; > + > + if (!is_kernel_event(event)) > + perf_remove_from_owner(event); > + perf_remove_from_context(event, DETACH_GROUP); There's a comment on this in perf_event_exit_event(), if this task happens to have the original event, then DETACH_GROUP will destroy the grouping. I think this wants to be: perf_remove_from_text(event, child_event->parent ? DETACH_GROUP : 0); or something. > + /* > + * Remove the event and feed back its values to the > + * parent event. > + */ > + perf_event_exit_event(event, ctx, current); Oooh, and here we call it... but it will do list_del_even() / perf_group_detach() *again*. So the problem is that perf_event_exit_task_context() doesn't use remove_from_context(), but instead does task_ctx_sched_out() and then relies on the events not being active. Whereas above you *DO* use remote_from_context(), but then perf_event_exit_event() will try and remove it more. > + } > + mutex_unlock(&ctx->mutex); perf_unpin_context(ctx); > + put_ctx(ctx); > + } > +}