Re: [PATCH v3 01/11] perf: Rework perf_event_exit_event()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Mar 25, 2021 at 05:17PM +0100, Marco Elver wrote:
[...]
> > syzkaller found a crash with stack trace pointing at changes in this
> > patch. Can't tell if this is an old issue or introduced in this series.
> 
> Yay, I found a reproducer. v5.12-rc4 is good, and sadly with this patch only we
> crash. :-/
> 
> Here's a stacktrace with just this patch applied:
> 
> | BUG: kernel NULL pointer dereference, address: 00000000000007af
[...]
> | RIP: 0010:task_pid_ptr kernel/pid.c:324 [inline]
> | RIP: 0010:__task_pid_nr_ns+0x112/0x240 kernel/pid.c:500
[...]
> | Call Trace:
> |  perf_event_pid_type kernel/events/core.c:1412 [inline]
> |  perf_event_pid kernel/events/core.c:1421 [inline]
> |  perf_event_read_event+0x78/0x1d0 kernel/events/core.c:7406
> |  sync_child_event kernel/events/core.c:12404 [inline]
> |  perf_child_detach kernel/events/core.c:2223 [inline]
> |  __perf_remove_from_context+0x14d/0x280 kernel/events/core.c:2359
> |  perf_remove_from_context+0x9f/0xf0 kernel/events/core.c:2395
> |  perf_event_exit_event kernel/events/core.c:12442 [inline]
> |  perf_event_exit_task_context kernel/events/core.c:12523 [inline]
> |  perf_event_exit_task+0x276/0x4c0 kernel/events/core.c:12556
> |  do_exit+0x4cd/0xed0 kernel/exit.c:834
> |  do_group_exit+0x4d/0xf0 kernel/exit.c:922
> |  get_signal+0x1d2/0xf30 kernel/signal.c:2777
> |  arch_do_signal_or_restart+0xf7/0x750 arch/x86/kernel/signal.c:789
> |  handle_signal_work kernel/entry/common.c:147 [inline]
> |  exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
> |  exit_to_user_mode_prepare+0x113/0x190 kernel/entry/common.c:208
> |  irqentry_exit_to_user_mode+0x6/0x30 kernel/entry/common.c:314
> |  asm_exc_general_protection+0x1e/0x30 arch/x86/include/asm/idtentry.h:571

I spun up gdb, and it showed me this:

| #0  perf_event_read_event (event=event@entry=0xffff888107cd5000, task=task@entry=0xffffffffffffffff)
|     at kernel/events/core.c:7397
									^^^ TASK_TOMBSTONE
| #1  0xffffffff811fc9cd in sync_child_event (child_event=0xffff888107cd5000) at kernel/events/core.c:12404
| #2  perf_child_detach (event=0xffff888107cd5000) at kernel/events/core.c:2223
| #3  __perf_remove_from_context (event=event@entry=0xffff888107cd5000, cpuctx=cpuctx@entry=0xffff88842fdf0c00,
|     ctx=ctx@entry=0xffff8881073cb800, info=info@entry=0x3 <fixed_percpu_data+3>) at kernel/events/core.c:2359
| #4  0xffffffff811fcb9f in perf_remove_from_context (event=event@entry=0xffff888107cd5000, flags=flags@entry=3)
|     at kernel/events/core.c:2395
| #5  0xffffffff81204526 in perf_event_exit_event (ctx=0xffff8881073cb800, event=0xffff888107cd5000)
|     at kernel/events/core.c:12442
| #6  perf_event_exit_task_context (ctxn=0, child=0xffff88810531a200) at kernel/events/core.c:12523
| #7  perf_event_exit_task (child=0xffff88810531a200) at kernel/events/core.c:12556
| #8  0xffffffff8108838d in do_exit (code=code@entry=11) at kernel/exit.c:834
| #9  0xffffffff81088e4d in do_group_exit (exit_code=11) at kernel/exit.c:922

and therefore synthesized this fix on top:

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 57de8d436efd..e77294c7e654 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -12400,7 +12400,7 @@ static void sync_child_event(struct perf_event *child_event)
 	if (child_event->attr.inherit_stat) {
 		struct task_struct *task = child_event->ctx->task;
 
-		if (task)
+		if (task && task != TASK_TOMBSTONE)
 			perf_event_read_event(child_event, task);
 	}
 
which fixes the problem. My guess is that the parent and child are both
racing to exit?

Does that make any sense?

Thanks,
-- Marco



[Index of Archives]     [Linux Kernel]     [Kernel Newbies]     [x86 Platform Driver]     [Netdev]     [Linux Wireless]     [Netfilter]     [Bugtraq]     [Linux Filesystems]     [Yosemite Discussion]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]

  Powered by Linux