Re: selftests: user_events: ftrace_test - RIP: 0010:tracing_update_buffers (kernel/trace/trace.c:6470)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Oct 27, 2023 at 05:38:41PM +0530, Naresh Kamboju wrote:
> Following kernel crash noticed on x86_64 while running selftests: user_events:
> ftrace_test running 6.6.0-rc7-next-20231026.
> 
> Reported-by: Linux Kernel Functional Testing <lkft@xxxxxxxxxx>
> 
> kselftest: Running tests in user_events
> TAP version 13
> 1..4
> # timeout set to 90
> # selftests: user_events: ftrace_test
> [ 2391.606817] general protection fault, probably for non-canonical
> address 0x6b6b6b6b6b6b8a83: 0000 [#1] PREEMPT SMP PTI
> [ 2391.617519] CPU: 1 PID: 34662 Comm: ftrace_test Not tainted
> 6.6.0-rc7-next-20231026 #1
> [ 2391.625428] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
> 2.7 12/07/2021
> [ 2391.632811] RIP: 0010:tracing_update_buffers (kernel/trace/trace.c:6470)
> [ 2391.637952] Code: 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00
> 55 31 f6 48 89 e5 41 55 41 54 53 48 89 fb 48 c7 c7 40 8c 61 94 e8 92
> d3 5a 01 <44> 0f b6 a3 18 1f 00 00 41 80 fc 01 0f 87 c8 dc 4e 01 45 31
> ed 41
> All code
> ========
>    0: 90                    nop
>    1: 90                    nop
>    2: 90                    nop
>    3: 90                    nop
>    4: 90                    nop
>    5: 90                    nop
>    6: 90                    nop
>    7: 90                    nop
>    8: 90                    nop
>    9: 90                    nop
>    a: 90                    nop
>    b: 90                    nop
>    c: 66 0f 1f 00          nopw   (%rax)
>   10: 55                    push   %rbp
>   11: 31 f6                xor    %esi,%esi
>   13: 48 89 e5              mov    %rsp,%rbp
>   16: 41 55                push   %r13
>   18: 41 54                push   %r12
>   1a: 53                    push   %rbx
>   1b: 48 89 fb              mov    %rdi,%rbx
>   1e: 48 c7 c7 40 8c 61 94 mov    $0xffffffff94618c40,%rdi
>   25: e8 92 d3 5a 01        callq  0x15ad3bc
>   2a:* 44 0f b6 a3 18 1f 00 movzbl 0x1f18(%rbx),%r12d <-- trapping instruction
>   31: 00
>   32: 41 80 fc 01          cmp    $0x1,%r12b
>   36: 0f 87 c8 dc 4e 01    ja     0x14edd04
>   3c: 45 31 ed              xor    %r13d,%r13d
>   3f: 41                    rex.B
> 
> Code starting with the faulting instruction
> ===========================================
>    0: 44 0f b6 a3 18 1f 00 movzbl 0x1f18(%rbx),%r12d
>    7: 00
>    8: 41 80 fc 01          cmp    $0x1,%r12b
>    c: 0f 87 c8 dc 4e 01    ja     0x14edcda
>   12: 45 31 ed              xor    %r13d,%r13d
>   15: 41                    rex.B
> [ 2391.656696] RSP: 0018:ffffb36e0a477d80 EFLAGS: 00010246
> [ 2391.661937] RAX: 0000000000000000 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000080000000
> [ 2391.669064] RDX: 0000000000000000 RSI: ffffffff9299b722 RDI: ffffffff9299b722
> [ 2391.676195] RBP: ffffb36e0a477d98 R08: 000000000000002f R09: 0000000000000002
> [ 2391.683321] R10: ffffb36e0a477d70 R11: 0000000000000000 R12: 0000000000000002
> [ 2391.690453] R13: ffffb36e0a477e88 R14: ffff99c5803a2230 R15: ffff99c581c39000
> [ 2391.697586] FS:  00007fb4b9681740(0000) GS:ffff99c6efa80000(0000)
> knlGS:0000000000000000
> [ 2391.705670] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2391.711410] CR2: 00007fb4b96ab5e0 CR3: 000000010635c002 CR4: 00000000003706f0
> [ 2391.718540] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 2391.725665] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 2391.732797] Call Trace:
> [ 2391.735240]  <TASK>
> [ 2391.737339] ? show_regs (arch/x86/kernel/dumpstack.c:479)
> [ 2391.740744] ? die_addr (arch/x86/kernel/dumpstack.c:421
> arch/x86/kernel/dumpstack.c:460)
> [ 2391.744056] ? exc_general_protection (arch/x86/kernel/traps.c:697
> arch/x86/kernel/traps.c:642)
> [ 2391.748766] ? asm_exc_general_protection
> (arch/x86/include/asm/idtentry.h:564)
> [ 2391.753652] ? __mutex_lock (kernel/locking/mutex.c:613
> (discriminator 3) kernel/locking/mutex.c:747 (discriminator 3))
> [ 2391.757487] ? __mutex_lock (kernel/locking/mutex.c:613
> (discriminator 3) kernel/locking/mutex.c:747 (discriminator 3))
> [ 2391.761318] ? tracing_update_buffers (kernel/trace/trace.c:6470)
> [ 2391.765851] event_enable_write (kernel/trace/trace_events.c:1408)
> [ 2391.769976] vfs_write (fs/read_write.c:582)
> [ 2391.773296] ? close_fd_get_file (fs/file.c:821)
> [ 2391.777396] ? preempt_count_sub (kernel/sched/core.c:5857
> kernel/sched/core.c:5853 kernel/sched/core.c:5875)
> [ 2391.781496] ksys_write (fs/read_write.c:638)
> [ 2391.784918] __x64_sys_write (fs/read_write.c:646)
> [ 2391.788671] do_syscall_64 (arch/x86/entry/common.c:51
> arch/x86/entry/common.c:82)
> [ 2391.792248] ? do_syscall_64 (arch/x86/entry/common.c:101)
> [ 2391.795995] ? syscall_exit_to_user_mode (kernel/entry/common.c:299)
> [ 2391.800785] ? do_syscall_64 (arch/x86/entry/common.c:101)
> [ 2391.804529] ? do_syscall_64 (arch/x86/entry/common.c:101)
> [ 2391.808275] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
> [ 2391.813327] RIP: 0033:0x7fb4b977c140
> [ 2391.816920] Code: 40 00 48 8b 15 c1 9c 0d 00 f7 d8 64 89 02 48 c7
> c0 ff ff ff ff eb b7 0f 1f 00 80 3d a1 24 0e 00 00 74 17 b8 01 00 00
> 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28
> 48 89
> All code
> ========
>    0: 40 00 48 8b          add    %cl,-0x75(%rax)
>    4: 15 c1 9c 0d 00        adc    $0xd9cc1,%eax
>    9: f7 d8                neg    %eax
>    b: 64 89 02              mov    %eax,%fs:(%rdx)
>    e: 48 c7 c0 ff ff ff ff mov    $0xffffffffffffffff,%rax
>   15: eb b7                jmp    0xffffffffffffffce
>   17: 0f 1f 00              nopl   (%rax)
>   1a: 80 3d a1 24 0e 00 00 cmpb   $0x0,0xe24a1(%rip)        # 0xe24c2
>   21: 74 17                je     0x3a
>   23: b8 01 00 00 00        mov    $0x1,%eax
>   28: 0f 05                syscall
>   2a:* 48 3d 00 f0 ff ff    cmp    $0xfffffffffffff000,%rax <--
> trapping instruction
>   30: 77 58                ja     0x8a
>   32: c3                    retq
>   33: 0f 1f 80 00 00 00 00 nopl   0x0(%rax)
>   3a: 48 83 ec 28          sub    $0x28,%rsp
>   3e: 48                    rex.W
>   3f: 89                    .byte 0x89
> 
> Code starting with the faulting instruction
> ===========================================
>    0: 48 3d 00 f0 ff ff    cmp    $0xfffffffffffff000,%rax
>    6: 77 58                ja     0x60
>    8: c3                    retq
>    9: 0f 1f 80 00 00 00 00 nopl   0x0(%rax)
>   10: 48 83 ec 28          sub    $0x28,%rsp
>   14: 48                    rex.W
>   15: 89                    .byte 0x89
> [ 2391.835660] RSP: 002b:00007ffc43b05b38 EFLAGS: 00000202 ORIG_RAX:
> 0000000000000001
> [ 2391.843225] RAX: ffffffffffffffda RBX: 00007ffc43b05d88 RCX: 00007fb4b977c140
> [ 2391.850350] RDX: 0000000000000002 RSI: 000056376b59b7d4 RDI: 0000000000000007
> [ 2391.857482] RBP: 00007ffc43b05b60 R08: 0000000000000000 R09: 00007fb4b9681740
> [ 2391.864615] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
> [ 2391.871747] R13: 00007ffc43b05d98 R14: 000056376b59ddc8 R15: 00007fb4b9981020
> [ 2391.878907]  </TASK>
> [ 2391.881106] Modules linked in: x86_pkg_temp_thermal fuse configfs
> [ 2391.887288] ---[ end trace 0000000000000000 ]---
> [ 2391.891915] RIP: 0010:tracing_update_buffers (kernel/trace/trace.c:6470)
> [ 2391.897231] Code: 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00
> 55 31 f6 48 89 e5 41 55 41 54 53 48 89 fb 48 c7 c7 40 8c 61 94 e8 92
> d3 5a 01 <44> 0f b6 a3 18 1f 00 00 41 80 fc 01 0f 87 c8 dc 4e 01 45 31
> ed 41
> All code
> ========
>    0: 90                    nop
>    1: 90                    nop
>    2: 90                    nop
>    3: 90                    nop
>    4: 90                    nop
>    5: 90                    nop
>    6: 90                    nop
>    7: 90                    nop
>    8: 90                    nop
>    9: 90                    nop
>    a: 90                    nop
>    b: 90                    nop
>    c: 66 0f 1f 00          nopw   (%rax)
>   10: 55                    push   %rbp
>   11: 31 f6                xor    %esi,%esi
>   13: 48 89 e5              mov    %rsp,%rbp
>   16: 41 55                push   %r13
>   18: 41 54                push   %r12
>   1a: 53                    push   %rbx
>   1b: 48 89 fb              mov    %rdi,%rbx
>   1e: 48 c7 c7 40 8c 61 94 mov    $0xffffffff94618c40,%rdi
>   25: e8 92 d3 5a 01        callq  0x15ad3bc
>   2a:* 44 0f b6 a3 18 1f 00 movzbl 0x1f18(%rbx),%r12d <-- trapping instruction
>   31: 00
>   32: 41 80 fc 01          cmp    $0x1,%r12b
>   36: 0f 87 c8 dc 4e 01    ja     0x14edd04
>   3c: 45 31 ed              xor    %r13d,%r13d
>   3f: 41                    rex.B
> 
> Code starting with the faulting instruction
> ===========================================
>    0: 44 0f b6 a3 18 1f 00 movzbl 0x1f18(%rbx),%r12d
>    7: 00
>    8: 41 80 fc 01          cmp    $0x1,%r12b
>    c: 0f 87 c8 dc 4e 01    ja     0x14edcda
>   12: 45 31 ed              xor    %r13d,%r13d
>   15: 41                    rex.B
> [ 2391.916120] RSP: 0018:ffffb36e0a477d80 EFLAGS: 00010246
> [ 2391.921569] RAX: 0000000000000000 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000080000000
> [ 2391.928872] RDX: 0000000000000000 RSI: ffffffff9299b722 RDI: ffffffff9299b722
> [ 2391.936237] RBP: ffffb36e0a477d98 R08: 000000000000002f R09: 0000000000000002
> [ 2391.943388] R10: ffffb36e0a477d70 R11: 0000000000000000 R12: 0000000000000002
> [ 2391.950527] R13: ffffb36e0a477e88 R14: ffff99c5803a2230 R15: ffff99c581c39000
> [ 2391.957670] FS:  00007fb4b9681740(0000) GS:ffff99c6efa80000(0000)
> knlGS:0000000000000000
> [ 2391.965822] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2391.971579] CR2: 00007fb4b96ab5e0 CR3: 000000010635c002 CR4: 00000000003706f0
> [ 2391.978721] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 2391.985879] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 2391.993028] Kernel panic - not syncing: Fatal exception
> [ 2391.998287] Kernel Offset: 0x10000000 from 0xffffffff81000000
> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [ 2392.009066] ---[ end Kernel panic - not syncing: Fatal exception ]---
> 

It's hard to repro this, it does repro after running several times for
me.

Steven, this is firing when the self test writes 1 to the enable file
for a user_event created event.

Sometimes I get this splat instead of a panic:
[  956.819778] ------------[ cut here ]------------
[  956.820526] WARNING: CPU: 12 PID: 914 at kernel/trace/trace.c:516 tracing_release_file_tr+0x46/0x50
[  956.821389] Modules linked in:
[  956.821708] CPU: 12 PID: 914 Comm: ftrace_test Not tainted 6.6.0-rc7-next-20231026 #1
[  956.822501] RIP: 0010:tracing_release_file_tr+0x46/0x50
[  956.822902] Code: d1 03 01 8b 83 c0 1e 00 00 85 c0 74 1d 83 e8 01 48 c7 c7 80 5b ef 84 89 83 c0 1e 00 00 e8 b2 b5 03 01 31 c0 5b e9 75 ee 27 01 <0f> 0b eb df 66 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90
[  956.824261] RSP: 0018:ffffb59d879e3ee8 EFLAGS: 00010246
[  956.824656] RAX: 0000000000000000 RBX: ffff9e386a82c058 RCX: 0000000000000000
[  956.825239] RDX: 0000000000000000 RSI: ffffffff84b6b383 RDI: 00000000ffffffff
[  956.825835] RBP: ffff9e3844076200 R08: 0000000000000000 R09: 0000000000000000
[  956.826425] R10: ffffb59d879e3ee8 R11: ffffffff8328526f R12: 00000000000f801f
[  956.827359] R13: ffff9e3846134020 R14: ffff9e3843fd03a8 R15: 0000000000000000
[  956.828732] FS:  00007eff4f818740(0000) GS:ffff9e3f26300000(0000) knlGS:0000000000000000
[  956.829436] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  956.829968] CR2: 00007eff4f42a250 CR3: 00000007d0744004 CR4: 0000000000370eb0
[  956.830591] Call Trace:
[  956.830804]  <TASK>
[  956.831113]  ? __warn+0x7f/0x160
[  956.831469]  ? tracing_release_file_tr+0x46/0x50
[  956.831891]  ? report_bug+0x1c3/0x1d0
[  956.832224]  ? handle_bug+0x3c/0x70
[  956.832544]  ? exc_invalid_op+0x14/0x70
[  956.832861]  ? asm_exc_invalid_op+0x16/0x20
[  956.833190]  ? tracing_release_file_tr+0x1f/0x50
[  956.833727]  ? tracing_release_file_tr+0x46/0x50
[  956.834159]  ? tracing_release_file_tr+0x1f/0x50
[  956.834579]  __fput+0xab/0x300
[  956.834903]  __x64_sys_close+0x38/0x80
[  956.835235]  do_syscall_64+0x41/0xf0
[  956.835554]  entry_SYSCALL_64_after_hwframe+0x6c/0x74
[  956.835974] RIP: 0033:0x7eff4f515157
[  956.836291] Code: ff e8 0d 16 02 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 83 b8 f7 ff
[  956.837728] RSP: 002b:00007ffc021d8158 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
[  956.838351] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007eff4f515157
[  956.838975] RDX: 0000000000000002 RSI: 000055637921d7d4 RDI: 0000000000000005
[  956.839586] RBP: 00007ffc021d8180 R08: 0000000000000000 R09: 00007eff4f818740
[  956.840201] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffc021d83b8
[  956.840810] R13: 000055637921cae7 R14: 000055637921fc38 R15: 00007eff4f946040
[  956.841562]  </TASK>
[  956.841787] irq event stamp: 881
[  956.842145] hardirqs last  enabled at (889): [<ffffffff831d7f49>] console_unlock+0x109/0x130
[  956.843013] hardirqs last disabled at (896): [<ffffffff831d7f2e>] console_unlock+0xee/0x130
[  956.843648] softirqs last  enabled at (694): [<ffffffff8314ba2a>] irq_exit_rcu+0x8a/0xe0
[  956.844276] softirqs last disabled at (689): [<ffffffff8314ba2a>] irq_exit_rcu+0x8a/0xe0
[  956.844893] ---[ end trace 0000000000000000 ]---

Warning is from this code:
static void __trace_array_put(struct trace_array *this_tr)
{
        WARN_ON(!this_tr->ref);
        this_tr->ref--;
}

It seems like there might be a timing window or an incorrect call to
trace_array_put() somewhere. Do you think this is related to the eventfs
work?

Thanks,
-Beau

> Links:
> - https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20231026/testrun/20823454/suite/log-parser-test/tests/
> - https://lkft.validation.linaro.org/scheduler/job/6974179#L5053
> 
> metadata:
> git_ref: master
> git_repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
> git_sha: 2ef7141596eed0b4b45ef18b3626f428a6b0a822
> git_describe: next-20231026
> kernel_version: 6.6.0-rc7
> kernel-config:
> https://storage.tuxsuite.com/public/linaro/lkft/builds/2XHt24sNSdog7DYY3FLKFZpZmjG/config
> artifact-location:
> https://storage.tuxsuite.com/public/linaro/lkft/builds/2XHt24sNSdog7DYY3FLKFZpZmjG/
> toolchain: gcc-13
> 
> 
> --
> Linaro LKFT
> https://lkft.linaro.org



[Index of Archives]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Share Photos]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Samba]     [Device Mapper]

  Powered by Linux