> On 19-Jul-2023, at 7:53 PM, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> On Wed, 19 Jul 2023 10:25:28 +0000
> Ajay Kaher <akaher@xxxxxxxxxx> wrote:
>
>> - Is it also reproducible on v3?
>> - Is it manually reproducible or reproducible using any specific script?
>>
>> Let me know if I can help.
>
> Just tried it against v3, and it gave me the splat that I originally had
> and starting to fix, which now gives me another splat. I'll spend a couple
> more days on it and start sharing code and seeing if we can work together
> on this.
>
> Here's the reproducer (of both v3 splat and the bug I'm hitting now).
>
> ~# echo 'p:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events
> ~# ls /sys/kernel/debug/tracing/events/kprobes/sock_getattr/
> ~# echo '-:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events

I tried the above steps on v4 but could not reproduce it:

root@photon-6 [ ~/sdb/linux ]# echo 'p:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events
root@photon-6 [ ~/sdb/linux ]# ls /sys/kernel/debug/tracing/events/kprobes/sock_getattr/
enable  filter  format  id  trigger
root@photon-6 [ ~/sdb/linux ]# echo '-:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events
-bash: echo: write error: No such file or directory

I do have a doubt about call_srcu(): it may end the grace period for the
parent before the one for the child. If that is true, free_list will end up
unordered, which could cause problems. I have put a small sketch of this
concern at the end of this mail.

>
> v3 gives me (and my updates too)
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 6.5.0-rc1-test+ #576 Not tainted
> ------------------------------------------------------
> trace-cmd/840 is trying to acquire lock:
> ffff8881007e5de0 (&sb->s_type->i_mutex_key#5){++++}-{3:3}, at: dcache_dir_open_wrapper+0xc1/0x1b0
>
> but task is already holding lock:
> ffff888103ad7e70 (eventfs_rwsem/1){.+.+}-{3:3}, at: dcache_dir_open_wrapper+0x6f/0x1b0
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (eventfs_rwsem/1){.+.+}-{3:3}:
>        down_read_nested+0x41/0x180
>        eventfs_root_lookup+0x42/0x120
>        __lookup_slow+0xff/0x1b0
>        walk_component+0xdb/0x150
>        path_lookupat+0x67/0x1a0
>        filename_lookup+0xe4/0x1f0
>        vfs_statx+0x9e/0x180
>        vfs_fstatat+0x51/0x70
>        __do_sys_newfstatat+0x3f/0x80
>        do_syscall_64+0x3a/0xc0
>        entry_SYSCALL_64_after_hwframe+0x6e/0xd8
>
> -> #0 (&sb->s_type->i_mutex_key#5){++++}-{3:3}:
>        __lock_acquire+0x165d/0x2390
>        lock_acquire+0xd4/0x2d0
>        down_write+0x3b/0xd0
>        dcache_dir_open_wrapper+0xc1/0x1b0
>        do_dentry_open+0x20c/0x510
>        path_openat+0x7ad/0xc60
>        do_filp_open+0xaf/0x160
>        do_sys_openat2+0xab/0xe0
>        __x64_sys_openat+0x6a/0xa0
>        do_syscall_64+0x3a/0xc0
>        entry_SYSCALL_64_after_hwframe+0x6e/0xd8
>
> other info that might help us debug this:
>
>  Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   rlock(eventfs_rwsem/1);
>                                lock(&sb->s_type->i_mutex_key#5);
>                                lock(eventfs_rwsem/1);
>   lock(&sb->s_type->i_mutex_key#5);
>
>  *** DEADLOCK ***
>
> 1 lock held by trace-cmd/840:
>  #0: ffff888103ad7e70 (eventfs_rwsem/1){.+.+}-{3:3}, at: dcache_dir_open_wrapper+0x6f/0x1b0
>
> stack backtrace:
> CPU: 7 PID: 840 Comm: trace-cmd Not tainted 6.5.0-rc1-test+ #576
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> Call Trace:
>  <TASK>
>  dump_stack_lvl+0x57/0x90
>  check_noncircular+0x14b/0x160
>  __lock_acquire+0x165d/0x2390
>  lock_acquire+0xd4/0x2d0
>  ? dcache_dir_open_wrapper+0xc1/0x1b0
>  down_write+0x3b/0xd0
>  ? dcache_dir_open_wrapper+0xc1/0x1b0
>  dcache_dir_open_wrapper+0xc1/0x1b0
>  ? __pfx_dcache_dir_open_wrapper+0x10/0x10
>  do_dentry_open+0x20c/0x510
>  path_openat+0x7ad/0xc60
>  do_filp_open+0xaf/0x160
>  do_sys_openat2+0xab/0xe0
>  __x64_sys_openat+0x6a/0xa0
>  do_syscall_64+0x3a/0xc0
>  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> RIP: 0033:0x7f1743267e41
> Code: 44 24 18 31 c0 41 83 e2 40 75 3e 89 f0 25 00 00 41 00 3d 00 00 41 00 74 30 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 77 3f 48 8b 54 24 18 64 48 2b 14 25 28 00 00 00
> RSP: 002b:00007ffec10ff5d0 EFLAGS: 00000287 ORIG_RAX: 0000000000000101
> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1743267e41
> RDX: 0000000000090800 RSI: 00007ffec10ffdb0 RDI: 00000000ffffff9c
> RBP: 00007ffec10ffda0 R08: 00007ffec11003e0 R09: 0000000000000040
> R10: 0000000000000000 R11: 0000000000000287 R12: 00007ffec11003e0
> R13: 0000000000000040 R14: 0000000000000000 R15: 00007ffec110034b
>  </TASK>
>

This splat is expected on v3 (just ignore it for now). If the
eventfs_set_ef_status_free crash is not reproducible on v3, then it is a v4
issue. (For reference, the lock inversion the splat reports is restated as a
small sketch at the end of this mail.)

-Ajay

>
> I moved the code around a bit, and it appears that kprobes is getting
> dput() more than once.
>
> I moved the d_invalidate() and dput() into the workqueue function, and on
> kprobes going away, d_invalidate() frees it, and dput() is now corrupted.
>
> Still investigating. The VFS layer is a magic box that needs the right
> wizard hat to deal with, but I unfortunately am waiting on back order to
> retrieve that specific hat :-p
>
> -- Steve
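
P.S. Here is a minimal sketch of the call_srcu() ordering concern I
mentioned above. It is illustrative only: sketch_srcu, struct ef_item,
ef_free_cb() and queue_for_free() are made-up names (only free_list is taken
from my mail, not from the actual v4 code). It just shows the situation I am
worried about: two callbacks queued child-first, parent-second that might
complete in the opposite order.

#include <linux/srcu.h>
#include <linux/list.h>
#include <linux/spinlock.h>

/* Made-up stand-ins, not the real eventfs names. */
DEFINE_STATIC_SRCU(sketch_srcu);

struct ef_item {
	struct rcu_head		rcu;
	struct list_head	free_node;
};

static LIST_HEAD(free_list);
static DEFINE_SPINLOCK(free_list_lock);

/* SRCU callback: put the item on free_list once its grace period ends. */
static void ef_free_cb(struct rcu_head *head)
{
	struct ef_item *item = container_of(head, struct ef_item, rcu);

	/*
	 * If the parent's callback runs before the child's, the parent
	 * lands on free_list first, and freeing in list order would then
	 * free the parent while the child still references it.
	 */
	spin_lock(&free_list_lock);
	list_add_tail(&item->free_node, &free_list);
	spin_unlock(&free_list_lock);
}

static void queue_for_free(struct ef_item *child, struct ef_item *parent)
{
	/*
	 * Child is queued first, parent second; my question above is
	 * whether the two callbacks are guaranteed to complete in that
	 * same order.
	 */
	call_srcu(&sketch_srcu, &child->rcu, ef_free_cb);
	call_srcu(&sketch_srcu, &parent->rcu, ef_free_cb);
}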
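
P.P.S. For anyone following along, this is the lock inversion the splat
above reports, restated as a sketch. The function names come from the
backtrace; the bodies are illustrative only and are not the real eventfs
code.

#include <linux/fs.h>
#include <linux/rwsem.h>

static DECLARE_RWSEM(eventfs_rwsem);	/* stand-in for the real lock */

/*
 * Chain #1: lookup is called by the VFS (__lookup_slow) with the parent
 * inode's i_rwsem already held, and then takes eventfs_rwsem.
 * Order: i_rwsem -> eventfs_rwsem.
 */
static struct dentry *eventfs_root_lookup_sketch(struct inode *dir)
{
	/* dir->i_rwsem (i_mutex_key#5 in the splat) is held here */
	down_read(&eventfs_rwsem);
	/* ... */
	up_read(&eventfs_rwsem);
	return NULL;
}

/*
 * Chain #0: the open wrapper takes eventfs_rwsem first and then the
 * inode lock.  Order: eventfs_rwsem -> i_rwsem, the reverse of the
 * above, which is what lockdep flags.
 */
static int dcache_dir_open_wrapper_sketch(struct inode *inode)
{
	down_read(&eventfs_rwsem);
	inode_lock(inode);
	/* ... */
	inode_unlock(inode);
	up_read(&eventfs_rwsem);
	return 0;
}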