Re: [PATCH v4 00/10] tracing: introducing eventfs

Steven Rostedt <rostedt@xxxxxxxxxxx> · Wed, 19 Jul 2023 10:23:10 -0400

On Wed, 19 Jul 2023 10:25:28 +0000
Ajay Kaher <akaher@xxxxxxxxxx> wrote:

> - Is it also reproducible on v3?
> - Is it manually reproducible or reproducible using any specific script?
> 
> Let me know if I can help.

Just tried it against v3, and it gave me the splat that I originally had
and starting to fix, which now gives me another splat. I'll spend a couple
more days on it and start sharing code and seeing if we can work together
on this.

Here's the reproducer (of both v3 splat and the bug I'm hitting now).

 ~# echo 'p:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events
 ~# ls /sys/kernel/debug/tracing/events/kprobes/sock_getattr/
 ~# echo '-:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events

v3 gives me (and my updates too)

======================================================
 WARNING: possible circular locking dependency detected
 6.5.0-rc1-test+ #576 Not tainted
 ------------------------------------------------------
 trace-cmd/840 is trying to acquire lock:
 ffff8881007e5de0 (&sb->s_type->i_mutex_key#5){++++}-{3:3}, at: dcache_dir_open_wrapper+0xc1/0x1b0

 but task is already holding lock:
 ffff888103ad7e70 (eventfs_rwsem/1){.+.+}-{3:3}, at: dcache_dir_open_wrapper+0x6f/0x1b0

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (eventfs_rwsem/1){.+.+}-{3:3}:
        down_read_nested+0x41/0x180
        eventfs_root_lookup+0x42/0x120
        __lookup_slow+0xff/0x1b0
        walk_component+0xdb/0x150
        path_lookupat+0x67/0x1a0
        filename_lookup+0xe4/0x1f0
        vfs_statx+0x9e/0x180
        vfs_fstatat+0x51/0x70
        __do_sys_newfstatat+0x3f/0x80
        do_syscall_64+0x3a/0xc0
        entry_SYSCALL_64_after_hwframe+0x6e/0xd8

 -> #0 (&sb->s_type->i_mutex_key#5){++++}-{3:3}:
        __lock_acquire+0x165d/0x2390
        lock_acquire+0xd4/0x2d0
        down_write+0x3b/0xd0
        dcache_dir_open_wrapper+0xc1/0x1b0
        do_dentry_open+0x20c/0x510
        path_openat+0x7ad/0xc60
        do_filp_open+0xaf/0x160
        do_sys_openat2+0xab/0xe0
        __x64_sys_openat+0x6a/0xa0
        do_syscall_64+0x3a/0xc0
        entry_SYSCALL_64_after_hwframe+0x6e/0xd8

 other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   rlock(eventfs_rwsem/1);
                                lock(&sb->s_type->i_mutex_key#5);
                                lock(eventfs_rwsem/1);
   lock(&sb->s_type->i_mutex_key#5);

  *** DEADLOCK ***

 1 lock held by trace-cmd/840:
  #0: ffff888103ad7e70 (eventfs_rwsem/1){.+.+}-{3:3}, at: dcache_dir_open_wrapper+0x6f/0x1b0

 stack backtrace:
 CPU: 7 PID: 840 Comm: trace-cmd Not tainted 6.5.0-rc1-test+ #576
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
 Call Trace:
  <TASK>
  dump_stack_lvl+0x57/0x90
  check_noncircular+0x14b/0x160
  __lock_acquire+0x165d/0x2390
  lock_acquire+0xd4/0x2d0
  ? dcache_dir_open_wrapper+0xc1/0x1b0
  down_write+0x3b/0xd0
  ? dcache_dir_open_wrapper+0xc1/0x1b0
  dcache_dir_open_wrapper+0xc1/0x1b0
  ? __pfx_dcache_dir_open_wrapper+0x10/0x10
  do_dentry_open+0x20c/0x510
  path_openat+0x7ad/0xc60
  do_filp_open+0xaf/0x160
  do_sys_openat2+0xab/0xe0
  __x64_sys_openat+0x6a/0xa0
  do_syscall_64+0x3a/0xc0
  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
 RIP: 0033:0x7f1743267e41
 Code: 44 24 18 31 c0 41 83 e2 40 75 3e 89 f0 25 00 00 41 00 3d 00 00 41 00 74 30 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 77 3f 48 8b 54 24 18 64 48 2b 14 25 28 00 00 00
 RSP: 002b:00007ffec10ff5d0 EFLAGS: 00000287 ORIG_RAX: 0000000000000101
 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1743267e41
 RDX: 0000000000090800 RSI: 00007ffec10ffdb0 RDI: 00000000ffffff9c
 RBP: 00007ffec10ffda0 R08: 00007ffec11003e0 R09: 0000000000000040
 R10: 0000000000000000 R11: 0000000000000287 R12: 00007ffec11003e0
 R13: 0000000000000040 R14: 0000000000000000 R15: 00007ffec110034b
  </TASK>

I moved the code around a bit, and it appears that kprobes is getting
dput() more than once.

I moved the d_invalidate() and dput() into the workqueue function, and on
kprobes going away, d_invalidate() frees it, and dput() is now corrupted.

Still investigating. The VFS layer is a magic box that needs the right
wizard hat to deal with, but I unfortunately am waiting on back order to
retrieve that specific hat :-p

-- Steve