CC linux-mm

On Wed, Feb 19, 2025 at 9:00 PM Masami Hiramatsu (Google)
<mhiramat@xxxxxxxxxx> wrote:
>
> Hi,
>
> The hung_task detector is very useful for detecting lockups.
> However, since it only dumps the blocked (uninterruptible sleep)
> processes, it is not enough to identify the root cause of the
> lockup.
>
> For example, if a process holds a mutex and sleeps waiting for an
> event in interruptible state for a long time, the other processes
> will wait on the mutex in uninterruptible state. In this case, the
> waiter processes are dumped, but the blocker process is not shown
> because it is sleeping in interruptible state.
>
> This adds a feature to dump the blocker task which holds a mutex
> when detecting a hung task. e.g.
>
>  INFO: task cat:113 blocked for more than 122 seconds.
>        Not tainted 6.14.0-rc3-00002-g6afe972e1b9b #152
>  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>  task:cat state:D stack:13432 pid:113 tgid:113 ppid:103 task_flags:0x400100 flags:0x00000002
>  Call Trace:
>   <TASK>
>   __schedule+0x731/0x960
>   ? schedule_preempt_disabled+0x54/0xa0
>   schedule+0xb7/0x140
>   ? __mutex_lock+0x51d/0xa50
>   ? __mutex_lock+0x51d/0xa50
>   schedule_preempt_disabled+0x54/0xa0
>   __mutex_lock+0x51d/0xa50
>   ? current_time+0x3a/0x120
>   read_dummy+0x23/0x70
>   full_proxy_read+0x6a/0xc0
>   vfs_read+0xc2/0x340
>   ? __pfx_direct_file_splice_eof+0x10/0x10
>   ? do_sendfile+0x1bd/0x2e0
>   ksys_read+0x76/0xe0
>   do_syscall_64+0xe3/0x1c0
>   ? exc_page_fault+0xa9/0x1d0
>   entry_SYSCALL_64_after_hwframe+0x77/0x7f
>  RIP: 0033:0x4840cd
>  RSP: 002b:00007ffe632b76c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
>  RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004840cd
>  RDX: 0000000000001000 RSI: 00007ffe632b7710 RDI: 0000000000000003
>  RBP: 00007ffe632b7710 R08: 0000000000000000 R09: 0000000000000000
>  R10: 0000000001000000 R11: 0000000000000246 R12: 0000000000001000
>  R13: 000000003a8b63a0 R14: 0000000000000001 R15: ffffffffffffffff
>   </TASK>
>  INFO: task cat:113 is blocked on a mutex owned by task cat:112.
>  task:cat state:S stack:13432 pid:112 tgid:112 ppid:103 task_flags:0x400100 flags:0x00000002
>  Call Trace:
>   <TASK>
>   __schedule+0x731/0x960
>   ? schedule_timeout+0xa8/0x120
>   schedule+0xb7/0x140
>   schedule_timeout+0xa8/0x120
>   ? __pfx_process_timeout+0x10/0x10
>   msleep_interruptible+0x3e/0x60
>   read_dummy+0x2d/0x70
>   full_proxy_read+0x6a/0xc0
>   vfs_read+0xc2/0x340
>   ? __pfx_direct_file_splice_eof+0x10/0x10
>   ? do_sendfile+0x1bd/0x2e0
>   ksys_read+0x76/0xe0
>   do_syscall_64+0xe3/0x1c0
>   ? exc_page_fault+0xa9/0x1d0
>   entry_SYSCALL_64_after_hwframe+0x77/0x7f
>  RIP: 0033:0x4840cd
>  RSP: 002b:00007ffd69513748 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
>  RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004840cd
>  RDX: 0000000000001000 RSI: 00007ffd69513790 RDI: 0000000000000003
>  RBP: 00007ffd69513790 R08: 0000000000000000 R09: 0000000000000000
>  R10: 0000000001000000 R11: 0000000000000246 R12: 0000000000001000
>  R13: 0000000029d8d3a0 R14: 0000000000000001 R15: ffffffffffffffff
>   </TASK>
>
> Thank you,
>
> ---
>
> Masami Hiramatsu (Google) (2):
>       hung_task: Show the blocker task if the task is hung on mutex
>       samples: Add hung_task detector mutex blocking sample
>
>
>  kernel/hung_task.c                  |   38 ++++++++++++++++++++
>  kernel/locking/mutex-debug.c        |    1 +
>  kernel/locking/mutex.c              |    9 +++++
>  kernel/locking/mutex.h              |    6 +++
>  samples/Kconfig                     |    9 +++++
>  samples/Makefile                    |    1 +
>  samples/hung_task/Makefile          |    2 +
>  samples/hung_task/hung_task_mutex.c |   66 +++++++++++++++++++++++++++++++++++
>  8 files changed, 132 insertions(+)
>  create mode 100644 samples/hung_task/Makefile
>  create mode 100644 samples/hung_task/hung_task_mutex.c
>
> --
> Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx>