On 2/20/25 9:18 AM, Masami Hiramatsu (Google) wrote:
On Wed, 19 Feb 2025 15:20:39 -0500
Waiman Long <llong@xxxxxxxxxx> wrote:
On 2/19/25 10:02 AM, Lance Yang wrote:
On Wed, Feb 19, 2025 at 9:33 PM Lance Yang <ioworker0@xxxxxxxxx> wrote:
CC linux-mm
On Wed, Feb 19, 2025 at 9:00 PM Masami Hiramatsu (Google)
<mhiramat@xxxxxxxxxx> wrote:
Hi,
The hung_task detector is very useful for detecting lockups.
However, since it only dumps the blocked (uninterruptible sleep)
processes, it is not enough to identify the root cause of a
lockup.
For example, if a process holds a mutex and sleeps waiting for an
event in interruptible state for a long time, the other processes
will wait on the mutex in uninterruptible state. In this case, the
waiter processes are dumped, but the blocker process is not shown
because it is sleeping in interruptible state.
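
To make the scenario concrete, here is a minimal userspace toy model in Python. It is not kernel code: the `Task` and `Mutex` classes, the `blocked_on` field, and the `hung_tasks()`/`blockers()` helpers are all hypothetical names chosen for illustration. It only sketches the idea that a waiter could lead the detector to the mutex owner.

```python
from dataclasses import dataclass
from typing import List, Optional

# Toy model only: hypothetical stand-ins, not kernel structures.
INTERRUPTIBLE = "S"     # interruptible sleep
UNINTERRUPTIBLE = "D"   # uninterruptible sleep


@dataclass
class Mutex:
    owner: Optional["Task"] = None


@dataclass
class Task:
    name: str
    state: str = "R"
    blocked_on: Optional[Mutex] = None  # assumed field: mutex this task waits on


def lock(task: Task, mutex: Mutex) -> None:
    """Take the mutex if it is free, otherwise block uninterruptibly on it."""
    if mutex.owner is None:
        mutex.owner = task
    else:
        task.state = UNINTERRUPTIBLE
        task.blocked_on = mutex


def hung_tasks(tasks: List[Task]) -> List[str]:
    """What a dump of blocked tasks shows: only uninterruptible sleepers."""
    return [t.name for t in tasks if t.state == UNINTERRUPTIBLE]


def blockers(tasks: List[Task]) -> List[str]:
    """Sketch of the extension: follow waiter -> mutex -> owner."""
    found: List[str] = []
    for t in tasks:
        if t.state == UNINTERRUPTIBLE and t.blocked_on and t.blocked_on.owner:
            owner = t.blocked_on.owner.name
            if owner not in found:
                found.append(owner)
    return found


# The scenario from the mail: the holder sleeps interruptibly, two tasks wait.
m = Mutex()
holder, w1, w2 = Task("holder"), Task("waiter1"), Task("waiter2")
lock(holder, m)
holder.state = INTERRUPTIBLE
lock(w1, m)
lock(w2, m)
all_tasks = [holder, w1, w2]
```

In this model `hung_tasks(all_tasks)` lists only the two waiters, while `blockers(all_tasks)` reaches the holder through the mutex the waiters are blocked on.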
Cool! I just ran into something similar today, but with rwsem. In that
case, the blocked process was locked up, and we could not identify
the root cause either ;(
Once this patch series has settled, we can extend rwsem to provide
a similar feature.
While discussing rwsem with Sergey, he pointed out that we cannot
identify a single blocker on an rwsem, because several readers can
block several writers. In this case, we need to dump all of them,
but we don't have that information.
So anyway, I would like to start with mutex, which is the simplest one.
The other locks we can discuss later (or start with limited
support, such as showing only rwsem::owner).
Yes, reader tracking is a problem as the rw_semaphore structure doesn't
store information about the reader-owners as the count can vary. That is
a limitation that we have to live with.
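
A small toy model may help show why the readers cannot all be named. This is a Python sketch, not the kernel's rw_semaphore: the `ToyRwsem` class and its `known_holders()` helper are hypothetical, and it assumes the structure keeps only a reader count plus a single owner slot that each new reader overwrites.

```python
from typing import List, Optional


class ToyRwsem:
    """Hypothetical model: a reader count and one owner slot, nothing more."""

    def __init__(self) -> None:
        self.readers = 0
        self.owner: Optional[str] = None

    def down_read(self, task: str) -> None:
        # Assumption: the single owner slot only remembers the latest reader.
        self.readers += 1
        self.owner = task

    def up_read(self) -> None:
        self.readers -= 1
        if self.readers == 0:
            self.owner = None

    def known_holders(self) -> List[str]:
        """All that a detector could report from this structure alone."""
        return [] if self.owner is None else [self.owner]


sem = ToyRwsem()
sem.down_read("reader_a")
sem.down_read("reader_b")
```

With two readers holding the lock, `sem.readers` is 2 but `sem.known_holders()` can name only one task, so a writer blocked on this lock could be pointed at no more than one of its blockers.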
Cheers,
Longman