Re: blk-mq hangs easily with LLVM+clang test suite

Bart Van Assche <Bart.VanAssche@xxxxxxx> · Mon, 22 Jan 2018 17:25:20 +0000

On Sun, 2018-01-21 at 12:39 -0500, David Zarzycki wrote:
> Hi Bart,
> 
> I can do [1] and [2] but I won’t be able to provide the command output
> because the hang is almost total. No new commands can run because the
> scheduler is hung. For example, see this backtrace:
> 
> http://znu.io/IMG_0362.jpg
> 
> Is there another approach I can use?

Hello Dave,

What I proposed is an approach for finding the cause of a block layer queue
that got stuck due to a missing queue run. However, the screenshot that you
shared in your e-mail shows that something else is going on. If the message
"rcu_sched detected stalls" appears, that means that something prevented the
RCU code from being scheduled, e.g. because interrupts occur at a frequency
that is so high that the RCU code doesn't get a chance to run. For more
information, see also
https://www.kernel.org/doc/Documentation/RCU/stallwarn.txt.
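
Since no commands can be run on the hung system, one option is to look for
the stall message in console output captured elsewhere. The following is only
a minimal sketch. It assumes the kernel console log has been saved to a text
file on a second machine (for example via a serial console or netconsole,
neither of which has been set up in this thread). The function name
find_rcu_stalls and the file name console.log are hypothetical.

```python
import re

# Matches both forms of the RCU stall warning, e.g.
#   "INFO: rcu_sched detected stalls on CPUs/tasks:"
#   "INFO: rcu_sched self-detected stall on CPU"
STALL_RE = re.compile(r"rcu_\w+ (?:self-)?detected stalls? on")


def find_rcu_stalls(log_text):
    """Return the lines of log_text that report an RCU stall."""
    return [line for line in log_text.splitlines() if STALL_RE.search(line)]


# Hypothetical usage, assuming the captured log is in console.log:
#
#   with open("console.log", errors="replace") as f:
#       for line in find_rcu_stalls(f.read()):
#           print(line)
```

The lines that follow each match in the log (the per-CPU state and the
backtraces) are what stallwarn.txt explains how to interpret.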

I think you will need the help of an NVMe expert to analyze this further. In
case you are not aware of it, there is a mailing list dedicated to NVMe. See
also https://lists.infradead.org/mailman/listinfo/linux-nvme.

Bart.