Re: Regression: deadlock in io_schedule / nfs_writepage_locked

On Mon, 2022-08-22 at 16:43 +0200, Igor Raits wrote:
> 
> Hello Trond,
> 
> On Mon, Aug 22, 2022 at 4:02 PM Trond Myklebust
> <trondmy@xxxxxxxxxxxxxxx> wrote:
> > 
> > On Mon, 2022-08-22 at 10:16 +0200, Igor Raits wrote:
> > > 
> > > Hello everyone,
> > > 
> > > Hopefully I'm sending this to the right place…
> > > We recently started to see the following stack trace quite often
> > > on our VMs that use NFS extensively (I think after upgrading to
> > > 5.18.11+, but I'm not sure exactly when; it definitely happens on
> > > 5.18.15):
> > > 
> > > INFO: task kworker/u36:10:377691 blocked for more than 122 seconds.
> > >      Tainted: G            E     5.18.15-1.gdc.el8.x86_64 #1
> > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > task:kworker/u36:10  state:D stack:    0 pid:377691 ppid:     2 flags:0x00004000
> > > Workqueue: writeback wb_workfn (flush-0:308)
> > > Call Trace:
> > > <TASK>
> > > __schedule+0x38c/0x7d0
> > > schedule+0x41/0xb0
> > > io_schedule+0x12/0x40
> > > __folio_lock+0x110/0x260
> > > ? filemap_alloc_folio+0x90/0x90
> > > write_cache_pages+0x1e3/0x4d0
> > > ? nfs_writepage_locked+0x1d0/0x1d0 [nfs]
> > > nfs_writepages+0xe1/0x200 [nfs]
> > > do_writepages+0xd2/0x1b0
> > > ? check_preempt_curr+0x47/0x70
> > > ? ttwu_do_wakeup+0x17/0x180
> > > __writeback_single_inode+0x41/0x360
> > > writeback_sb_inodes+0x1f0/0x460
> > > __writeback_inodes_wb+0x5f/0xd0
> > > wb_writeback+0x235/0x2d0
> > > wb_workfn+0x348/0x4a0
> > > ? put_prev_task_fair+0x1b/0x30
> > > ? pick_next_task+0x84/0x940
> > > ? __update_idle_core+0x1b/0xb0
> > > process_one_work+0x1c5/0x390
> > > worker_thread+0x30/0x360
> > > ? process_one_work+0x390/0x390
> > > kthread+0xd7/0x100
> > > ? kthread_complete_and_exit+0x20/0x20
> > > ret_from_fork+0x1f/0x30
> > > </TASK>
> > > 
> > > I see that something very similar was fixed in btrfs
> > > (https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.18.y&id=9535ec371d741fa037e37eddc0a5b25ba82d0027)
> > > but I could not find anything similar for NFS.
> > > 
> > > Do you happen to know if this is already fixed? If so, would you
> > > mind sharing some commits? If not, could you help get this
> > > addressed?
> > > 
> > 
> > The stack trace you show above isn't particularly helpful for
> > diagnosing the problem.
> > 
> > All it says is that 'thread A' is waiting to take a page lock that
> > is held by a different 'thread B'. Without information on what
> > 'thread B' is doing, and why it isn't releasing the lock, there is
> > nothing we can conclude.
> 
> Do you have any hints on how to debug this issue further (when it
> happens again)? Would `virsh dump` to get a memory dump, followed by
> some kind of "bt all" via crash, help to get more information?
> Or something else?
> 
> Thanks in advance!
> --
> Igor Raits

Please try running the following two lines of 'bash' script as root:

(for tt in $(grep -l 'nfs[^d]' /proc/*/stack); do echo "${tt}:"; cat ${tt}; echo; done) >/tmp/nfs_threads.txt

cat /sys/kernel/debug/sunrpc/rpc_clnt/*/tasks > /tmp/rpc_tasks.txt

and then send us the output from the two files /tmp/nfs_threads.txt and
/tmp/rpc_tasks.txt.
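
If the hang is still in progress while you collect the above, it can
also be worth dumping the stacks of all blocked (D state) tasks into
the kernel log. A minimal sketch, assuming the sysrq interface is
enabled on your kernel (CONFIG_MAGIC_SYSRQ):

# enable all sysrq functions, then trigger the 'show blocked tasks' dump
echo 1 > /proc/sys/kernel/sysrq
echo w > /proc/sysrq-trigger
# the resulting stack traces end up in the kernel log
dmesg > /tmp/blocked_tasks.txt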

The file nfs_threads.txt gives us a full set of stack traces from all
processes that are currently in the NFS client code. So it should
contain both the stack trace from your 'thread A' above, and the traces
from all candidates for the 'thread B' process that is causing the
blockage.

The file rpc_tasks.txt gives us the status of any RPC calls that might
be outstanding and might help diagnose any issues with the TCP
connection.

That should therefore give us a better starting point for root causing
the problem.
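
As for your 'virsh dump' question: yes, a guest memory dump analysed
with the crash utility can also work as a fallback. A rough sketch,
assuming a guest named 'vm1' (hypothetical name) and a vmlinux with
debuginfo that matches the guest kernel:

# on the host: take a memory-only dump of the running guest
virsh dump --memory-only --format elf vm1 /var/tmp/vm1.core
# load it into crash; at the crash prompt, 'foreach UN bt' prints the
# backtraces of all uninterruptible (D state) tasks
crash /path/to/guest-vmlinux /var/tmp/vm1.core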

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx
