User process NFS write hang in wait_on_commit with kworker

On May 20th I reported "User process NFS write hang followed
by automount hang requiring reboot" to this list.  There I
had a process that would hang on an NFS write, followed by sync
hanging, which eventually forced me to reboot the host.

On June 4th, after upgrading to Linux 4.19.44, I reported
the issue resolved.  Since then, as I've deployed Linux 4.19.44
more widely, the issue has come back--sort of.

I have begun once again getting sync hangs following a
hung NFS write.  The hung write has a different stack trace
than any I previously reported:

    [<0>] wait_on_commit+0x60/0x90 [nfs]
    [<0>] __nfs_commit_inode+0x146/0x1a0 [nfs]
    [<0>] nfs_file_fsync+0xa7/0x1d0 [nfs]
    [<0>] filp_close+0x25/0x70
    [<0>] put_files_struct+0x66/0xb0
    [<0>] do_exit+0x2af/0xbb0
    [<0>] do_group_exit+0x35/0xa0
    [<0>] __x64_sys_exit_group+0xf/0x10
    [<0>] do_syscall_64+0x45/0x100
    [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [<0>] 0xffffffffffffffff

And there is an attendant kworker thread:

    [<0>] wait_on_commit+0x60/0x90 [nfs]
    [<0>] __nfs_commit_inode+0x146/0x1a0 [nfs]
    [<0>] nfs_write_inode+0x5c/0x90 [nfs]
    [<0>] nfs4_write_inode+0xd/0x30 [nfsv4]
    [<0>] __writeback_single_inode+0x27a/0x320
    [<0>] writeback_sb_inodes+0x19a/0x460
    [<0>] wb_writeback+0x102/0x2f0
    [<0>] wb_workfn+0xa3/0x400
    [<0>] process_one_work+0x1e3/0x3d0
    [<0>] worker_thread+0x28/0x3c0
    [<0>] kthread+0x10e/0x130
    [<0>] ret_from_fork+0x35/0x40
    [<0>] 0xffffffffffffffff
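
For anyone who wants to check their own hosts for the same state, here is a
minimal sketch that looks for tasks whose innermost frame is wait_on_commit
in the nfs module, as in both traces above. It assumes the traces are read
from /proc/<pid>/stack (which normally requires root); the function names
are made up for illustration and are not part of any existing tool.

```python
import glob
import re

# Matches frames such as "[<0>] wait_on_commit+0x60/0x90 [nfs]".
# The trailing "[module]" is optional; core-kernel frames do not have one.
# Lines like "[<0>] 0xffffffffffffffff" carry no symbol and are skipped.
FRAME_RE = re.compile(
    r"\[<[0-9a-fx]+>\][ \t]+(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+(?:[ \t]+\[(\w+)\])?"
)


def parse_frames(stack_text):
    """Return (symbol, module) pairs, innermost frame first."""
    return [(m.group(1), m.group(2)) for m in FRAME_RE.finditer(stack_text)]


def is_stuck_in_commit(stack_text):
    """True if the innermost frame is wait_on_commit in the nfs module."""
    frames = parse_frames(stack_text)
    return bool(frames) and frames[0] == ("wait_on_commit", "nfs")


def find_stuck_tasks():
    """Scan /proc/<pid>/stack (assumed readable) and return matching PIDs."""
    stuck = []
    for path in glob.glob("/proc/[0-9]*/stack"):
        try:
            with open(path) as f:
                text = f.read()
        except OSError:
            # The task exited, or we lack permission to read its stack.
            continue
        if is_stuck_in_commit(text):
            stuck.append(int(path.split("/")[2]))
    return stuck
```

Calling find_stuck_tasks() as root would list the PIDs of both the user
process and the kworker if they are blocked the way the traces show. A task
that is merely passing through wait_on_commit would also match, so a PID is
only interesting if it keeps showing up across repeated scans.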

Oddly enough, I can clear the problem without rebooting the host.
I use iptables to block all traffic between the NFS server and the
NFS client for long enough that any open TCP connections time out.
After that, the connection apparently reestablishes and the hung
process unblocks.
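
The workaround can be sketched roughly as follows. This is an illustration
of the idea, not the exact rules I run: the server address and the hold time
are placeholders you would have to choose, the helper names are hypothetical,
and it assumes plain DROP rules on the client's INPUT and OUTPUT chains are
enough to cut the traffic. It must run as root on the NFS client.

```python
import subprocess
import time


def block_rules(server_ip):
    """iptables commands that drop all traffic to and from the server."""
    return [
        ["iptables", "-I", "INPUT", "-s", server_ip, "-j", "DROP"],
        ["iptables", "-I", "OUTPUT", "-d", server_ip, "-j", "DROP"],
    ]


def unblock_rules(server_ip):
    """The same rule specs with -D, which deletes what -I inserted."""
    return [[cmd[0], "-D"] + cmd[2:] for cmd in block_rules(server_ip)]


def bounce_connection(server_ip, hold_seconds, run=subprocess.check_call):
    """Block the server, wait, then remove the block again.

    hold_seconds is an assumption: it has to be long enough for the
    stalled TCP connection to time out on your system.
    """
    for cmd in block_rules(server_ip):
        run(cmd)
    try:
        time.sleep(hold_seconds)
    finally:
        # Remove the rules even if the wait is interrupted.
        for cmd in unblock_rules(server_ip):
            run(cmd)
```

For example, bounce_connection("192.0.2.10", 1200) would hold the block for
twenty minutes against a made-up server address. The run parameter exists
only so the command list can be inspected without touching the firewall.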

I can't explain what's keeping the connection alive but apparently
stalled--requiring my manual intervention.  Do any of you have
ideas or speculation?  I'm happy to poke around in a packet capture
if the information provided isn't sufficient.

-A
-- 
Alan Post | Xen VPS hosting for the technically adept
PO Box 61688 | Sunnyvale, CA 94088-1681 | https://prgmr.com/
email: adp@xxxxxxxxx


