Re: steam-associated reproducible hard NFSv4.2 client hang (5.9, 5.10)

(I can't set References: correctly on this mail because the original has
aged out of my mailbox. Archive URL:
https://www.spinics.net/lists/linux-nfs/msg81430.html)

I now have a little lockdep info from this hang, and reports from at
least two other people who have seen similar-looking hangs dating back
to 4.19. Theirs are much harder to reproduce, taking many hours rather
than five minutes; one of them reports no longer using NFS in production
because of this.

Unfortunately the lockdep info isn't much use:

Feb 13 14:13:12 silk warning: : [  888.834464] Showing all locks held in the system:
Feb 13 14:13:12 silk warning: : [  888.834501] 1 lock held by dmesg/1152:
Feb 13 14:13:12 silk warning: : [  888.834508]  #0: ffff980c3b7200d0 (&user->lock){+.+.}-{3:3}, at: devkmsg_read+0x49/0x2d1
Feb 13 14:13:12 silk warning: : [  888.834540] 2 locks held by tee/1322:
Feb 13 14:13:12 silk warning: : [  888.834546]  #0: ffff980c0809a430 (sb_writers#12){.+.+}-{0:0}, at: ksys_write+0x6a/0xdc
Feb 13 14:13:12 silk warning: : [  888.834573]  #1: ffff980c3ca7b5e8 (&sb->s_type->i_mutex_key#16){++++}-{3:3}, at: nfs_start_io_write+0x1a/0x45
Feb 13 14:13:12 silk warning: : [  888.834632] 1 lock held by 192.168.16.8-ma/2302:
Feb 13 14:13:12 silk warning: : [  888.834638]  #0: ffff980c0fe6b700 (&acct->lock#2){+.+.}-{3:3}, at: acct_process+0x102/0x2bc

The first of those is my ongoing dmesg -w. The last is process
accounting. The middle one is an always-running tee of Xsession-errors
over the same NFSv4 connection, which tells us only that writes to this
NFS server from this client have hung, which we already know. There are
no signs of any locks held by the Steam client, which has hung in the
middle of installation.
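
For anyone wanting to compare dumps from several hangs, here is a minimal
Python sketch that groups the held locks by task. The line format it
expects is an assumption inferred only from the excerpt above, and the
names parse_held_locks and SAMPLE are hypothetical, not part of any
kernel tooling.

```python
import re

# Header line, e.g. "2 locks held by tee/1322:" (format assumed from the excerpt).
HOLDER = re.compile(r"(\d+) locks? held by (.+)/(\d+):")
# Lock line, e.g. "#0: <addr> (<class>){<usage>}-{3:3}, at: <symbol+off/len>".
LOCK = re.compile(
    r"#(\d+): ([0-9a-f]+) \((.+?)\)\{.*?\}(?:-\{\d+:\d+\})?, at: (\S+)"
)

# Sample input copied from the log excerpt in this mail.
SAMPLE = """\
[  888.834464] Showing all locks held in the system:
[  888.834501] 1 lock held by dmesg/1152:
[  888.834508]  #0: ffff980c3b7200d0 (&user->lock){+.+.}-{3:3}, at: devkmsg_read+0x49/0x2d1
[  888.834540] 2 locks held by tee/1322:
[  888.834546]  #0: ffff980c0809a430 (sb_writers#12){.+.+}-{0:0}, at: ksys_write+0x6a/0xdc
[  888.834573]  #1: ffff980c3ca7b5e8 (&sb->s_type->i_mutex_key#16){++++}-{3:3}, at: nfs_start_io_write+0x1a/0x45
[  888.834632] 1 lock held by 192.168.16.8-ma/2302:
[  888.834638]  #0: ffff980c0fe6b700 (&acct->lock#2){+.+.}-{3:3}, at: acct_process+0x102/0x2bc
"""


def parse_held_locks(text):
    """Return {(comm, pid): [(lock_class, acquired_at), ...]}."""
    holders = {}
    current = None
    for line in text.splitlines():
        m = HOLDER.search(line)
        if m:
            # Start a new task entry; later lock lines belong to it.
            current = (m.group(2), int(m.group(3)))
            holders[current] = []
            continue
        m = LOCK.search(line)
        if m and current is not None:
            holders[current].append((m.group(3), m.group(4)))
    return holders


if __name__ == "__main__":
    for (comm, pid), locks in parse_held_locks(SAMPLE).items():
        for lock_class, site in locks:
            print(f"{comm}/{pid}: {lock_class} taken at {site}")
```

Run against the excerpt above, it finds three holders (dmesg, tee and
192.168.16.8-ma) and no entry for the Steam client.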

So whateverthehell this is, it's not blocked on a lock. The NFS client
is hanging all on its own. (I have no idea how clients can block in the
middle of writing if a lock is *not* involved somehow, but that is what
it looks like from the lockdep output.)

Does anyone know how I might start debugging this sod?


