On Tue, 2021-02-23 at 17:57 -0500, J. Bruce Fields wrote: > On Sun, Jan 03, 2021 at 02:27:51PM +0000, Nick Alcock wrote: > > So I bought a game or two on Steam a fortnight ago to while away a > > Christmas spent stuck indoors with no family, decorations, or > > presents > > (stuck at my sister's and she got a perfectly-timed case of covid- > > 19). > > It turned out this was a very bad idea! > > > > I keep my Steam directory on an ext4 fs on the local fileserver and > > access it via NFSv4.2: over 10GbE: it's actually faster than local > > spinning-rust storage would be (five-spindle RAID-6 and bcache and > > 128GiB RAM on the server for caching probably have something to do > > with > > that!) though likely slower than the NVMe the local machine > > actually > > has: but it doesn't have enough local storage, so I keep all the > > Steam > > stuff remotely. > > > > (I keep my $HOME on the same fileserver, though that's on xfs: I'd > > use > > xfs for Steam, too, but there are still a lot of games out there > > that > > fail on filesystems with 64-bit inode numbers.) > > > > Unfortunately, what I observe now when kicking up Steam is a spasm > > of > > downloading I/O for the game I just bought and then a complete > > client-side hang when it starts installing: just about all I can do > > is > > hard-reboot. This of course stops me using Steam for anything which > > didn't make me very happy :P > > > > The server hums onwards as though nothing went wrong, but the > > client is > > in a bad way indeed: not only has NFS hung but there are clearly > > some > > important locks taken out since even an ls on a non-NFS dir (with > > no NFS > > dirs in the PATH) hangs, as does a while :; ps -e; done loop. I can > > rebuild with lockdep and do a lock dump if that would be useful. > > (The > > kernel is not quite stock: it has DTrace built in, but the hangs > > happen > > even if the kernel module has never been loaded, and I'm pretty > > sure the > > hangs can't be associated with the not-very-substantial built-in > > DTrace > > changes. I can try rebuilding without that as well, but I'll have > > to rip > > chunks out of my monitoring machinery to do that so I've avoided > > doing > > it so far.) > > > > I finally got round to doing a bit of debugging today. NFS > > debugging > > dumps covering the last fifteen seconds or so before the hang and a > > couple of minutes afterwards (taken on the client, logged without > > incident to the server's syslog, as proof that not *everything* is > > stalled) are a few megabytes long so are at > > <http://www.esperi.org.uk/~nix/temporary/nfs-steam-hang.log>. > > Everything > > seems to go wrong after an especially large bunch of NFS commits. > > > > Both client (silk, 192.168.16.21) and server (loom, 192.168.16.8) > > are > > running 5.10.4 with the same NFS config, though the client has no > > NFS > > server exports at the time and the server isn't acting as an NFS > > client > > at all: > > > > CONFIG_NFS_FS=y > > # CONFIG_NFS_V2 is not set > > CONFIG_NFS_V3=y > > CONFIG_NFS_V3_ACL=y > > CONFIG_NFS_V4=y > > # CONFIG_NFS_SWAP is not set > > CONFIG_NFS_V4_1=y > > CONFIG_NFS_V4_2=y > > CONFIG_PNFS_FILE_LAYOUT=y > > CONFIG_PNFS_BLOCK=y > > CONFIG_PNFS_FLEXFILE_LAYOUT=m > > CONFIG_NFS_V4_1_IMPLEMENTATION_ID_DOMAIN="kernel.org" > > # CONFIG_NFS_V4_1_MIGRATION is not set > > # CONFIG_ROOT_NFS is not set > > # CONFIG_NFS_USE_LEGACY_DNS is not set > > CONFIG_NFS_USE_KERNEL_DNS=y > > CONFIG_NFS_DEBUG=y > > CONFIG_NFS_DISABLE_UDP_SUPPORT=y > > # CONFIG_NFS_V4_2_READ_PLUS is not set > > CONFIG_NFSD=y > > CONFIG_NFSD_V2_ACL=y > > CONFIG_NFSD_V3=y > > CONFIG_NFSD_V3_ACL=y > > CONFIG_NFSD_V4=y > > # CONFIG_NFSD_BLOCKLAYOUT is not set > > # CONFIG_NFSD_SCSILAYOUT is not set > > # CONFIG_NFSD_FLEXFILELAYOUT is not set > > CONFIG_NFSD_V4_2_INTER_SSC=y > > CONFIG_GRACE_PERIOD=y > > CONFIG_LOCKD=y > > CONFIG_LOCKD_V4=y > > CONFIG_NFS_ACL_SUPPORT=y > > CONFIG_NFS_COMMON=y > > CONFIG_SUNRPC=y > > CONFIG_SUNRPC_GSS=y > > CONFIG_SUNRPC_BACKCHANNEL=y > > CONFIG_SUNRPC_DEBUG=y > > > > The server (loom) has a large number of exports: many of the > > filesystems > > being exported are moved around via bind mounts and the like. The > > steam > > datadir is in /pkg/non-free/steam, which is bind-mounted onto > > /home/nix/.steam on the client: /home/nix itself is bind-mounted > > from > > /home/.loom.srvr.nix/steam, and is NFS-exported from the server > > under > > that pathname. > > > > Relevant exports, from /proc/fs/nfs/exports: > > > > / 192.168.16.0/24,silk.wkstn.nix(ro,insecure,no_root_squash,s > > ync,no_wdelay,no_subtree_check,v4root,fsid=0,uuid=0a4a4563:00764033 > > :8c827c0e:989cf534,sec=390003:390004:390005:1) > > /home/.loom.srvr.nix *.srvr.nix,fold.srvr.nix(rw,root_squash,syn > > c,wdelay,no_subtree_check,uuid=0a4a4563:00764033:8c827c0e:989cf534, > > sec=1) Isn't that trying to export the same filesystem as '/' on the line above using conflicting export options? > > /pkg/non- > > free/steam 192.168.16.0/24,silk.wkstn.nix(rw,root_squash,sync,w > > delay,uuid=11b94016:cb6e4b64:beb6c134:9ea3d6b3,sec=1) > > > > Blocked tasks after the hang suggest that the client is hung > > waiting > > indefinitely for the server to reply, and I'd guess the server > > doesn't > > think it needs to. (I can do some server-side debugging dumps too > > if you > > think the problem might be there.) > > I seem to remember there have been some subtle changes to how dropped > replies are handled. If I remember right: a v4 server is never > supposed > to drop a reply without also closing the tcp connection, but it may > have > done so in the past. (I *think* all those cases are fixed?) > > And the client may have been changed to be more unforgiving about > that > requirement in recent years. > > But I don't know if that's where the issue is. > > I'm not sure how best to go about debugging. I suppose you could try > to > capture a network trace and work out if there's an RPC that isn't > getting a reply. > > Maybe turning on server debugging and looking for "svc: svc_process > dropit" messages timed with the start of the hang. I think what you > want is "rpdebug -m rpc -s svcdsp". > > Does rpc.mountd look healthy? > > --b. > > > > > > [ 322.843962] sysrq: Show Blocked State > > [ 322.844058] task:tee state:D stack: 0 pid: 1176 > > ppid: 1173 flags:0x00000000 > > [ 322.844064] Call Trace: > > [ 322.844074] __schedule+0x3db/0x7ee > > [ 322.844079] ? io_schedule+0x70/0x6d > > [ 322.844082] schedule+0x68/0xdc > > [ 322.844085] io_schedule+0x46/0x6d > > [ 322.844089] bit_wait_io+0x11/0x52 > > [ 322.844092] __wait_on_bit+0x31/0x90 > > [ 322.844096] out_of_line_wait_on_bit+0x7e/0x80 > > [ 322.844100] ? var_wake_function+0x30/0x27 > > [ 322.844104] nfs_wait_on_request.part.0+0x31/0x33 > > [ 322.844107] nfs_page_group_lock_head+0x31/0x78 > > [ 322.844110] nfs_lock_and_join_requests+0x9d/0x26e > > [ 322.844114] nfs_updatepage+0x1be/0x985 > > [ 322.844118] nfs_write_end+0x128/0x52a > > [ 322.844122] ? iov_iter_copy_from_user_atomic+0xe8/0x3ce > > [ 322.844127] generic_perform_write+0xed/0x17c > > [ 322.844131] nfs_file_write+0x14a/0x29f > > [ 322.844135] new_sync_write+0xfb/0x16b > > [ 322.844139] vfs_write+0x1ee/0x29a > > [ 322.844142] ? vfs_read+0x17b/0x198 > > [ 322.844145] ksys_write+0x61/0xd0 > > [ 322.844148] __x64_sys_write+0x1a/0x1c > > [ 322.844151] do_syscall_64+0x32/0x45 > > [ 322.844154] entry_SYSCALL_64_after_hwframe+0x44/0x0 > > [ 322.844158] RIP: 0033:0x7f4781f323a3 > > [ 322.844160] RSP: 002b:00007ffcf8220408 EFLAGS: 00000246 > > ORIG_RAX: 0000000000000001 > > [ 322.844164] RAX: ffffffffffffffda RBX: 0000000000000092 RCX: > > 00007f4781f323a3 > > [ 322.844166] RDX: 0000000000000092 RSI: 00007ffcf8220530 RDI: > > 0000000000000003 > > [ 322.844168] RBP: 00007ffcf8220530 R08: 0000000000000000 R09: > > 0000000000000001 > > [ 322.844170] R10: 0000000000400ad2 R11: 0000000000000246 R12: > > 0000000000000092 > > [ 322.844172] R13: 0000000002361580 R14: 0000000000000092 R15: > > 00007f4782201720 > > [ 322.844202] task:CIPCServer::Thr state:D stack: 0 pid: 1801 > > ppid: 1540 flags:0x20020000 > > [ 322.844207] Call Trace: > > [ 322.844212] __schedule+0x3db/0x7ee > > [ 322.844219] ? > > trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e > > [ 322.844222] schedule+0x68/0xdc > > [ 322.844227] rpc_wait_bit_killable+0x11/0x52 > > [ 322.844230] __wait_on_bit+0x31/0x90 > > [ 322.844235] ? __rpc_atrun+0x20/0x1a > > [ 322.844238] ? __rpc_atrun+0x20/0x1a > > [ 322.844242] out_of_line_wait_on_bit+0x7e/0x80 > > [ 322.844246] ? var_wake_function+0x30/0x27 > > [ 322.844250] __rpc_execute+0xe5/0x485 > > [ 322.844254] ? sugov_get_util+0xf0/0xeb > > [ 322.844258] rpc_execute+0xa5/0xaa > > [ 322.844263] rpc_run_task+0x148/0x187 > > [ 322.844267] nfs4_do_call_sync+0x61/0x81 > > [ 322.844272] _nfs4_proc_getattr+0x10a/0x120 > > [ 322.844278] nfs4_proc_getattr+0x67/0xee > > [ 322.844283] __nfs_revalidate_inode+0xb7/0x2df > > [ 322.844287] nfs_access_get_cached+0x14d/0x1df > > [ 322.844291] nfs_do_access+0x48/0x28e > > [ 322.844296] ? step_into+0x2bc/0x6a1 > > [ 322.844301] nfs_permission+0xa2/0x1b8 > > [ 322.844305] inode_permission+0x96/0xca > > [ 322.844308] ? unlazy_walk+0x4c/0x80 > > [ 322.844312] link_path_walk.part.0+0x2a5/0x320 > > [ 322.844315] ? path_init+0x2c2/0x3e5 > > [ 322.844318] path_lookupat+0x3f/0x1b1 > > [ 322.844322] filename_lookup+0x97/0x17f > > [ 322.844326] ? kmem_cache_alloc+0x32/0x200 > > [ 322.844331] user_path_at_empty+0x59/0x8a > > [ 322.844333] do_faccessat+0x70/0x227 > > [ 322.844337] __ia32_sys_access+0x1c/0x1e > > [ 322.844340] __do_fast_syscall_32+0x5f/0x8e > > [ 322.844342] do_fast_syscall_32+0x3d/0x80 > > [ 322.844346] entry_SYSCALL_compat_after_hwframe+0x45/0x0 > > [ 322.844348] RIP: 0023:0xf7ee6549 > > [ 322.844350] RSP: 002b:00000000eb763b28 EFLAGS: 00200292 > > ORIG_RAX: 0000000000000021 > > [ 322.844353] RAX: ffffffffffffffda RBX: 000000005754d9e0 RCX: > > 0000000000000004 > > [ 322.844355] RDX: 00000000f7ab3e1c RSI: 000000005754d9e0 RDI: > > 00000000eb763bc8 > > [ 322.844357] RBP: 00000000f78a91c0 R08: 0000000000000000 R09: > > 0000000000000000 > > [ 322.844359] R10: 0000000000000000 R11: 0000000000000000 R12: > > 0000000000000000 > > [ 322.844360] R13: 0000000000000000 R14: 0000000000000000 R15: > > 0000000000000000 > > [ 322.844364] task:CFileWriterThre state:D stack: 0 pid: 1810 > > ppid: 1540 flags:0x20024000 > > [ 322.844367] Call Trace: > > [ 322.844371] __schedule+0x3db/0x7ee > > [ 322.844375] ? > > trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e > > [ 322.844379] schedule+0x68/0xdc > > [ 322.844382] rpc_wait_bit_killable+0x11/0x52 > > [ 322.844385] __wait_on_bit+0x31/0x90 > > [ 322.844389] ? call_reserveresult+0xa0/0x9f > > [ 322.844391] ? call_transmit_status+0x160/0x159 > > [ 322.844395] out_of_line_wait_on_bit+0x7e/0x80 > > [ 322.844398] ? var_wake_function+0x30/0x27 > > [ 322.844401] __rpc_execute+0xe5/0x485 > > [ 322.844404] ? nfs_check_verifier+0x5b/0xa4 > > [ 322.844407] rpc_execute+0xa5/0xaa > > [ 322.844410] rpc_run_task+0x148/0x187 > > [ 322.844414] nfs4_do_call_sync+0x61/0x81 > > [ 322.844417] _nfs4_proc_getattr+0x10a/0x120 > > [ 322.844422] nfs4_proc_getattr+0x67/0xee > > [ 322.844426] __nfs_revalidate_inode+0xb7/0x2df > > [ 322.844430] nfs_getattr+0x159/0x374 > > [ 322.844433] vfs_getattr_nosec+0x96/0xa4 > > [ 322.844435] vfs_statx+0x7a/0xe5 > > [ 322.844438] vfs_fstatat+0x1f/0x21 > > [ 322.844443] __do_compat_sys_ia32_lstat64+0x2b/0x4e > > [ 322.844448] __ia32_compat_sys_ia32_lstat64+0x14/0x16 > > [ 322.844450] __do_fast_syscall_32+0x5f/0x8e > > [ 322.844453] do_fast_syscall_32+0x3d/0x80 > > [ 322.844457] entry_SYSCALL_compat_after_hwframe+0x45/0x0 > > [ 322.844459] RIP: 0023:0xf7ee6549 > > [ 322.844460] RSP: 002b:00000000eb561be8 EFLAGS: 00000292 > > ORIG_RAX: 00000000000000c4 > > [ 322.844463] RAX: ffffffffffffffda RBX: 00000000c521bf60 RCX: > > 00000000eb561e70 > > [ 322.844465] RDX: 00000000f7ab3e1c RSI: 00000000eb561e70 RDI: > > 00000000c521bf60 > > [ 322.844467] RBP: 0000000000000000 R08: 0000000000000000 R09: > > 0000000000000000 > > [ 322.844468] R10: 0000000000000000 R11: 0000000000000000 R12: > > 0000000000000000 > > [ 322.844470] R13: 0000000000000000 R14: 0000000000000000 R15: > > 0000000000000000 > > [ 322.844477] task:Steam - DirWatc state:D stack: 0 pid: 1924 > > ppid: 1540 flags:0x20024000 > > [ 322.844480] Call Trace: > > [ 322.844484] __schedule+0x3db/0x7ee > > [ 322.844488] ? > > trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e > > [ 322.844492] schedule+0x68/0xdc > > [ 322.844495] rpc_wait_bit_killable+0x11/0x52 > > [ 322.844498] __wait_on_bit+0x31/0x90 > > [ 322.844501] ? rpc_sleep_on_timeout+0x64/0x9c > > [ 322.844504] ? rpc_check_timeout+0x1e0/0x1e0 > > [ 322.844507] ? call_connect_status+0x180/0x17a > > [ 322.844511] out_of_line_wait_on_bit+0x7e/0x80 > > [ 322.844513] ? var_wake_function+0x30/0x27 > > [ 322.844516] __rpc_execute+0xe5/0x485 > > [ 322.844520] rpc_execute+0xa5/0xaa > > [ 322.844523] rpc_run_task+0x148/0x187 > > [ 322.844526] nfs4_do_call_sync+0x61/0x81 > > [ 322.844530] _nfs4_proc_getattr+0x10a/0x120 > > [ 322.844534] nfs4_proc_getattr+0x67/0xee > > [ 322.844538] __nfs_revalidate_inode+0xb7/0x2df > > [ 322.844542] nfs_lookup_verify_inode+0x67/0x82 > > [ 322.844545] nfs_do_lookup_revalidate+0x19b/0x29c > > [ 322.844548] nfs4_do_lookup_revalidate+0x5d/0xc5 > > [ 322.844552] nfs4_lookup_revalidate+0x6a/0x8d > > [ 322.844555] lookup_fast+0xb8/0x122 > > [ 322.844559] path_openat+0xfd/0xfd8 > > [ 322.844562] ? unlazy_walk+0x4c/0x80 > > [ 322.844565] ? terminate_walk+0x62/0xed > > [ 322.844569] ? putname+0x4b/0x53 > > [ 322.844572] do_filp_open+0x86/0x110 > > [ 322.844576] ? kmem_cache_alloc+0x32/0x200 > > [ 322.844579] ? __alloc_fd+0xb2/0x143 > > [ 322.844583] do_sys_openat2+0x8d/0x13a > > [ 322.844585] ? do_faccessat+0x108/0x227 > > [ 322.844588] __ia32_compat_sys_openat+0x48/0x63 > > [ 322.844591] __do_fast_syscall_32+0x5f/0x8e > > [ 322.844594] do_fast_syscall_32+0x3d/0x80 > > [ 322.844597] entry_SYSCALL_compat_after_hwframe+0x45/0x0 > > [ 322.844600] RIP: 0023:0xf7ee6549 > > [ 322.844601] RSP: 002b:000000008c8fbcb0 EFLAGS: 00200287 > > ORIG_RAX: 0000000000000127 > > [ 322.844604] RAX: ffffffffffffffda RBX: 00000000ffffff9c RCX: > > 000000008c8fbffc > > [ 322.844606] RDX: 0000000000098800 RSI: 0000000000000000 RDI: > > 00000000f7ab3e1c > > [ 322.844608] RBP: 0000000000000000 R08: 0000000000000000 R09: > > 0000000000000000 > > [ 322.844609] R10: 0000000000000000 R11: 0000000000000000 R12: > > 0000000000000000 > > [ 322.844611] R13: 0000000000000000 R14: 0000000000000000 R15: > > 0000000000000000 > > [ 322.844614] task:CGenericAsyncFi state:D stack: 0 pid: 1927 > > ppid: 1540 flags:0x20020000 > > [ 322.844617] Call Trace: > > [ 322.844621] __schedule+0x3db/0x7ee > > [ 322.844625] schedule+0x68/0xdc > > [ 322.844628] io_schedule+0x46/0x6d > > [ 322.844631] wait_on_page_bit_common+0xf8/0x325 > > [ 322.844634] ? > > trace_raw_output_file_check_and_advance_wb_err+0x80/0x7d > > [ 322.844638] wait_on_page_bit+0x3f/0x41 > > [ 322.844642] wait_on_page_writeback.part.0+0x35/0x80 > > [ 322.844645] wait_on_page_writeback+0x27/0x29 > > [ 322.844647] __filemap_fdatawait_range+0x88/0xd9 > > [ 322.844651] filemap_write_and_wait_range+0x45/0x80 > > [ 322.844655] nfs_wb_all+0x30/0x13d > > [ 322.844658] nfs4_file_flush+0x85/0xc0 > > [ 322.844661] filp_close+0x3e/0x78 > > [ 322.844664] __close_fd+0x23/0x30 > > [ 322.844667] __ia32_sys_close+0x22/0x50 > > [ 322.844670] __do_fast_syscall_32+0x5f/0x8e > > [ 322.844673] do_fast_syscall_32+0x3d/0x80 > > [ 322.844676] entry_SYSCALL_compat_after_hwframe+0x45/0x0 > > [ 322.844678] RIP: 0023:0xf7ee6549 > > [ 322.844680] RSP: 002b:000000008c5fddb0 EFLAGS: 00200286 > > ORIG_RAX: 0000000000000006 > > [ 322.844682] RAX: ffffffffffffffda RBX: 000000000000004d RCX: > > 0000000000000002 > > [ 322.844684] RDX: 0000000000000000 RSI: 00000000f7d02e70 RDI: > > 000000000000004d > > [ 322.844686] RBP: 000000008c5fe110 R08: 0000000000000000 R09: > > 0000000000000000 > > [ 322.844688] R10: 0000000000000000 R11: 0000000000000000 R12: > > 0000000000000000 > > [ 322.844689] R13: 0000000000000000 R14: 0000000000000000 R15: > > 0000000000000000 > > [ 322.844699] task:ThreadPoolForeg state:D stack: 0 pid: 1768 > > ppid: 1735 flags:0x00004000 > > [ 322.844702] Call Trace: > > [ 322.844705] __schedule+0x3db/0x7ee > > [ 322.844709] ? > > trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e > > [ 322.844712] schedule+0x68/0xdc > > [ 322.844716] rpc_wait_bit_killable+0x11/0x52 > > [ 322.844719] __wait_on_bit+0x31/0x90 > > [ 322.844722] ? rpc_sleep_on_timeout+0x64/0x9c > > [ 322.844725] ? rpc_check_timeout+0x1e0/0x1e0 > > [ 322.844728] ? call_connect_status+0x180/0x17a > > [ 322.844731] out_of_line_wait_on_bit+0x7e/0x80 > > [ 322.844734] ? var_wake_function+0x30/0x27 > > [ 322.844737] __rpc_execute+0xe5/0x485 > > [ 322.844740] ? nfs_check_verifier+0x5b/0xa4 > > [ 322.844743] rpc_execute+0xa5/0xaa > > [ 322.844746] rpc_run_task+0x148/0x187 > > [ 322.844750] nfs4_do_call_sync+0x61/0x81 > > [ 322.844753] _nfs4_proc_getattr+0x10a/0x120 > > [ 322.844758] nfs4_proc_getattr+0x67/0xee > > [ 322.844762] __nfs_revalidate_inode+0xb7/0x2df > > [ 322.844766] nfs_getattr+0x159/0x374 > > [ 322.844769] vfs_getattr_nosec+0x96/0xa4 > > [ 322.844771] vfs_statx+0x7a/0xe5 > > [ 322.844774] __do_sys_newstat+0x31/0x4c > > [ 322.844778] __x64_sys_newstat+0x16/0x18 > > [ 322.844781] do_syscall_64+0x32/0x45 > > [ 322.844783] entry_SYSCALL_64_after_hwframe+0x44/0x0 > > [ 322.844786] RIP: 0033:0x7fe70ebf19e6 > > [ 322.844787] RSP: 002b:00007fe6c15b2068 EFLAGS: 00000246 > > ORIG_RAX: 0000000000000004 > > [ 322.844790] RAX: ffffffffffffffda RBX: 00000bd4193e1910 RCX: > > 00007fe70ebf19e6 > > [ 322.844792] RDX: 00007fe6c15b2078 RSI: 00007fe6c15b2078 RDI: > > 00000bd4193e1910 > > [ 322.844794] RBP: 00007fe6c15b2160 R08: 0000000000000001 R09: > > 00007ffd971ed090 > > [ 322.844796] R10: 0000000000056764 R11: 0000000000000246 R12: > > 00007fe6c15b2078 > > [ 322.844797] R13: 00000bd4192f6f60 R14: 00007fe6c15b2108 R15: > > 00007fe6c15b2120 > > [ 322.844801] task:ThreadPoolForeg state:D stack: 0 pid: 1769 > > ppid: 1735 flags:0x00000000 > > [ 322.844804] Call Trace: > > [ 322.844807] __schedule+0x3db/0x7ee > > [ 322.844811] ? > > trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e > > [ 322.844814] schedule+0x68/0xdc > > [ 322.844817] rpc_wait_bit_killable+0x11/0x52 > > [ 322.844820] __wait_on_bit+0x31/0x90 > > [ 322.844823] ? __rpc_atrun+0x20/0x1a > > [ 322.844826] ? __rpc_atrun+0x20/0x1a > > [ 322.844829] out_of_line_wait_on_bit+0x7e/0x80 > > [ 322.844831] ? var_wake_function+0x30/0x27 > > [ 322.844834] __rpc_execute+0xe5/0x485 > > [ 322.844837] ? update_cfs_group+0x9a/0x9e > > [ 322.844840] rpc_execute+0xa5/0xaa > > [ 322.844842] rpc_run_task+0x148/0x187 > > [ 322.844845] nfs4_do_call_sync+0x61/0x81 > > [ 322.844849] _nfs4_proc_getattr+0x10a/0x120 > > [ 322.844853] nfs4_proc_getattr+0x67/0xee > > [ 322.844856] __nfs_revalidate_inode+0xb7/0x2df > > [ 322.844859] nfs_file_write+0xfc/0x29f > > [ 322.844862] new_sync_write+0xfb/0x16b > > [ 322.844866] vfs_write+0x1ee/0x29a > > [ 322.844868] ? __fget_light+0x31/0x72 > > [ 322.844871] ksys_write+0x61/0xd0 > > [ 322.844874] __x64_sys_write+0x1a/0x1c > > [ 322.844877] do_syscall_64+0x32/0x45 > > [ 322.844880] entry_SYSCALL_64_after_hwframe+0x44/0x0 > > [ 322.844881] RIP: 0033:0x7fe71859ab7f > > [ 322.844883] RSP: 002b:00007fe6bd5b0d30 EFLAGS: 00000293 > > ORIG_RAX: 0000000000000001 > > [ 322.844885] RAX: ffffffffffffffda RBX: 00007fe6bd5b0d68 RCX: > > 00007fe71859ab7f > > [ 322.844887] RDX: 0000000000000007 RSI: 00007fe6bd5b1090 RDI: > > 000000000000005b > > [ 322.844888] RBP: 00007fe6bd5b0de0 R08: 0000000000000000 R09: > > 00007ffd971ed090 > > [ 322.844890] R10: 0000000000065fd2 R11: 0000000000000293 R12: > > 00000bd418d95020 > > [ 322.844891] R13: 00007fe6bd5b1090 R14: 0000000000000007 R15: > > 0000000000000000 > > [ 322.844900] task:ThreadPoolForeg state:D stack: 0 pid: 1807 > > ppid: 1773 flags:0x00000000 > > [ 322.844902] Call Trace: > > [ 322.844904] __schedule+0x3db/0x7ee > > [ 322.844907] ? > > trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e > > [ 322.844908] schedule+0x68/0xdc > > [ 322.844911] rpc_wait_bit_killable+0x11/0x52 > > [ 322.844913] __wait_on_bit+0x31/0x90 > > [ 322.844915] ? __rpc_atrun+0x20/0x1a > > [ 322.844916] ? __rpc_atrun+0x20/0x1a > > [ 322.844918] out_of_line_wait_on_bit+0x7e/0x80 > > [ 322.844920] ? var_wake_function+0x30/0x27 > > [ 322.844922] __rpc_execute+0xe5/0x485 > > [ 322.844924] rpc_execute+0xa5/0xaa > > [ 322.844926] rpc_run_task+0x148/0x187 > > [ 322.844928] nfs4_do_call_sync+0x61/0x81 > > [ 322.844930] _nfs4_proc_getattr+0x10a/0x120 > > [ 322.844933] nfs4_proc_getattr+0x67/0xee > > [ 322.844936] __nfs_revalidate_inode+0xb7/0x2df > > [ 322.844938] nfs_access_get_cached+0x14d/0x1df > > [ 322.844940] nfs_do_access+0x48/0x28e > > [ 322.844942] ? nfs4_lookup_revalidate+0x38/0x8d > > [ 322.844944] nfs_permission+0xa2/0x1b8 > > [ 322.844946] inode_permission+0x96/0xca > > [ 322.844948] ? unlazy_walk+0x4c/0x80 > > [ 322.844950] link_path_walk.part.0+0x2a5/0x320 > > [ 322.844952] ? path_init+0x2c2/0x3e5 > > [ 322.844954] path_lookupat+0x3f/0x1b1 > > [ 322.844956] filename_lookup+0x97/0x17f > > [ 322.844959] ? do_unlk+0x9a/0xca > > [ 322.844960] ? kmem_cache_alloc+0x32/0x200 > > [ 322.844963] user_path_at_empty+0x59/0x8a > > [ 322.844965] vfs_statx+0x64/0xe5 > > [ 322.844967] __do_sys_newstat+0x31/0x4c > > [ 322.844968] ? do_fcntl+0x113/0x582 > > [ 322.844971] ? fput+0x13/0x15 > > [ 322.844972] ? __x64_sys_fcntl+0x81/0xa8 > > [ 322.844974] __x64_sys_newstat+0x16/0x18 > > [ 322.844976] do_syscall_64+0x32/0x45 > > [ 322.844977] entry_SYSCALL_64_after_hwframe+0x44/0x0 > > [ 322.844979] RIP: 0033:0x7f0d1fd0c9e6 > > [ 322.844980] RSP: 002b:00007f0d08e12e18 EFLAGS: 00000246 > > ORIG_RAX: 0000000000000004 > > [ 322.844981] RAX: ffffffffffffffda RBX: 00007f0d08e12edc RCX: > > 00007f0d1fd0c9e6 > > [ 322.844983] RDX: 00007f0d08e12e28 RSI: 00007f0d08e12e28 RDI: > > 00001605c2ff1169 > > [ 322.844984] RBP: 00007f0d08e12ec0 R08: 0000000000000001 R09: > > 00007fffaddd9090 > > [ 322.844985] R10: 00001605c3133480 R11: 0000000000000246 R12: > > 0000000000000000 > > [ 322.844986] R13: 0000000000000000 R14: 00001605c2ff0f00 R15: > > 0000000000000000 > > [ 322.844988] task:ThreadPoolForeg state:D stack: 0 pid: 2031 > > ppid: 1773 flags:0x00004000 > > [ 322.844990] Call Trace: > > [ 322.844992] __schedule+0x3db/0x7ee > > [ 322.844995] ? > > trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e > > [ 322.844997] schedule+0x68/0xdc > > [ 322.844999] rpc_wait_bit_killable+0x11/0x52 > > [ 322.845001] __wait_on_bit+0x31/0x90 > > [ 322.845003] ? rpc_sleep_on_timeout+0x64/0x9c > > [ 322.845005] ? rpc_check_timeout+0x1e0/0x1e0 > > [ 322.845006] ? call_connect_status+0x180/0x17a > > [ 322.845009] out_of_line_wait_on_bit+0x7e/0x80 > > [ 322.845010] ? var_wake_function+0x30/0x27 > > [ 322.845012] __rpc_execute+0xe5/0x485 > > [ 322.845014] rpc_execute+0xa5/0xaa > > [ 322.845016] rpc_run_task+0x148/0x187 > > [ 322.845018] nfs4_do_call_sync+0x61/0x81 > > [ 322.845020] _nfs4_proc_getattr+0x10a/0x120 > > [ 322.845023] nfs4_proc_getattr+0x67/0xee > > [ 322.845026] __nfs_revalidate_inode+0xb7/0x2df > > [ 322.845028] nfs_access_get_cached+0x14d/0x1df > > [ 322.845030] nfs_do_access+0x48/0x28e > > [ 322.845032] ? nfs4_lookup_revalidate+0x38/0x8d > > [ 322.845034] nfs_permission+0xa2/0x1b8 > > [ 322.845036] inode_permission+0x96/0xca > > [ 322.845038] ? unlazy_walk+0x4c/0x80 > > [ 322.845040] link_path_walk.part.0+0x2a5/0x320 > > [ 322.845042] ? path_init+0x2c2/0x3e5 > > [ 322.845044] path_lookupat+0x3f/0x1b1 > > [ 322.845046] filename_lookup+0x97/0x17f > > [ 322.845049] ? kmem_cache_alloc+0x32/0x200 > > [ 322.845051] user_path_at_empty+0x59/0x8a > > [ 322.845054] ? mntput_no_expire+0x4f/0x264 > > [ 322.845055] vfs_statx+0x64/0xe5 > > [ 322.845057] ? putname+0x4b/0x53 > > [ 322.845059] __do_sys_newstat+0x31/0x4c > > [ 322.845062] ? __x64_sys_futex+0x13a/0x159 > > [ 322.845064] ? switch_fpu_return+0x58/0xf2 > > [ 322.845066] __x64_sys_newstat+0x16/0x18 > > [ 322.845068] do_syscall_64+0x32/0x45 > > [ 322.845069] entry_SYSCALL_64_after_hwframe+0x44/0x0 > > [ 322.845071] RIP: 0033:0x7f0d1fd0c9e6 > > [ 322.845072] RSP: 002b:00007f0d00331108 EFLAGS: 00000246 > > ORIG_RAX: 0000000000000004 > > [ 322.845073] RAX: ffffffffffffffda RBX: 00001605c2fdc0a0 RCX: > > 00007f0d1fd0c9e6 > > [ 322.845074] RDX: 00007f0d00331118 RSI: 00007f0d00331118 RDI: > > 00001605c2fdc0a0 > > [ 322.845076] RBP: 00007f0d00331200 R08: 0000000000000001 R09: > > 00007fffaddd9090 > > [ 322.845077] R10: 000000000005771c R11: 0000000000000246 R12: > > 00007f0d00331118 > > [ 322.845078] R13: 00001605c3309c40 R14: 00007f0d003311a8 R15: > > 00007f0d003311c0 > > [ 322.845087] task:192.168.16.8-ma state:D stack: 0 pid: 2044 > > ppid: 2 flags:0x00004000 > > [ 322.845089] Call Trace: > > [ 322.845091] __schedule+0x3db/0x7ee > > [ 322.845094] ? > > trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e > > [ 322.845095] schedule+0x68/0xdc > > [ 322.845098] rpc_wait_bit_killable+0x11/0x52 > > [ 322.845100] __wait_on_bit+0x31/0x90 > > [ 322.845102] ? __rpc_atrun+0x20/0x1a > > [ 322.845103] ? __rpc_atrun+0x20/0x1a > > [ 322.845105] out_of_line_wait_on_bit+0x7e/0x80 > > [ 322.845107] ? var_wake_function+0x30/0x27 > > [ 322.845109] __rpc_execute+0xe5/0x485 > > [ 322.845111] rpc_execute+0xa5/0xaa > > [ 322.845113] rpc_run_task+0x148/0x187 > > [ 322.845115] nfs4_do_call_sync+0x61/0x81 > > [ 322.845117] _nfs4_proc_statfs+0xc1/0xcf > > [ 322.845120] nfs4_proc_statfs+0x55/0x74 > > [ 322.845122] nfs_statfs+0x65/0x152 > > [ 322.845124] ? _nfs4_proc_delegreturn+0x21e/0x2e5 > > [ 322.845127] statfs_by_dentry+0x55/0x74 > > [ 322.845129] vfs_statfs+0x24/0x43 > > [ 322.845132] check_free_space+0x3c/0xe5 > > [ 322.845135] do_acct_process+0x5d/0x5e1 > > [ 322.845136] ? nfs_end_delegation_return+0x1cb/0x3aa > > [ 322.845138] ? complete+0x49/0x57 > > [ 322.845141] acct_process+0xe5/0x105 > > [ 322.845143] do_exit+0x781/0xa1b > > [ 322.845145] ? refcount_dec_and_lock+0x22/0x8c > > [ 322.845148] __module_put_and_exit+0x1b/0x1b > > [ 322.845150] nfs4_run_state_manager+0x293/0x933 > > [ 322.845152] ? nfs4_do_reclaim+0x820/0x81c > > [ 322.845155] kthread+0x142/0x15f > > [ 322.845157] ? __kthread_create_worker+0x130/0x122 > > [ 322.845159] ret_from_fork+0x1f/0x2a > > [ 369.859385] INFO: task tee:1176 blocked for more than 122 > > seconds. > > [ 369.859391] Not tainted 5.10.4-00021-gde017294a539-dirty > > #3 > > [ 369.859393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > > disables this message. > > [ 369.859395] task:tee state:D stack: 0 pid: 1176 > > ppid: 1173 flags:0x00000000 > > [ 369.859399] Call Trace: > > [ 369.859407] __schedule+0x3db/0x7ee > > [ 369.859410] ? io_schedule+0x70/0x6d > > [ 369.859412] schedule+0x68/0xdc > > [ 369.859415] io_schedule+0x46/0x6d > > [ 369.859417] bit_wait_io+0x11/0x52 > > [ 369.859420] __wait_on_bit+0x31/0x90 > > [ 369.859422] out_of_line_wait_on_bit+0x7e/0x80 > > [ 369.859426] ? var_wake_function+0x30/0x27 > > [ 369.859429] nfs_wait_on_request.part.0+0x31/0x33 > > [ 369.859431] nfs_page_group_lock_head+0x31/0x78 > > [ 369.859433] nfs_lock_and_join_requests+0x9d/0x26e > > [ 369.859436] nfs_updatepage+0x1be/0x985 > > [ 369.859439] nfs_write_end+0x128/0x52a > > [ 369.859443] ? iov_iter_copy_from_user_atomic+0xe8/0x3ce > > [ 369.859446] generic_perform_write+0xed/0x17c > > [ 369.859449] nfs_file_write+0x14a/0x29f > > [ 369.859452] new_sync_write+0xfb/0x16b > > [ 369.859455] vfs_write+0x1ee/0x29a > > [ 369.859457] ? vfs_read+0x17b/0x198 > > [ 369.859459] ksys_write+0x61/0xd0 > > [ 369.859462] __x64_sys_write+0x1a/0x1c > > [ 369.859464] do_syscall_64+0x32/0x45 > > [ 369.859467] entry_SYSCALL_64_after_hwframe+0x44/0x0 > > [ 369.859469] RIP: 0033:0x7f4781f323a3 > > [ 369.859471] RSP: 002b:00007ffcf8220408 EFLAGS: 00000246 > > ORIG_RAX: 0000000000000001 > > [ 369.859474] RAX: ffffffffffffffda RBX: 0000000000000092 RCX: > > 00007f4781f323a3 > > [ 369.859475] RDX: 0000000000000092 RSI: 00007ffcf8220530 RDI: > > 0000000000000003 > > [ 369.859477] RBP: 00007ffcf8220530 R08: 0000000000000000 R09: > > 0000000000000001 > > [ 369.859478] R10: 0000000000400ad2 R11: 0000000000000246 R12: > > 0000000000000092 > > [ 369.859480] R13: 0000000002361580 R14: 0000000000000092 R15: > > 00007f4782201720 > > [ 369.859495] INFO: task CGenericAsyncFi:1927 blocked for more > > than 122 seconds. > > [ 369.859497] Not tainted 5.10.4-00021-gde017294a539-dirty > > #3 > > [ 369.859498] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > > disables this message. > > [ 369.859499] task:CGenericAsyncFi state:D stack: 0 pid: 1927 > > ppid: 1540 flags:0x20020000 > > [ 369.859501] Call Trace: > > [ 369.859504] __schedule+0x3db/0x7ee > > [ 369.859506] schedule+0x68/0xdc > > [ 369.859508] io_schedule+0x46/0x6d > > [ 369.859510] wait_on_page_bit_common+0xf8/0x325 > > [ 369.859512] ? > > trace_raw_output_file_check_and_advance_wb_err+0x80/0x7d > > [ 369.859515] wait_on_page_bit+0x3f/0x41 > > [ 369.859517] wait_on_page_writeback.part.0+0x35/0x80 > > [ 369.859519] wait_on_page_writeback+0x27/0x29 > > [ 369.859521] __filemap_fdatawait_range+0x88/0xd9 > > [ 369.859523] filemap_write_and_wait_range+0x45/0x80 > > [ 369.859525] nfs_wb_all+0x30/0x13d > > [ 369.859528] nfs4_file_flush+0x85/0xc0 > > [ 369.859530] filp_close+0x3e/0x78 > > [ 369.859532] __close_fd+0x23/0x30 > > [ 369.859534] __ia32_sys_close+0x22/0x50 > > [ 369.859536] __do_fast_syscall_32+0x5f/0x8e > > [ 369.859538] do_fast_syscall_32+0x3d/0x80 > > [ 369.859540] entry_SYSCALL_compat_after_hwframe+0x45/0x0 > > [ 369.859541] RIP: 0023:0xf7ee6549 > > [ 369.859543] RSP: 002b:000000008c5fddb0 EFLAGS: 00200286 > > ORIG_RAX: 0000000000000006 > > [ 369.859545] RAX: ffffffffffffffda RBX: 000000000000004d RCX: > > 0000000000000002 > > [ 369.859546] RDX: 0000000000000000 RSI: 00000000f7d02e70 RDI: > > 000000000000004d > > [ 369.859547] RBP: 000000008c5fe110 R08: 0000000000000000 R09: > > 0000000000000000 > > [ 369.859548] R10: 0000000000000000 R11: 0000000000000000 R12: > > 0000000000000000 > > [ 369.859549] R13: 0000000000000000 R14: 0000000000000000 R15: > > 0000000000000000 > > [ 394.423093] nfs: server loom not responding, still trying > > > > -- > > NULL && (void) -- Trond Myklebust CTO, Hammerspace Inc 4984 El Camino Real, Suite 208 Los Altos, CA 94022 www.hammer.space