On Tue, 2024-10-08 at 21:27 +0800, Wang Yugui wrote:
> Hi,
>
> nfs client deadloop on 6.6.53.
>
> [ 9409.381322] sysrq: Show Blocked State
> [ 9409.386146] task:bash state:D stack:0 pid:2323 ppid:2226 flags:0x00004002
> [ 9409.395225] Call Trace:
> [ 9409.398376] <TASK>
> [ 9409.401172] __schedule+0x232/0x5d0
> [ 9409.405370] schedule+0x5e/0xd0
> [ 9409.409217] schedule_timeout+0x8c/0x170
> [ 9409.413837] ? __pfx_process_timeout+0x10/0x10
> [ 9409.418989] msleep+0x3b/0x50
> [ 9409.422656] ff_layout_pg_init_read+0x1c1/0x290 [nfs_layout_flexfiles]
> [ 9409.429910] __nfs_pageio_add_request+0x29b/0x480 [nfs]
> [ 9409.435911] nfs_pageio_add_request+0x221/0x2a0 [nfs]
> [ 9409.441715] nfs_read_add_folio+0x1a3/0x280 [nfs]
> [ 9409.447175] nfs_readahead+0x235/0x2d0 [nfs]
> [ 9409.452193] read_pages+0x56/0x2c0
> [ 9409.456298] page_cache_ra_unbounded+0x134/0x1a0
> [ 9409.461626] filemap_get_pages+0xf5/0x3a0
> [ 9409.466355] ? __nfs_lookup_revalidate+0x53/0x140 [nfs]
> [ 9409.472325] filemap_read+0xdc/0x350
> [ 9409.476614] ? find_idlest_group+0x113/0x530
> [ 9409.481614] nfs_file_read+0x74/0xc0 [nfs]
> [ 9409.486461] __kernel_read+0xff/0x2b0
> [ 9409.490838] search_binary_handler+0x70/0x250
> [ 9409.495908] exec_binprm+0x50/0x1a0
> [ 9409.500102] bprm_execve.part.0+0x17d/0x230
> [ 9409.504993] do_execveat_common.isra.0+0x1a2/0x240
> [ 9409.510489] __x64_sys_execve+0x37/0x50
> [ 9409.515026] do_syscall_64+0x5a/0x90
> [ 9409.519298] ? __count_memcg_events+0x4c/0xa0
> [ 9409.524348] ? mm_account_fault+0x6c/0x100
> [ 9409.529129] ? handle_mm_fault+0x154/0x280
> [ 9409.533903] ? do_user_addr_fault+0x35f/0x680
> [ 9409.538935] ? exc_page_fault+0x69/0x150
> [ 9409.543537] entry_SYSCALL_64_after_hwframe+0x78/0xe2
> [ 9409.549277] RIP: 0033:0x7f57378d987b
> [ 9409.553572] RSP: 002b:00007ffdb5978708 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
> [ 9409.561847] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f57378d987b
> [ 9409.569690] RDX: 000055d26e403600 RSI: 000055d26e5cdc50 RDI: 000055d26e6ce7f0
> [ 9409.577534] RBP: 000055d26e6ce7f0 R08: 000055d26e5a5b60 R09: 0000000000000000
> [ 9409.585375] R10: 0000000000000008 R11: 0000000000000246 R12: 00000000ffffffff
> [ 9409.593208] R13: 000055d26e5cdc50 R14: 000055d26e403600 R15: 000055d26e6ceb40
> [ 9409.601047] </TASK>
> [ 9409.603946] task:bash state:D stack:0 pid:2550 ppid:2462 flags:0x00004002
> [ 9409.613027] Call Trace:
> [ 9409.616185] <TASK>
> [ 9409.618983] __schedule+0x232/0x5d0
> [ 9409.623186] schedule+0x5e/0xd0
> [ 9409.627033] io_schedule+0x46/0x70
> [ 9409.631140] folio_wait_bit_common+0x133/0x390
> [ 9409.636294] ? folio_wait_bit_common+0x100/0x390
> [ 9409.641624] ? nfs4_do_open+0xcd/0x210 [nfsv4]
> [ 9409.646854] ? __pfx_wake_page_function+0x10/0x10
> [ 9409.652268] filemap_update_page+0x2bc/0x300
> [ 9409.657242] filemap_get_pages+0x21d/0x3a0
> [ 9409.662042] ? __nfs_lookup_revalidate+0x53/0x140 [nfs]
> [ 9409.668010] filemap_read+0xdc/0x350
> [ 9409.672299] nfs_file_read+0x74/0xc0 [nfs]
> [ 9409.677126] __kernel_read+0xff/0x2b0
> [ 9409.681476] search_binary_handler+0x70/0x250
> [ 9409.686526] exec_binprm+0x50/0x1a0
> [ 9409.690702] bprm_execve.part.0+0x17d/0x230
> [ 9409.695573] do_execveat_common.isra.0+0x1a2/0x240
> [ 9409.701047] __x64_sys_execve+0x37/0x50
> [ 9409.705559] do_syscall_64+0x5a/0x90
> [ 9409.709805] ? do_user_addr_fault+0x35f/0x680
> [ 9409.714834] ? exc_page_fault+0x69/0x150
> [ 9409.719414] entry_SYSCALL_64_after_hwframe+0x78/0xe2
> [ 9409.725126] RIP: 0033:0x7f3c492d987b
> [ 9409.729362] RSP: 002b:00007ffc6413a458 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
> [ 9409.737609] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f3c492d987b
> [ 9409.745429] RDX: 000055c6a8f07600 RSI: 000055c6a90e72a0 RDI: 000055c6a90f7890
> [ 9409.753256] RBP: 000055c6a90f7890 R08: 000055c6a90f6250 R09: 0000000000000000
> [ 9409.761078] R10: 0000000000000008 R11: 0000000000000246 R12: 00000000ffffffff
> [ 9409.768904] R13: 000055c6a90e72a0 R14: 000055c6a8f07600 R15: 000055c6a90e1ea0
> [ 9409.776732] </TASK>
>
> Notice:
> 1, nfs server: kernel 6.6.54
> pnfs optin in the service side /etc/exports.
>

This is not a client bug. The client has no choice other than to retry
here. It is being given a layout that it cannot use (probably because it
has already discovered that it cannot talk to the data server), but it
is also being told by the same layout that it is not allowed to fall
back to doing I/O through the metadata server.

IOW: This bug needs to be fixed on the server, which is handing out a
layout that is impossible to satisfy.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx
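
For readers following the reasoning in the reply, the decision the client faces can be sketched in user-space C. This is only an illustration under stated assumptions, not the kernel's flexfiles code: `struct layout`, `choose_read_path()` and the three helper layouts are hypothetical names, and `max_attempts` exists only so the sketch terminates (the trace suggests the real client sleeps and retries in `ff_layout_pg_init_read`).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a layout; not a kernel structure. */
struct layout {
    bool ds_reachable;      /* can the client talk to the data server? */
    bool no_io_through_mds; /* layout forbids falling back to the metadata server */
};

enum read_path { READ_VIA_DS, READ_VIA_MDS, READ_STUCK };

/*
 * Pick a path for a read. get_layout() models asking the server for a
 * layout again on each attempt. max_attempts is an assumption of this
 * sketch so that the loop is bounded.
 */
static enum read_path choose_read_path(struct layout (*get_layout)(void),
                                       int max_attempts)
{
    assert(max_attempts > 0);
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        struct layout lo = get_layout();
        if (lo.ds_reachable)
            return READ_VIA_DS;   /* layout is usable as given */
        if (!lo.no_io_through_mds)
            return READ_VIA_MDS;  /* fallback is permitted */
        /* Unusable layout and no fallback allowed: wait, then ask again. */
    }
    return READ_STUCK;            /* every attempt got the same impossible layout */
}

/* A server that always hands out a usable layout. */
static struct layout healthy_layout(void)
{
    return (struct layout){ .ds_reachable = true, .no_io_through_mds = true };
}

/* Data server unreachable, but fallback to the metadata server is allowed. */
static struct layout fallback_layout(void)
{
    return (struct layout){ .ds_reachable = false, .no_io_through_mds = false };
}

/* The case described in the reply: unusable layout, fallback forbidden. */
static struct layout impossible_layout(void)
{
    return (struct layout){ .ds_reachable = false, .no_io_through_mds = true };
}
```

With `impossible_layout`, no choice made on the client side leads to a successful read, which is why the reply places the fix on the server.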