On Tue, 2020-11-24 at 20:28 -0500, David Wysochanski wrote: > On Tue, Nov 24, 2020 at 8:07 PM Trond Myklebust < > trondmy@xxxxxxxxxxxxxxx> wrote: > > > > On Tue, 2020-11-24 at 16:56 -0500, David Wysochanski wrote: > > > I've started seeing this failure since testing 5.10-rc4 - this > > > does > > > not happen on 5.9 > > > > > > > > > f31-node1 login: [ 124.055768] FS-Cache: Netfs 'nfs' registered > > > for > > > caching > > > [ 125.046104] Key type dns_resolver registered > > > [ 125.770354] NFS: Registering the id_resolver key type > > > [ 125.780599] Key type id_resolver registered > > > [ 125.782440] Key type id_legacy registered > > > [ 126.563717] run fstests generic/074 at 2020-11-24 11:23:49 > > > [ 178.736479] > > > [ 178.751380] ============================= > > > [ 178.753249] [ BUG: Invalid wait context ] > > > [ 178.754886] 5.10.0-rc4 #127 Not tainted > > > [ 178.756423] ----------------------------- > > > [ 178.758055] kworker/1:2/848 is trying to lock: > > > [ 178.759866] ffff8947fffd33d8 (&zone->lock){..-.}-{3:3}, at: > > > get_page_from_freelist+0x897/0x2190 > > > [ 178.763333] other info that might help us debug this: > > > [ 178.765354] context-{5:5} > > > [ 178.766437] 3 locks held by kworker/1:2/848: > > > [ 178.768158] #0: ffff8946ce825538 > > > ((wq_completion)nfsiod){+.+.}-{0:0}, at: > > > process_one_work+0x1be/0x540 > > > [ 178.771871] #1: ffff9e6b408f7e58 > > > ((work_completion)(&task->u.tk_work)#2){+.+.}-{0:0}, at: > > > process_one_work+0x1be/0x540 > > > [ 178.776562] #2: ffff8947f7c5b2b0 (krc.lock){..-.}-{2:2}, at: > > > kvfree_call_rcu+0x69/0x230 > > > [ 178.779803] stack backtrace: > > > [ 178.780996] CPU: 1 PID: 848 Comm: kworker/1:2 Kdump: loaded > > > Not > > > tainted 5.10.0-rc4 #127 > > > [ 178.784374] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 > > > [ 178.787071] Workqueue: nfsiod rpc_async_release [sunrpc] > > > [ 178.789308] Call Trace: > > > [ 178.790386] dump_stack+0x8d/0xb5 > > > [ 178.791816] __lock_acquire.cold+0x20b/0x2c8 > > > [ 178.793605] lock_acquire+0xca/0x380 > > > [ 178.795113] ? get_page_from_freelist+0x897/0x2190 > > > [ 178.797116] _raw_spin_lock+0x2c/0x40 > > > [ 178.798638] ? get_page_from_freelist+0x897/0x2190 > > > [ 178.800620] get_page_from_freelist+0x897/0x2190 > > > [ 178.802537] __alloc_pages_nodemask+0x1b4/0x460 > > > [ 178.804416] __get_free_pages+0xd/0x30 > > > [ 178.805987] kvfree_call_rcu+0x168/0x230 > > > [ 178.807687] nfs_free_request+0xab/0x180 [nfs] > > > [ 178.809547] nfs_page_group_destroy+0x41/0x80 [nfs] > > > [ 178.811588] nfs_read_completion+0x129/0x1f0 [nfs] > > > [ 178.813633] rpc_free_task+0x39/0x60 [sunrpc] > > > [ 178.815481] rpc_async_release+0x29/0x40 [sunrpc] > > > [ 178.817451] process_one_work+0x23e/0x540 > > > [ 178.819136] worker_thread+0x50/0x3a0 > > > [ 178.820657] ? process_one_work+0x540/0x540 > > > [ 178.822427] kthread+0x10f/0x150 > > > [ 178.823805] ? kthread_park+0x90/0x90 > > > [ 178.825339] ret_from_fork+0x22/0x30 > > > > > > > I can't think of any changes that might have caused this. Is this > > NFSv3, v4 or other? I haven't been seeing any of this. > > > > It is NFSv4.1 or NFS4.2. I am running the xfstests NFS client > against > an older server, RHEL7 based (3.10.0-1127.8.2.el7.x86_64) though not > sure if that matters. > My config has these: > CONFIG_LOCK_DEBUGGING_SUPPORT=y > CONFIG_PROVE_LOCKING=y > CONFIG_PROVE_RAW_LOCK_NESTING=y > CONFIG_DEBUG_SPINLOCK=y > CONFIG_DEBUG_LOCK_ALLOC=y > CONFIG_LOCKDEP=y > That helps. It means we can't blame the new READ_PLUS code, since it would be completely disabled here. Are you using any special rsize values? Also, could pNFS be involved (e.g. the pNFS block/scsi code)? -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@xxxxxxxxxxxxxxx