On Tue, 2018-08-21 at 15:11 +1000, NeilBrown wrote: > On Thu, Aug 16 2018, NeilBrown wrote: > > > On Wed, Aug 15 2018, Jeff Layton wrote: > > > > > On Wed, 2018-08-15 at 14:28 +0200, Krzysztof Kozlowski wrote: > > > > Hi, > > > > > > > > Bisect pointed commit ce3147990450a68b3f549088b30f087742a08b5d > > > > ("fs/locks: allow a lock request to block other requests.") to failure > > > > boot of NFSv4 with root on several boards. > > > > > > > > Log is here: > > > > https://krzk.eu/#/builders/21/builds/836/steps/12/logs/serial0 > > > > > > > > With several errors: > > > > kernel BUG at ../fs/locks.c:336! > > > > Unable to handle kernel NULL pointer dereference at virtual address 00000004 > > > > > > > > Configuration: > > > > 1. exynos_defconfig > > > > 2. Arch ARM Linux > > > > 3. Boards: > > > > a. Odroid family (ARMv7, octa-core (Cortex-A7+A15), Exynos5422 SoC) > > > > b. Toradex Colibri VF50 (ARMv7, UP, Cortex-A5) > > > > 4. Systemd: v236, 238 > > > > 5. All boards boot from TFTP with NFS root (NFSv4) > > > > > > > > On Colibri VF50 I got slightly different errors: > > > > [ 11.663204] Internal error: Oops - undefined instruction: 0 [#1] ARM > > > > [ 12.455273] Unable to handle kernel NULL pointer dereference at > > > > virtual address 00000004 > > > > and only with some specific GCC (v6.3) or with other conditions which > > > > I did not bisect yet. Maybe Colibri's failure is unrelated to that > > > > commit. > > > > > > > > Best regards, > > > > Krzysztof > > > > Thanks a lot for the report Krzysztof!! > > > > > > > > The BUG is due to a lock being freed when the fl_blocked list wasn't > > > empty (implying that there were still blocked locks waiting on it). > > > > > > There are a number of calls to locks_delete_lock_ctx in posix_lock_inode > > > and I don't think the fl_blocked list is being handled properly with all > > > of them. It only transplants the blocked locks to a new lock when there > > > are surviving locks on the list, and that may not be the case when the > > > whole file is being unlocked. > > > > locks_delete_lock_ctx() calls locks_unlink_lock_ctx() which calls > > locks_wake_up_block() which doesn't only wake_up the blocks, but also > > detached them. When that function completes, ->fl_blocked must be empty. > > > > The trace shows the locks_free_lock() call at the end of fcntl_setlk64() > > as the problematic call. > > This suggests that do_lock_file_wait() exited with ->fl_blocked > > non-empty, which it shouldn't. > > > > I think we need to insert a call to locks_wake_up_block() in > > do_lock_file_wait() before it returns. > > I cannot find a sequence that would make this necessary, but > > it isn't surprising that there might be one. > > > > I'll dig through the code a bit more later and make sure I understand > > what is happening. > > > > I think this problem if fixed by the following. It is probably > triggered when the owner already has a lock for part of the requested > range. After waiting for some other lock, the pending request gets > merged with the existing lock, and blocked requests aren't moved across > in that case. > > I still haven't done more testing, so this is just FYI, not a > submission. > > Thanks, > NeilBrown > > From: NeilBrown <neilb@xxxxxxxx> > Date: Tue, 21 Aug 2018 15:09:06 +1000 > Subject: [PATCH] fs/locks: always delete_block after waiting. > > Now that requests can block other requests, we > need to be careful to always clean up those blocked > requests. > Any time that we wait for a request, we might have > other requests attached, and when we stop waiting, > we much clean them up. > If the lock was granted, the requests might have been > moved to the new lock, though when merged with a > pre-exiting lock, this might not happen. > No all cases we don't want blocked locks to remain > attached, so we remove them to be safe. > > Signed-off-by: NeilBrown <neilb@xxxxxxxx> > --- > fs/locks.c | 24 +++++++++--------------- > 1 file changed, 9 insertions(+), 15 deletions(-) > > diff --git a/fs/locks.c b/fs/locks.c > index de38bafb7f7b..6b310112cf3b 100644 > --- a/fs/locks.c > +++ b/fs/locks.c > @@ -1276,12 +1276,10 @@ static int posix_lock_inode_wait(struct inode *inode, struct file_lock *fl) > if (error != FILE_LOCK_DEFERRED) > break; > error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker); > - if (!error) > - continue; > - > - locks_delete_block(fl); > - break; > + if (error) > + break; > } > + locks_delete_block(fl); > return error; > } > > @@ -1971,12 +1969,10 @@ static int flock_lock_inode_wait(struct inode *inode, struct file_lock *fl) > if (error != FILE_LOCK_DEFERRED) > break; > error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker); > - if (!error) > - continue; > - > - locks_delete_block(fl); > - break; > + if (error) > + break; > } > + locks_delete_block(fl); > return error; > } > > @@ -2250,12 +2246,10 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd, > if (error != FILE_LOCK_DEFERRED) > break; > error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker); > - if (!error) > - continue; > - > - locks_delete_block(fl); > - break; > + if (error) > + break; > } > + locks_delete_block(fl); > > return error; > } Thanks Neil. FWIW, I was able to reproduce something that looked a lot like what Krzysztof reported by running the cthon04 lock tests on a client running the kernel with the original set. I applied the above patch on top of that set, reran the test and got a different BUG (list corruption): [ 85.117307] ------------[ cut here ]------------ [ 85.118130] kernel BUG at lib/list_debug.c:53! [ 85.118684] invalid opcode: 0000 [#1] SMP NOPTI [ 85.119800] CPU: 5 PID: 92 Comm: kworker/u16:1 Not tainted 4.18.0+ #46 [ 85.120845] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 04/01/2014 [ 85.122350] Workqueue: rpciod rpc_async_schedule [sunrpc] [ 85.123242] RIP: 0010:__list_del_entry_valid.cold.1+0x34/0x4c [ 85.124116] Code: 03 10 9b e8 08 07 ca ff 0f 0b 48 c7 c7 c8 03 10 9b e8 fa 06 ca ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 88 03 10 9b e8 e6 06 ca ff <0f> 0b 48 89 fe 48 c7 c7 50 03 10 9b e8 d5 06 ca ff 0f 0b 90 90 90 [ 85.126704] RSP: 0018:ffffa0fe0133bd90 EFLAGS: 00010246 [ 85.127382] RAX: 0000000000000054 RBX: ffff92bcf3a46ad8 RCX: 0000000000000000 [ 85.128322] RDX: 0000000000000000 RSI: ffff92bcffd56828 RDI: ffff92bcffd56828 [ 85.129251] RBP: ffff92bcf3a46b10 R08: 0000000000000000 R09: 0000000000aaaaaa [ 85.130250] R10: 0000000000000000 R11: 0000000000000001 R12: ffff92bce2588618 [ 85.131230] R13: ffff92bcf3a45800 R14: ffffffffc06f9f60 R15: ffffffffc06f9f60 [ 85.132191] FS: 0000000000000000(0000) GS:ffff92bcffd40000(0000) knlGS:0000000000000000 [ 85.133296] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 85.134088] CR2: 00007f16c1292008 CR3: 0000000138588000 CR4: 00000000000006e0 [ 85.134926] Call Trace: [ 85.135251] __locks_delete_block+0x3f/0x70 [ 85.135751] locks_delete_block+0x25/0x30 [ 85.136259] locks_lock_inode_wait+0x63/0x150 [ 85.136841] ? nfs41_release_slot+0x98/0xd0 [nfsv4] [ 85.137556] nfs4_lock_done+0x1a2/0x1c0 [nfsv4] [ 85.138272] rpc_exit_task+0x2d/0x80 [sunrpc] [ 85.138994] __rpc_execute+0x7f/0x340 [sunrpc] [ 85.139953] process_one_work+0x1a1/0x350 [ 85.140678] worker_thread+0x30/0x380 [ 85.141800] ? wq_update_unbound_numa+0x1a0/0x1a0 [ 85.142904] kthread+0x112/0x130 [ 85.143445] ? kthread_create_worker_on_cpu+0x70/0x70 [ 85.144273] ret_from_fork+0x22/0x40 [ 85.144859] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache xt_conntrack nf_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_mangle ip6table_raw ip6table_security iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables sunrpc i2c_piix4 edac_mce_amd joydev pcspkr virtio_balloon xfs libcrc32c virtio_net net_failover virtio_console failover virtio_blk floppy qxl serio_raw drm_kms_helper ttm qemu_fw_cfg drm virtio_pci ata_generic pata_acpi virtio_rng virtio_ring virtio [ 85.153426] ---[ end trace 63df06139208ee23 ]--- -- Jeff Layton <jlayton@xxxxxxxxxx>