Re: [BUG][BISECT] NFSv4 root failures after "fs/locks: allow a lock request to block other requests."

Jeff Layton <jlayton@xxxxxxxxxx> · Tue, 21 Aug 2018 07:14:06 -0400

On Tue, 2018-08-21 at 15:11 +1000, NeilBrown wrote:
> On Thu, Aug 16 2018, NeilBrown wrote:
> 
> > On Wed, Aug 15 2018, Jeff Layton wrote:
> > 
> > > On Wed, 2018-08-15 at 14:28 +0200, Krzysztof Kozlowski wrote:
> > > > Hi,
> > > > 
> > > > Bisect pointed commit ce3147990450a68b3f549088b30f087742a08b5d
> > > > ("fs/locks: allow a lock request to block other requests.") to failure
> > > > boot of NFSv4 with root on several boards.
> > > > 
> > > > Log is here:
> > > > https://krzk.eu/#/builders/21/builds/836/steps/12/logs/serial0
> > > > 
> > > > With several errors:
> > > > kernel BUG at ../fs/locks.c:336!
> > > > Unable to handle kernel NULL pointer dereference at virtual address 00000004
> > > > 
> > > > Configuration:
> > > > 1. exynos_defconfig
> > > > 2. Arch ARM Linux
> > > > 3. Boards:
> > > > a. Odroid family (ARMv7, octa-core (Cortex-A7+A15), Exynos5422 SoC)
> > > > b. Toradex Colibri VF50 (ARMv7, UP, Cortex-A5)
> > > > 4. Systemd: v236, 238
> > > > 5. All boards boot from TFTP with NFS root (NFSv4)
> > > > 
> > > > On Colibri VF50 I got slightly different errors:
> > > > [   11.663204] Internal error: Oops - undefined instruction: 0 [#1] ARM
> > > > [   12.455273] Unable to handle kernel NULL pointer dereference at
> > > > virtual address 00000004
> > > > and only with some specific GCC (v6.3) or with other conditions which
> > > > I did not bisect yet. Maybe Colibri's failure is unrelated to that
> > > > commit.
> > > > 
> > > > Best regards,
> > > > Krzysztof
> > 
> > Thanks a lot for the report Krzysztof!!
> > 
> > > 
> > > The BUG is due to a lock being freed when the fl_blocked list wasn't
> > > empty (implying that there were still blocked locks waiting on it).
> > > 
> > > There are a number of calls to locks_delete_lock_ctx in posix_lock_inode
> > > and I don't think the fl_blocked list is being handled properly with all
> > > of them. It only transplants the blocked locks to a new lock when there
> > > are surviving locks on the list, and that may not be the case when the
> > > whole file is being unlocked.
> > 
> > locks_delete_lock_ctx() calls locks_unlink_lock_ctx() which calls
> > locks_wake_up_block() which doesn't only wake_up the blocks, but also
> > detached them. When that function completes, ->fl_blocked must be empty.
> > 
> > The trace shows the locks_free_lock() call at the end of fcntl_setlk64()
> > as the problematic call.
> > This suggests that do_lock_file_wait() exited with ->fl_blocked
> > non-empty, which it shouldn't.
> > 
> > I think we need to insert a call to locks_wake_up_block() in
> > do_lock_file_wait() before it returns.
> > I cannot find a sequence that would make this necessary, but
> > it isn't surprising that there might be one.
> > 
> > I'll dig through the code a bit more later and make sure I understand
> > what is happening.
> > 
> 
> I think this problem if fixed by the following.  It is probably
> triggered when the owner already has a lock for part of the requested
> range.  After waiting for some other lock, the pending request gets
> merged with the existing lock, and blocked requests aren't moved across
> in that case.
> 
> I still haven't done more testing, so this is just FYI, not a
> submission.
> 
> Thanks,
> NeilBrown
> 
> From: NeilBrown <neilb@xxxxxxxx>
> Date: Tue, 21 Aug 2018 15:09:06 +1000
> Subject: [PATCH] fs/locks: always delete_block after waiting.
> 
> Now that requests can block other requests, we
> need to be careful to always clean up those blocked
> requests.
> Any time that we wait for a request, we might have
> other requests attached, and when we stop waiting,
> we much clean them up.
> If the lock was granted, the requests might have been
> moved to the new lock, though when merged with a
> pre-exiting lock, this might not happen.
> No all cases we don't want blocked locks to remain
> attached, so we remove them to be safe.
> 
> Signed-off-by: NeilBrown <neilb@xxxxxxxx>
> ---
>  fs/locks.c | 24 +++++++++---------------
>  1 file changed, 9 insertions(+), 15 deletions(-)
> 
> diff --git a/fs/locks.c b/fs/locks.c
> index de38bafb7f7b..6b310112cf3b 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -1276,12 +1276,10 @@ static int posix_lock_inode_wait(struct inode *inode, struct file_lock *fl)
>  		if (error != FILE_LOCK_DEFERRED)
>  			break;
>  		error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
> -		if (!error)
> -			continue;
> -
> -		locks_delete_block(fl);
> -		break;
> +		if (error)
> +			break;
>  	}
> +	locks_delete_block(fl);
>  	return error;
>  }
>  
> @@ -1971,12 +1969,10 @@ static int flock_lock_inode_wait(struct inode *inode, struct file_lock *fl)
>  		if (error != FILE_LOCK_DEFERRED)
>  			break;
>  		error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
> -		if (!error)
> -			continue;
> -
> -		locks_delete_block(fl);
> -		break;
> +		if (error)
> +			break;
>  	}
> +	locks_delete_block(fl);
>  	return error;
>  }
>  
> @@ -2250,12 +2246,10 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
>  		if (error != FILE_LOCK_DEFERRED)
>  			break;
>  		error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
> -		if (!error)
> -			continue;
> -
> -		locks_delete_block(fl);
> -		break;
> +		if (error)
> +			break;
>  	}
> +	locks_delete_block(fl);
>  
>  	return error;
>  }

Thanks Neil.

FWIW, I was able to reproduce something that looked a lot like what
Krzysztof reported by running the cthon04 lock tests on a client running
the kernel with the original set.

I applied the above patch on top of that set, reran the test and got a
different BUG (list corruption):

[   85.117307] ------------[ cut here ]------------
[   85.118130] kernel BUG at lib/list_debug.c:53!
[   85.118684] invalid opcode: 0000 [#1] SMP NOPTI
[   85.119800] CPU: 5 PID: 92 Comm: kworker/u16:1 Not tainted 4.18.0+ #46
[   85.120845] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 04/01/2014
[   85.122350] Workqueue: rpciod rpc_async_schedule [sunrpc]
[   85.123242] RIP: 0010:__list_del_entry_valid.cold.1+0x34/0x4c
[   85.124116] Code: 03 10 9b e8 08 07 ca ff 0f 0b 48 c7 c7 c8 03 10 9b e8 fa 06 ca ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 88 03 10 9b e8 e6 06 ca ff <0f> 0b 48 89 fe 48 c7 c7 50 03 10 9b e8 d5 06 ca ff 0f 0b 90 90 90
[   85.126704] RSP: 0018:ffffa0fe0133bd90 EFLAGS: 00010246
[   85.127382] RAX: 0000000000000054 RBX: ffff92bcf3a46ad8 RCX: 0000000000000000
[   85.128322] RDX: 0000000000000000 RSI: ffff92bcffd56828 RDI: ffff92bcffd56828
[   85.129251] RBP: ffff92bcf3a46b10 R08: 0000000000000000 R09: 0000000000aaaaaa
[   85.130250] R10: 0000000000000000 R11: 0000000000000001 R12: ffff92bce2588618
[   85.131230] R13: ffff92bcf3a45800 R14: ffffffffc06f9f60 R15: ffffffffc06f9f60
[   85.132191] FS:  0000000000000000(0000) GS:ffff92bcffd40000(0000) knlGS:0000000000000000
[   85.133296] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   85.134088] CR2: 00007f16c1292008 CR3: 0000000138588000 CR4: 00000000000006e0
[   85.134926] Call Trace:
[   85.135251]  __locks_delete_block+0x3f/0x70
[   85.135751]  locks_delete_block+0x25/0x30
[   85.136259]  locks_lock_inode_wait+0x63/0x150
[   85.136841]  ? nfs41_release_slot+0x98/0xd0 [nfsv4]
[   85.137556]  nfs4_lock_done+0x1a2/0x1c0 [nfsv4]
[   85.138272]  rpc_exit_task+0x2d/0x80 [sunrpc]
[   85.138994]  __rpc_execute+0x7f/0x340 [sunrpc]
[   85.139953]  process_one_work+0x1a1/0x350
[   85.140678]  worker_thread+0x30/0x380
[   85.141800]  ? wq_update_unbound_numa+0x1a0/0x1a0
[   85.142904]  kthread+0x112/0x130
[   85.143445]  ? kthread_create_worker_on_cpu+0x70/0x70
[   85.144273]  ret_from_fork+0x22/0x40
[   85.144859] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache xt_conntrack nf_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_mangle ip6table_raw ip6table_security iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables sunrpc i2c_piix4 edac_mce_amd joydev pcspkr virtio_balloon xfs libcrc32c virtio_net net_failover virtio_console failover virtio_blk floppy qxl serio_raw drm_kms_helper ttm qemu_fw_cfg drm virtio_pci ata_generic pata_acpi virtio_rng virtio_ring virtio
[   85.153426] ---[ end trace 63df06139208ee23 ]---

-- 
Jeff Layton <jlayton@xxxxxxxxxx>