Re: CB_LAYOUTRECALL "deadlock" with in-kernel flexfiles server and XFS

On 11 Aug 2016, at 11:23, Jeff Layton wrote:

I was playing around with the in-kernel flexfiles server today, and I
seem to be hitting a deadlock when using it on an XFS-exported
filesystem. Here's the stack trace of how the CB_LAYOUTRECALL occurs:

[  928.736139] CPU: 0 PID: 846 Comm: nfsd Tainted: G OE 4.8.0-rc1+ #3
[  928.737040] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
[  928.738009]  0000000000000286 000000006125f50e ffff91153845b878 ffffffff8f463853
[  928.738906]  ffff91152ec194d0 ffff91152d31d9c0 ffff91153845b8a8 ffffffffc045936f
[  928.739788]  ffff91152c051980 ffff91152d31d9c0 ffff91152c051540 ffff9115361b8a58
[  928.740697] Call Trace:
[  928.740998]  [<ffffffff8f463853>] dump_stack+0x86/0xc3
[  928.741570]  [<ffffffffc045936f>] nfsd4_recall_file_layout+0x17f/0x190 [nfsd]
[  928.742380]  [<ffffffffc045939d>] nfsd4_layout_lm_break+0x1d/0x30 [nfsd]
[  928.743115]  [<ffffffff8f3056d8>] __break_lease+0x118/0x6a0
[  928.743759]  [<ffffffffc02dea69>] xfs_break_layouts+0x79/0x120 [xfs]
[  928.744462]  [<ffffffffc029ea04>] xfs_file_aio_write_checks+0x94/0x1f0 [xfs]
[  928.745251]  [<ffffffffc029f36b>] xfs_file_buffered_aio_write+0x7b/0x330 [xfs]
[  928.746063]  [<ffffffffc029f70c>] xfs_file_write_iter+0xec/0x140 [xfs]
[  928.746803]  [<ffffffff8f2a0599>] do_iter_readv_writev+0xb9/0x140
[  928.747478]  [<ffffffff8f2a126b>] do_readv_writev+0x19b/0x240
[  928.748146]  [<ffffffffc029f620>] ? xfs_file_buffered_aio_write+0x330/0x330 [xfs]
[  928.748956]  [<ffffffff8f29e02b>] ? do_dentry_open+0x28b/0x310
[  928.749614]  [<ffffffffc029c800>] ? xfs_extent_busy_ag_cmp+0x20/0x20 [xfs]
[  928.750367]  [<ffffffff8f2a156f>] vfs_writev+0x3f/0x50
[  928.750934]  [<ffffffffc04276ca>] nfsd_vfs_write+0xca/0x3a0 [nfsd]
[  928.751608]  [<ffffffffc0429ec5>] nfsd_write+0x485/0x780 [nfsd]
[  928.752263]  [<ffffffffc043144c>] nfsd3_proc_write+0xbc/0x150 [nfsd]
[  928.752973]  [<ffffffffc0421388>] nfsd_dispatch+0xb8/0x1f0 [nfsd]
[  928.753642]  [<ffffffffc036d78f>] svc_process_common+0x42f/0x690 [sunrpc]
[  928.754395]  [<ffffffffc036e8e8>] svc_process+0x118/0x330 [sunrpc]
[  928.755080]  [<ffffffffc04208ac>] nfsd+0x19c/0x2b0 [nfsd]
[  928.755681]  [<ffffffffc0420715>] ? nfsd+0x5/0x2b0 [nfsd]
[  928.756274]  [<ffffffffc0420710>] ? nfsd_destroy+0x190/0x190 [nfsd]
[  928.756991]  [<ffffffff8f0d5891>] kthread+0x101/0x120
[  928.757563]  [<ffffffff8f10dcc5>] ? trace_hardirqs_on_caller+0xf5/0x1b0
[  928.758282]  [<ffffffff8f8f2fef>] ret_from_fork+0x1f/0x40
[  928.758875]  [<ffffffff8f0d5790>] ? kthread_create_on_node+0x250/0x250


So the client gets a flexfiles layout, and then tries to issue a v3
WRITE against the file. XFS then recalls the layout, but the client
can't return the layout until the v3 WRITE completes. Eventually this
should resolve itself after 2 lease periods, but that's quite a long
time.
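
As a rough illustration of why this behaves like a deadlock, the three
waits described above can be written down as a wait-for graph and checked
for a cycle. This is a toy model, not nfsd or XFS code: the node names and
the find_cycle() helper are made up for the example, and it assumes the
server-side write blocks until the recall completes.

```python
from typing import Dict, List, Optional

# Hypothetical wait-for edges: each key cannot make progress until its value does.
WAITS_FOR: Dict[str, str] = {
    "v3 WRITE": "layout recall",      # the write blocks while the layout is recalled
    "layout recall": "LAYOUTRETURN",  # the recall finishes when the client returns the layout
    "LAYOUTRETURN": "v3 WRITE",       # the client won't return it until the WRITE completes
}


def find_cycle(waits_for: Dict[str, str]) -> Optional[List[str]]:
    """Return the nodes of one cycle in the wait-for graph, or None if there is none.

    Each node waits on at most one other node, so following the chain from
    every starting node is enough to find a cycle.
    """
    for start in waits_for:
        path = [start]
        seen = {start}
        node = waits_for.get(start)
        while node is not None:
            if node == start:
                return path          # walked back to where we began: circular wait
            if node in seen:
                break                # a cycle exists, but it does not include `start`
            seen.add(node)
            path.append(node)
            node = waits_for.get(node)
    return None


if __name__ == "__main__":
    print("circular wait:", find_cycle(WAITS_FOR))

    # Model the client returning the layout without waiting on the WRITE.
    no_wait = {k: v for k, v in WAITS_FOR.items() if k != "LAYOUTRETURN"}
    print("after dropping the LAYOUTRETURN wait:", find_cycle(no_wait))
```

In this model, removing any one edge breaks the cycle. The two-lease-period
timeout removes the recall's wait on LAYOUTRETURN, only much later.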

I guess XFS requires recalling block and SCSI layouts when the server
wants to issue a write (or someone writes to it locally), but that
seems like it shouldn't be happening when the layout is a flexfiles
layout.

Any thoughts on what the right fix is here?

On a related note, knfsd will spam the heck out of the client with
CB_LAYOUTRECALLs during this time. I think we ought to consider fixing
the server not to treat an NFS_OK return from the client like
NFS4ERR_DELAY there, but that would mean a different mechanism for
timing out a CB_LAYOUTRECALL.

I'm getting into similar trouble with SCSI layouts when the client ends up
submitting a WRITE because the IO is not page aligned, but it already holds a
layout for that range. It looks like the server sends a CB_LAYOUTRECALL, but
the client has to answer NFS4ERR_DELAY because it is still holding the
layout.

Probably, the client should return any layouts it holds for that range before
doing IO through the MDS.

Alternatively, shouldn't the MDS accept IO from the same client that holds a
layout for that range, rather than recall that layout?  RFC 5661 Section
20.3.4 talks about the client submitting WRITEs before responding to
CB_LAYOUTRECALL: "As always, the client may write the data through the
metadata server."
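
To make that alternative concrete, here is a minimal sketch of the decision:
when a WRITE arrives at the MDS, recall only the overlapping layouts held by
other clients. The Layout record, the client identifiers, and
layouts_to_recall() are hypothetical and do not correspond to nfsd data
structures. The sketch also assumes the server can tell that the write comes
from the layout holder.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Layout:
    """Hypothetical record of a layout a client holds on a byte range."""
    client_id: str
    offset: int
    length: int


def overlaps(layout: Layout, offset: int, length: int) -> bool:
    """True if the half-open ranges [offset, offset+length) intersect."""
    return layout.offset < offset + length and offset < layout.offset + layout.length


def layouts_to_recall(layouts: List[Layout], writer_id: Optional[str],
                      offset: int, length: int) -> List[Layout]:
    """Pick the layouts that conflict with a write to [offset, offset+length).

    Layouts held by the writer itself are skipped; writer_id=None models a
    local write, which conflicts with every overlapping layout.
    """
    return [
        l for l in layouts
        if overlaps(l, offset, length) and l.client_id != writer_id
    ]


if __name__ == "__main__":
    held = [Layout("A", 0, 4096), Layout("B", 0, 4096), Layout("A", 8192, 4096)]
    # Client A writes an unaligned chunk through the MDS.
    print(layouts_to_recall(held, "A", 512, 1024))
    # A local write to the same range.
    print(layouts_to_recall(held, None, 512, 1024))
```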

I'm trying to find the discussion that resulted in this commit:

commit 6b9b21073d3b250e17812cd562fffc9006962b39
Author: Jeff Layton <jlayton@xxxxxxxxxxxxxxx>
Date:   Tue Dec 8 07:23:48 2015 -0500

    nfsd: give up on CB_LAYOUTRECALLs after two lease periods

Why should we poll the client if the client answers with NFS4ERR_DELAY? Can
we instead just wait for the layout to be returned?

Also, I think the 2*lease period timeout is currently broken because we reset
tk_start after every call, but that's not really causing any trouble.
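
To show what I mean about the timeout, here is a small userspace simulation
of a poll loop whose cutoff is computed from a start timestamp. It is not
the nfsd code: polls_until_giveup(), the millisecond units, and the
round-trip and retry-delay values are assumptions picked for the example.
The only point is that if the start time is rewritten on every call, a
cutoff of "start + 2 * lease" keeps moving and is never reached.

```python
from typing import Optional


def polls_until_giveup(lease_ms: int, rtt_ms: int, delay_ms: int,
                       reset_start_each_call: bool,
                       max_polls: int = 100_000) -> Optional[int]:
    """Count recall attempts until the 2 * lease cutoff is hit.

    The simulated client always answers "still busy". Returns the attempt
    number on which the server gives up, or None if the cutoff is not
    reached within max_polls attempts.
    """
    now = 0
    start = now
    for attempt in range(1, max_polls + 1):
        if reset_start_each_call:
            start = now            # models the start timestamp being reset per call
        now += rtt_ms              # callback round trip
        cutoff = start + 2 * lease_ms
        if now >= cutoff:
            return attempt         # two lease periods elapsed: give up
        now += delay_ms            # wait a bit, then poll the client again
    return None


if __name__ == "__main__":
    # Assumed values: 1 s lease, 1 ms round trip, 10 ms between polls.
    print("start kept: ", polls_until_giveup(1000, 1, 10, reset_start_each_call=False))
    print("start reset:", polls_until_giveup(1000, 1, 10, reset_start_each_call=True))
```

With the start time kept, the loop gives up once two lease periods have
passed. With it reset per call, the elapsed time seen by the check is only
ever one round trip, so the loop runs until max_polls.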

Ben