Calling nfs_kill_super from an RPC task callback would result in a deadlock where nfs_free_server (via rpc_shutdown_client) tries to kill all RPC tasks associated with that connection - including itself! Instead of calling nfs_kill_super directly, queue a job on the nfsiod workqueue. Signed-off-by: Weston Andros Adamson <dros@xxxxxxxxxx> --- This fixes the current incarnation of the lockup I've been tracking down for some time now. I still have to go back and see why the reproducer changed behavior a few weeks ago - tasks used to get stuck in rpc_prepare_task, but now (before this patch) are stuck in rpc_exit. The reproducer works against a server with write delegations: ./nfsometer.py -m v4 server:/path dd_100m_100k which is basically: - mount - dd if=/dev/zero of=./dd_file.100m_100k bs=102400 count=1024 - umount - break if /proc/fs/nfsfs/servers still has entry after 5 seconds (in this case it NEVER goes away) There are clearly other ways to trigger this deadlock, like a v4.1 CLOSE - the done handler calls nfs_sb_deactivate... I've tested this approach with 10 runs X 3 nfs versions X 5 workloads (dd_100m_100k, dd_100m_1k, python, kernel, cthon), so I'm pretty confident its correct. One question for the list: should nfs_free_server *always* be scheduled on the nfsiod workqueue? It's called in error paths in several locations. After looking at them, I don't think my approach would break anything, but some might have style objections. -dros fs/nfs/client.c | 20 +++++++++++++++++--- include/linux/nfs_fs_sb.h | 1 + 2 files changed, 18 insertions(+), 3 deletions(-) diff --git a/fs/nfs/client.c b/fs/nfs/client.c index c285e0a..9186a96 100644 --- a/fs/nfs/client.c +++ b/fs/nfs/client.c @@ -1010,9 +1010,11 @@ EXPORT_SYMBOL_GPL(nfs_alloc_server); /* * Free up a server record */ -void nfs_free_server(struct nfs_server *server) +static void nfs_free_server_schedule_work(struct work_struct *work) { - dprintk("--> nfs_free_server()\n"); + struct nfs_server *server = container_of(work, struct nfs_server, work); + + dprintk("--> %s\n", __func__); nfs_server_remove_lists(server); @@ -1032,7 +1034,19 @@ void nfs_free_server(struct nfs_server *server) bdi_destroy(&server->backing_dev_info); kfree(server); nfs_release_automount_timer(); - dprintk("<-- nfs_free_server()\n"); + dprintk("<-- %s\n", __func__); +} + +/* + * Queue work on nfsiod workqueue to free up a server record. + * This avoids a deadlock when an RPC task scheduled from the rpciod + * workqueue tries to kill itself. + */ +void nfs_free_server(struct nfs_server *server) +{ + WARN_ON_ONCE(work_pending(&server->work)); + INIT_WORK(&server->work, nfs_free_server_schedule_work); + queue_work(nfsiod_workqueue, &server->work); } EXPORT_SYMBOL_GPL(nfs_free_server); diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h index a9e76ee..a607886 100644 --- a/include/linux/nfs_fs_sb.h +++ b/include/linux/nfs_fs_sb.h @@ -171,6 +171,7 @@ struct nfs_server { void (*destroy)(struct nfs_server *); atomic_t active; /* Keep trace of any activity to this server */ + struct work_struct work; /* used to schedule free */ /* mountd-related mount options */ struct sockaddr_storage mountd_address; -- 1.7.9.6 (Apple Git-31.1) -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html