Hi Linus,

Please pull from the "bugfixes" branch of the repository at

   git pull git://git.linux-nfs.org/projects/trondmy/nfs-2.6.git bugfixes

This will update the following files through the appended changesets.

  Cheers,
    Trond

----
 fs/nfs/inode.c                           |    7 ++-
 fs/nfs/nfs4_fs.h                         |   10 ++-
 fs/nfs/nfs4filelayoutdev.c               |    4 +
 fs/nfs/nfs4proc.c                        |   91 +++++++++++++++++-------------
 fs/nfs/nfs4state.c                       |   29 +++++++---
 fs/nfs/nfs4xdr.c                         |    4 +-
 fs/nfs/nfsroot.c                         |   29 +++++-----
 fs/nfs/unlink.c                          |    2 +-
 fs/nfs/write.c                           |    2 +
 include/linux/nfs_fs_sb.h                |   10 +--
 include/linux/sunrpc/sched.h             |    1 +
 kernel/sched.c                           |    1 +
 net/sunrpc/sched.c                       |   75 ++++++++++++++++++++-----
 net/sunrpc/xprtrdma/svc_rdma_transport.c |    1 +
 net/sunrpc/xprtsock.c                    |    3 +-
 15 files changed, 178 insertions(+), 91 deletions(-)

commit 53d4737580535e073963b91ce87d4216e434fab5
Author: Chuck Lever <chuck.lever@xxxxxxxxxx>
Date:   Fri Mar 11 15:31:06 2011 -0500

    NFS: NFSROOT should default to "proto=udp"

    There have been a number of recent reports that NFSROOT is no longer
    working with default mount options, but fails only with certain NICs.
    Brian Downing <bdowning@xxxxxxxxx> bisected to commit 56463e50 "NFS:
    Use super.c for NFSROOT mount option parsing".  Among other things,
    this commit changes the default mount options for NFSROOT to use TCP
    instead of UDP as the underlying transport.

    TCP seems less able to deal with NICs that are slow to initialize.
    The system logs that have accompanied reports of problems all show
    that NFSROOT attempts to establish a TCP connection before the NIC is
    fully initialized, and thus the TCP connection attempt fails.

    When a TCP connection attempt fails during a mount operation, the
    NFS stack needs to fail the operation.  Usually user space knows how
    and when to retry it.  The network layer does not report a distinct
    error code for this particular failure mode.  Thus, there isn't a
    clean way for the RPC client to see that it needs to retry in this
    case, but not in others.

    Because NFSROOT is used in some environments where it is not possible
    to update the kernel command line to specify "udp", the proper thing
    to do is change NFSROOT to use UDP by default, as it did before commit
    56463e50.

    To make it easier to see how to change default mount options for
    NFSROOT and to distinguish default settings from mandatory settings,
    I've adjusted a couple of areas to document the specifics.

    root_nfs_cat() is also modified to deal with commas properly when
    concatenating strings containing mount option lists.  This keeps
    root_nfs_cat() call sites simpler, now that we may be concatenating
    multiple mount option strings.

    Tested-by: Brian Downing <bdowning@xxxxxxxxx>
    Tested-by: Mark Brown <broonie@xxxxxxxxxxxxxxxxxxxxxxxxxxx>
    Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
    Cc: <stable@xxxxxxxxxx> # 2.6.37
    Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit 57df216bd8c8813a79a6a618e3d2ec937d532b86
Author: Huang Weiyi <weiyi.huang@xxxxxxxxx>
Date:   Tue Mar 8 23:11:30 2011 +0000

    nfs4: remove duplicated #include

    Remove duplicated #include('s) in
      fs/nfs/nfs4proc.c

    Signed-off-by: Huang Weiyi <weiyi.huang@xxxxxxxxx>
    Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit f9feab1e180d1392f2f59d692826c6da2e57adf4
Author: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
Date:   Wed Mar 9 16:12:46 2011 -0500

    NFSv4: nfs4_state_mark_reclaim_nograce() should be static

    There are no more external users of nfs4_state_mark_reclaim_nograce()
    or nfs4_state_mark_reclaim_reboot(), so mark them as static.

    Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit ecac799a5ecc364006f0db6f2db15e77ed4d63e2
Author: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
Date:   Wed Mar 9 16:00:56 2011 -0500

    NFSv4: Fix the setlk error handler

    Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit b4410c2f7f775b03da31566c05bb8d2383c7dc27
Author: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
Date:   Wed Mar 9 16:00:55 2011 -0500

    NFSv4.1: Fix the handling of the SEQUENCE status bits

    We want SEQUENCE status bits to be handled by the state manager in
    order to avoid threading issues.

    Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit 0400a6b0cb756f976bae32ae8db47bfa9853897c
Author: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
Date:   Wed Mar 9 16:00:53 2011 -0500

    NFSv4/4.1: Fix nfs4_schedule_state_recovery abuses

    nfs4_schedule_state_recovery() should only be used when we need to
    force the state manager to check the lease.  If we just want to start
    the state manager in order to handle a state recovery situation, we
    should be using nfs4_schedule_state_manager().

    This patch fixes the abuses of nfs4_schedule_state_recovery() by
    replacing its use with a set of helper functions that do the right
    thing.

    Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit c34c32ea97718bb24fc06158733580003ba89211
Author: Andy Adamson <andros@xxxxxxxxxx>
Date:   Wed Mar 9 13:13:46 2011 -0500

    NFSv4.1 reclaim complete must wait for completion

    Signed-off-by: Andy Adamson <andros@xxxxxxxxxx>
    [Trond: fix whitespace errors]
    Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit 114f64b5f24abac33a42f4f1856eb3a9766d497e
Author: Andy Adamson <andros@xxxxxxxxxx>
Date:   Wed Mar 9 13:13:45 2011 -0500

    NFSv4: remove duplicate clientid in struct nfs_client

    Signed-off-by: Andy Adamson <andros@xxxxxxxxxx>
    Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit 7d6d63d6427090cbb1d282364b65b12634ca59bd
Author: Ricardo Labiaga <Ricardo.Labiaga@xxxxxxxxxx>
Date:   Wed Mar 9 13:13:44 2011 -0500

    NFSv4.1: Retry CREATE_SESSION on NFS4ERR_DELAY

    Fix a bug where we currently retry the EXCHANGEID call again, even
    though we already have a valid clientid.  Instead, delay and retry
    the CREATE_SESSION call.

    Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@xxxxxxxxxx>
    Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit 4cea288aaf0e11647880cc487350b1dc45d9febc
Author: Ben Hutchings <bhutchings@xxxxxxxxxxxxxx>
Date:   Tue Feb 22 21:54:34 2011 +0000

    sunrpc: Propagate errors from xs_bind() through xs_create_sock()

    xs_create_sock() is supposed to return a pointer or an ERR_PTR-encoded
    error, but it currently returns 0 if xs_bind() fails.

    Signed-off-by: Ben Hutchings <bhutchings@xxxxxxxxxxxxxx>
    Cc: stable@xxxxxxxxxx [v2.6.37]
    Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit 3fa0b4e201d254b52a251fa348bd53e53000cff6
Author: Frank Filz <ffilzlnx@xxxxxxxxxx>
Date:   Thu Dec 2 19:31:23 2010 +0000

    (try3-resend) Fix nfs_compat_user_ino64 so it doesn't cause problems if bit 31 or 63 are set in fileid

    The problem was the use of an int32, which is sign-extended when
    converted to a uint64, resulting in a fileid that doesn't fit in 32
    bits even though the intent of the function is to fit the fileid
    into 32 bits.

    Signed-off-by: Frank Filz <ffilzlnx@xxxxxxxxxx>
    Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx>
    [Trond: Added an include for compat.h]
    Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit 43b7c3f051dea504afccc39bcb56d8e26c2e0b77
Author: Jovi Zhang <bookjovi@xxxxxxxxx>
Date:   Wed Mar 2 23:19:37 2011 +0000

    nfs: fix compilation warning

    This commit fixes the following compilation warning:

    linux-2.6/fs/nfs/nfs4proc.c:3265: warning: comparison of distinct pointer types lacks a cast

    Signed-off-by: Jovi Zhang <bookjovi@xxxxxxxxx>
    Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit b9f810570d9cc13177128e11a74e22d37aa68a1a
Author: Stanislav Fomichev <kernel@xxxxxxxxxxx>
Date:   Sat Feb 5 23:13:01 2011 +0000

    nfs: add kmalloc return value check in decode_and_add_ds

    Add a kmalloc return value check in decode_and_add_ds.

    Signed-off-by: Stanislav Fomichev <kernel@xxxxxxxxxxx>
    Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit a5e502681007779d4762fb3ef7e80a3ecd1cfe6b
Author: Jesper Juhl <jj@xxxxxxxxxxxxx>
Date:   Sat Jan 22 21:40:20 2011 +0000

    SUNRPC: Remove resource leak in svc_rdma_send_error()

    We leak the memory allocated to 'ctxt' when we return after
    'ib_dma_mapping_error()' returns !=0.

    Signed-off-by: Jesper Juhl <jj@xxxxxxxxxxxxx>
    Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit d2224e7afbf2a6556f4f8f25bc0e96d99ec4d2bd
Author: Jeff Layton <jlayton@xxxxxxxxxx>
Date:   Sun Mar 6 17:14:13 2011 +0000

    nfs: close NFSv4 COMMIT vs. CLOSE race

    I've been adding in more artificial delays in the NFSv4 commit and
    close codepaths to uncover races.  The kernel I'm testing has the
    patch to close the race in __rpc_wait_for_completion_task that's in
    Trond's cthon2011 branch.  The reproducer I've been using does this
    in a loop:

    	mkdir("DIR");
    	fd = open("DIR/FILE", O_WRONLY|O_CREAT|O_EXCL, 0644);
    	write(fd, "abcdefg", 7);
    	close(fd);
    	unlink("DIR/FILE");
    	rmdir("DIR");

    The above reproducer shouldn't result in any silly-renaming.
    However, when I add a "msleep(100)" just after the
    nfs_commit_clear_lock call in nfs_commit_release, I can almost always
    force one to occur.  If I can force it to occur with that, then it
    can happen without that delay given the right timing.

    nfs_commit_inode waits for the NFS_INO_COMMIT bit to clear when
    called with FLUSH_SYNC set.  nfs_commit_rpcsetup on the other hand
    does not wait for the task to complete before putting its reference
    to it, so the last reference gets put in rpc_release task and gets
    queued to a workqueue.

    In this situation, the last open context reference may be put by the
    COMMIT release instead of the close() syscall.  The close() syscall
    returns too quickly and the unlink runs while the d_count is still
    high since the COMMIT release hasn't put its dentry reference yet.

    Fix this by having nfs_commit_rpcsetup wait for the RPC call to
    complete before putting the task reference when FLUSH_SYNC is set.
    With this, the last reference is put by the process that's initiating
    the FLUSH_SYNC commit and the race is closed.

    Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
    Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit bf294b41cefcb22fc3139e0f42c5b3f06728bd5e
Author: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
Date:   Mon Feb 21 11:05:41 2011 -0800

    SUNRPC: Close a race in __rpc_wait_for_completion_task()

    Although they run as rpciod background tasks, under normal operation
    (i.e. no SIGKILL), functions like nfs_sillyrename(), nfs4_proc_unlck()
    and nfs4_do_close() want to be fully synchronous.  This means that
    when we exit, we want all references to the rpc_task to be gone, and
    we want any dentry references etc. held by that task to be released.

    For this reason these functions call __rpc_wait_for_completion_task(),
    followed by rpc_put_task() in the expectation that the latter will be
    releasing the last reference to the rpc_task, and thus ensuring that
    the callback_ops->rpc_release() has been called synchronously.

    This patch fixes a race which exists due to the fact that rpciod
    calls rpc_complete_task() (in order to wake up the callers of
    __rpc_wait_for_completion_task()) and then subsequently calls
    rpc_put_task() without ensuring that these two steps are done
    atomically.

    In order to avoid adding new spin locks, the patch uses the existing
    waitqueue spin lock to order the rpc_task reference count releases
    between the waiting process and rpciod.

    The common case where nobody is waiting for completion is optimised
    for by checking if the RPC_TASK_ASYNC flag is cleared and/or if the
    rpc_task reference count is 1: in those cases we skip taking the
    spin lock and immediately free up the rpc_task.

    Those few processes that need to put the rpc_task from inside an
    asynchronous context and that do not care about ordering are given a
    new helper: rpc_put_task_async().

    Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com