Hi Linus,

Please pull from the "nfs-for-3.3" branch of the repository at

   git pull git://git.linux-nfs.org/projects/trondmy/linux-nfs.git nfs-for-3.3

This will update the following files through the appended changesets.

  Cheers,
    Trond

----
 Documentation/kernel-parameters.txt |   17 ++-
 fs/nfs/callback_proc.c              |    2 +-
 fs/nfs/client.c                     |   12 ++-
 fs/nfs/file.c                       |    4 +-
 fs/nfs/idmap.c                      |   83 ++++++++++++++++
 fs/nfs/inode.c                      |    2 +
 fs/nfs/internal.h                   |    2 +
 fs/nfs/nfs4_fs.h                    |    3 +
 fs/nfs/nfs4filelayout.c             |    9 +-
 fs/nfs/nfs4proc.c                   |  177 ++++++++++++++++++-----------------
 fs/nfs/nfs4state.c                  |  104 ++++++++++++++++----
 fs/nfs/nfs4xdr.c                    |  137 ++++++++++++++-------------
 fs/nfs/objlayout/objio_osd.c        |    3 +-
 fs/nfs/objlayout/objlayout.c        |    4 +
 fs/nfs/pnfs.c                       |   42 ++++++++-
 fs/nfs/pnfs.h                       |    1 +
 fs/nfs/super.c                      |   43 ++++-----
 fs/nfs/write.c                      |   27 +-----
 fs/nfsd/nfs4callback.c              |    2 +-
 include/linux/nfs_fs_sb.h           |    1 +
 include/linux/nfs_idmap.h           |    8 ++
 include/linux/nfs_xdr.h             |   22 ++++-
 include/linux/sunrpc/auth.h         |    3 +-
 include/linux/sunrpc/auth_gss.h     |    2 +-
 include/linux/sunrpc/xdr.h          |    2 +
 init/do_mounts.c                    |   35 ++++++-
 net/sunrpc/auth_generic.c           |    6 +-
 net/sunrpc/auth_gss/auth_gss.c      |   40 +++++----
 net/sunrpc/xdr.c                    |    3 +-
 29 files changed, 525 insertions(+), 271 deletions(-)

commit 074b1d12fe2500d7d453902f9266e6674b30d84c
Author: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
Date:   Mon Jan 9 13:46:26 2012 -0500

    NFSv4: Change the default setting of the nfs4_disable_idmapping parameter

    Now that the use of numeric uids/gids is officially sanctioned in
    RFC3530bis, it is time to change the default here to 'enabled'.

    By doing so, we ensure that NFSv4 copies the behaviour of NFSv3 when
    we're using the default AUTH_SYS authentication (i.e. when the client
    uses the numeric uids/gids as authentication tokens), so that when new
    files are created, they will appear to have the correct user/group.
    It also fixes a number of backward compatibility issues when migrating
    from NFSv3 to NFSv4 on a platform where the server uses different
    uid/gid mappings than the client.

    Note also that this setting has been successfully tested against
    servers that do not support numeric uids/gids at several
    Connectathon/Bakeathon events at this point, and the fall back to
    using string names/groups has been shown to work well in all those
    test cases.

    Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit 6926afd1925a54a13684ebe05987868890665e2b
Author: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
Date:   Sat Jan 7 13:22:46 2012 -0500

    NFSv4: Save the owner/group name string when doing open

    ...so that we can do the uid/gid mapping outside the asynchronous RPC
    context.
    This fixes a bug in the current NFSv4 atomic open code where the
    client isn't able to determine what the true uid/gid fields of the
    file are (because the asynchronous nature of the OPEN call denies it
    the ability to do an upcall), and so fills them with default values,
    marking the inode as needing revalidation.
    Unfortunately, in some cases, the VFS will do some additional sanity
    checks on the file, and may override the server's decision to allow
    the open because it sees the wrong owner/group fields.

    Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit e2fecb215b321db0e4a5b2597349a63c07bec42f
Author: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
Date:   Fri Jan 6 08:57:46 2012 -0500

    NFS: Remove pNFS bloat from the generic write path

    We have no business doing any of this in the standard write release
    path. Get rid of it, and put it in the pNFS layer.

    Also, while we're at it, get rid of the completely bogus unlock/relock
    semantics that were present in nfs_writeback_release_full(). It is
    not only unnecessary, but actually dangerous to release the write lock
    just in order to take it again in nfs_page_async_flush(). Better just
    to open code the pgio operations in a pnfs helper.

    Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit fe0fe83585f88346557868a803a479dfaaa0688a
Author: Boaz Harrosh <bharrosh@xxxxxxxxxxx>
Date:   Fri Jan 6 09:31:20 2012 +0200

    pnfs-obj: Must return layout on IO error

    As mandated by the standard, in case of an IO error a pNFS
    objects layout driver must return its layout. This is because
    all device errors are reported to the server as part of the
    layout return buffer.

    This is implemented the same way PNFS_LAYOUTRET_ON_SETATTR
    is done, through a bit flag on the pnfs_layoutdriver_type->flags
    member. The flag is set by the layout driver that wants a
    layout_return performed at pnfs_ld_{write,read}_done in case
    of an error.
    (Though I have not defined a wrapper like pnfs_ld_layoutret_on_setattr
     because this code is never called outside of pnfs.c and pnfs IO
     paths)

    Without this patch 3.[0-2] Kernels leak memory and have an annoying
    WARN_ON after every IO error utilizing the pnfs-obj driver.

    [This patch is for 3.2 Kernel. 3.1/0 Kernels need a different patch]
    CC: Stable Tree <stable@xxxxxxxxxx>
    Signed-off-by: Boaz Harrosh <bharrosh@xxxxxxxxxxx>
    Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit 5c0b4129c07b902b27d3f3ebc087757f534a3abd
Author: Boaz Harrosh <bharrosh@xxxxxxxxxxx>
Date:   Fri Jan 6 09:28:12 2012 +0200

    pnfs-obj: pNFS errors are communicated on iodata->pnfs_error

    Some time along the way pNFS IO errors were switched to
    communicate with a special iodata->pnfs_error member instead
    of the regular RPC members. But objlayout was not switched
    over.

    Fix that!
    Without this fix any IO error hangs, because IO is not
    switched to MDS and pages are never cleared or read.

    [Applies to 3.2.0.
    Same bug different patch for 3.1/0 Kernels]
    CC: Stable Tree <stable@xxxxxxxxxx>
    Signed-off-by: Boaz Harrosh <bharrosh@xxxxxxxxxxx>
    Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit 0aaaf5c424c7ffd6b0c4253251356558b16ef3a2
Author: Chuck Lever <chuck.lever@xxxxxxxxxx>
Date:   Tue Dec 6 16:13:48 2011 -0500

    NFS: Cache state owners after files are closed

    Servers have a finite amount of memory to store NFSv4 open and lock
    owners. Moreover, servers may have a difficult time determining when
    they can reap their state owner table, thanks to gray areas in the
    NFSv4 protocol specification. Thus clients should be careful to reuse
    state owners when possible.

    Currently Linux is not too careful. When a user has closed all her
    files on one mount point, the state owner's reference count goes to
    zero, and it is released. The next OPEN allocates a new one. A
    workload that serially opens and closes files can run through a large
    number of open owners this way.

    When a state owner's reference count goes to zero, slap it onto a free
    list for that nfs_server, with an expiry time. Garbage collect before
    looking for a state owner. This makes state owners for active users
    available for re-use.

    Now that there can be unused state owners remaining at umount time,
    purge the state owner free list when a server is destroyed. Also be
    sure not to reclaim unused state owners during state recovery.

    This change has benefits for the client as well. For some workloads,
    this approach drops the number of OPEN_CONFIRM calls from the same as
    the number of OPEN calls, down to just one. This reduces wire traffic
    and thus open(2) latency. Before this patch, untarring a kernel
    source tarball shows the OPEN_CONFIRM call counter steadily increasing
    through the test. With the patch, the OPEN_CONFIRM count remains at 1
    throughout the entire untar.
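
The caching scheme described above (park an owner with an expiry time when its reference count drops to zero, garbage collect before each lookup, purge on teardown) can be illustrated with a small user-space sketch. This is not the kernel code: struct owner, struct owner_cache and the owner_*() helpers are hypothetical names, a single linked list stands in for the per-server RB tree and free list, the caller supplies the clock as a plain tick count, and no locking is shown.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a state owner; not the kernel's structure. */
struct owner {
	int uid;		/* identity the owner was created for */
	int refcount;		/* 0 means idle, i.e. parked for re-use */
	long expires;		/* tick at which an idle owner may be freed */
	struct owner *next;
};

/* Hypothetical per-server cache; one list replaces RB tree + free list. */
struct owner_cache {
	struct owner *head;
	long lease;		/* how many ticks an idle owner is kept */
	unsigned long allocations;	/* counts fresh allocations */
};

/* Free idle owners; with force == 0 only those whose expiry has passed. */
static void owner_cache_reap(struct owner_cache *c, long now, int force)
{
	struct owner **pp = &c->head;

	while (*pp) {
		struct owner *o = *pp;

		if (o->refcount == 0 && (force || o->expires <= now)) {
			*pp = o->next;
			free(o);
		} else {
			pp = &o->next;
		}
	}
}

/* Garbage collect first, then re-use a cached owner or allocate one. */
static struct owner *owner_get(struct owner_cache *c, int uid, long now)
{
	struct owner *o;

	owner_cache_reap(c, now, 0);
	for (o = c->head; o; o = o->next) {
		if (o->uid == uid) {
			o->refcount++;
			return o;
		}
	}
	o = calloc(1, sizeof(*o));
	if (!o)
		return NULL;
	o->uid = uid;
	o->refcount = 1;
	o->next = c->head;
	c->head = o;
	c->allocations++;
	return o;
}

/* Drop a reference; the last put parks the owner instead of freeing it. */
static void owner_put(struct owner_cache *c, struct owner *o, long now)
{
	if (--o->refcount == 0)
		o->expires = now + c->lease;
}

/* Teardown (the umount case): drop every idle owner regardless of expiry. */
static void owner_cache_purge(struct owner_cache *c)
{
	owner_cache_reap(c, 0, 1);
}
```

In this sketch a serial open/close workload for one uid keeps hitting the same parked owner as long as each open arrives before the lease runs out, which mirrors the re-use behaviour the commit message describes.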

    As long as the expiry time is kept short, I don't think garbage
    collection should be terribly expensive, although it does bounce the
    clp->cl_lock around a bit.

    [ At some point we should rationalize the use of the nfs_server
    ->destroy method. ]

    Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
    [Trond: Fixed a garbage collection race and a few efficiency issues]
    Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit 414adf14cd3b52e411f79d941a15d0fd4af427fc
Author: Chuck Lever <chuck.lever@xxxxxxxxxx>
Date:   Tue Dec 6 16:13:39 2011 -0500

    NFS: Clean up nfs4_find_state_owners_locked()

    There's no longer a need to check the so_server field in the state
    owner, because nowadays the RB tree we search for state owners
    contains owners for only that server.

    Make nfs4_find_state_owners_locked() use the same tree searching logic
    as nfs4_insert_state_owner_locked().

    Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
    Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit bf118a342f10dafe44b14451a1392c3254629a1f
Author: Andy Adamson <andros@xxxxxxxxxx>
Date:   Wed Dec 7 11:55:27 2011 -0500

    NFSv4: include bitmap in nfsv4 get acl data

    The NFSv4 bitmap size is unbounded: a server can return an arbitrary
    sized bitmap in an FATTR4_WORD0_ACL request. Instead of using
    nfs4_fattr_bitmap_maxsz as a guess at the maximum bitmask returned by
    a server, include the bitmap (xdr length plus bitmasks) and the acl
    data xdr length in the (cached) acl page data.

    This is a general solution to commit e5012d1f "NFSv4.1: update
    nfs4_fattr_bitmap_maxsz" and fixes hitting a BUG_ON in
    xdr_shrink_bufhead when getting ACLs.

    Fix a bug in decode_getacl that returned -EINVAL on ACLs > page when
    getxattr was called with a NULL buffer, preventing ACL > PAGE_SIZE
    from being retrieved.

    Cc: stable@xxxxxxxxxx
    Signed-off-by: Andy Adamson <andros@xxxxxxxxxx>
    Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit 3476f114addb7b96912840a234702f660a1f460b
Author: Chris Metcalf <cmetcalf@xxxxxxxxxx>
Date:   Thu Aug 11 13:54:28 2011 -0700

    nfs: fix a minor do_div portability issue

    This change modifies filelayout_get_dense_offset() to use the functions
    in math64.h and thus avoid a 32-bit platform compile error trying to
    use do_div() on an s64 type.

    Signed-off-by: Chris Metcalf <cmetcalf@xxxxxxxxxx>
    Reviewed-by: Boaz Harrosh <bharrosh@xxxxxxxxxxx>
    Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit 0b1c8fc43c1f9fcde2d18182988f05eeaaae509b
Author: Andy Adamson <andros@xxxxxxxxxx>
Date:   Wed Nov 9 13:58:26 2011 -0500

    NFSv4.1: cleanup comment and debug printk

    Signed-off-by: Andy Adamson <andros@xxxxxxxxxx>
    Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit aabd0b40b327d5c6518c8c908819b9bf864ad56a
Author: Andy Adamson <andros@xxxxxxxxxx>
Date:   Wed Nov 9 13:58:22 2011 -0500

    NFSv4.1: change nfs4_free_slot parameters for dynamic slots

    Signed-off-by: Andy Adamson <andros@xxxxxxxxxx>
    Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit aacd5537270a752fe12a9914a207284fc2341c6d
Author: Andy Adamson <andros@xxxxxxxxxx>
Date:   Wed Nov 9 13:58:21 2011 -0500

    NFSv4.1: cleanup init and reset of session slot tables

    We are either initializing or resetting a session. Initialize or reset
    the session slot tables accordingly.

    Signed-off-by: Andy Adamson <andros@xxxxxxxxxx>
    Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit 61f2e5106582d02f30b6807e3f9c07463c572ccb
Author: Andy Adamson <andros@xxxxxxxxxx>
Date:   Wed Nov 9 13:58:20 2011 -0500

    NFSv4.1: fix backchannel slotid off-by-one bug

    Cc: stable@xxxxxxxxxx
    Signed-off-by: Andy Adamson <andros@xxxxxxxxxx>
    Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit 8a0d551a59ac92d8ff048d6cb29d3a02073e81e8
Author: Jeff Layton <jlayton@xxxxxxxxxx>
Date:   Tue Dec 20 06:57:45 2011 -0500

    nfs: fix regression in handling of context= option in NFSv4

    Setting the security context of an NFSv4 mount via the context= mount
    option is currently broken. The NFSv4 codepath allocates a parsed
    options struct, and then parses the mount options to fill it. It
    eventually calls nfs4_remote_mount which calls security_init_mnt_opts.
    That clobbers the lsm_opts struct that was populated earlier. This bug
    also looks like it causes a small memory leak on each v4 mount where
    context= is used.

    Fix this by moving the initialization of the lsm_opts into
    nfs_alloc_parsed_mount_data. Also, add a destructor for
    nfs_parsed_mount_data to make it easier to free all of the allocations
    hanging off of it, and to ensure that the security_free_mnt_opts is
    called whenever security_init_mnt_opts is.

    I believe this regression was introduced quite some time ago, probably
    by commit c02d7adf.

    Cc: stable@xxxxxxxxxxxxxxx
    Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
    Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit 2edb6bc3852c681c0d948245bd55108dc6407604
Author: NeilBrown <neilb@xxxxxxx>
Date:   Wed Nov 16 11:46:31 2011 +1100

    NFS - fix recent breakage to NFS error handling.

    From c6d615d2b97fe305cbf123a8751ced859dca1d5e Mon Sep 17 00:00:00 2001
    From: NeilBrown <neilb@xxxxxxx>
    Date: Wed, 16 Nov 2011 09:39:05 +1100
    Subject: [PATCH] NFS - fix recent breakage to NFS error handling.

    commit 02c24a82187d5a628c68edfe71ae60dc135cd178 made a small and
    presumably unintended change to write error handling in NFS.

    Previously an error from filemap_write_and_wait_range would only be of
    interest if nfs_file_fsync did not return an error. After this commit,
    an error from filemap_write_and_wait_range would mean that (the rest
    of) nfs_file_fsync would not even be called.

    This means that:
     1/ you are more likely to see EIO than e.g. EDQUOT or ENOSPC.
     2/ NFS_CONTEXT_ERROR_WRITE remains set for longer so more writes are
        synchronous.

    This patch restores previous behaviour.

    Cc: stable@xxxxxxxxxx
    Cc: Josef Bacik <josef@xxxxxxxxxx>
    Cc: Jan Kara <jack@xxxxxxx>
    Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
    Signed-off-by: NeilBrown <neilb@xxxxxxx>
    Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit 43717c7daebf10b43f12e68512484b3095bb1ba5
Author: Chuck Lever <chuck.lever@xxxxxxxxxx>
Date:   Mon Dec 5 15:40:30 2011 -0500

    NFS: Retry mounting NFSROOT

    Lukas Razik <linux@xxxxxxxxxx> reports that on his SPARC system,
    booting with an NFS root file system stopped working after commit
    56463e50 "NFS: Use super.c for NFSROOT mount option parsing."

    We found that the network switch to which Lukas' client was attached
    was delaying access to the LAN after the client's NIC driver reported
    that its link was up. The delay was longer than the timeouts used in
    the NFS client during mounting.

    NFSROOT worked for Lukas before commit 56463e50 because in those
    kernels, the client's first operation was an rpcbind request to
    determine which port the NFS server was listening on. When that
    request failed after a long timeout, the client simply selected the
    default NFS port (2049). By that time the switch was allowing access
    to the LAN, and the mount succeeded.

    Neither of these client behaviors is desirable, so reverting 56463e50
    is really not a choice. Instead, introduce a mechanism that retries
    the NFSROOT mount request several times.
    This is the same tactic that normal user space NFS mounts employ to
    overcome server and network delays.

    Signed-off-by: Lukas Razik <linux@xxxxxxxxxx>
    [ cel: match kernel coding style, add proper patch description ]
    [ cel: add exponential back-off ]
    Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
    Tested-by: Lukas Razik <linux@xxxxxxxxxx>
    Cc: stable@xxxxxxxxxx # > 2.6.38
    Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit 68c97153fb7f2877f98aa6c29546381d9cad2fed
Author: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
Date:   Tue Jan 3 13:22:46 2012 -0500

    SUNRPC: Clean up the RPCSEC_GSS service ticket requests

    Instead of hacking specific service names into gss_encode_v1_msg,
    we should just allow the caller to specify the service name
    explicitly.

    Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
    Acked-by: J. Bruce Fields <bfields@xxxxxxxxxx>

--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html