Re: [PATCH] NFSD: Return full WCC data for NFSv3 metadata operations

On Thu, Jul 08, 2010 at 11:43:46AM -0400, Chuck Lever wrote:
> NFSv3 WCC data, or "weak cache consistency" data, is an attempt to
> reduce the number of on-the-wire transactions needed by NFS clients
> to keep their caches up to date.  WCC data consists of two parts:
> 
>   o  pre-op data, which is a subset of file metadata as it was before
>      a procedure starts, and
> 
>   o  post-op data, which is a full set of NFSv3 file attribute data as
>      it is after a procedure finishes.
> 
> For an NFSv3 procedure that returns wcc_data, an NFSv3 server is free
> to return either, both, or neither of these.  Define "full WCC data"
> as a reply that contains both pre-op and post-op data.
> 
> To make this data useful, a server must ensure that file metadata is
> captured atomically around the requested operation.  If the pre-op
> data in a reply matches the file metadata that the client already has
> cached, the client can assume that no other operation on that file
> occurred while the server was fulfilling the current request, and that
> therefore the post-op metadata is the latest version, and can be
> cached.
> 
> Conversely, NFSv3 clients invalidate their metadata caches when they
> receive replies to metadata-altering operations that do not contain
> full WCC data.  When a server presents a reply that does not have both
> pre-op and post-op WCC data, clients must employ extra LOOKUP and
> GETATTR requests to ensure their metadata caches are up to date,
> causing performance to suffer needlessly.  For example, untarring a
> large tar file can take almost an order of magnitude longer in this
> case, depending on the client implementation.
> 
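The client-side rule in the two paragraphs above can be sketched as follows. This is a simplified model, not the Linux client's code: all type and function names are hypothetical. The only detail taken from the protocol is that NFSv3 pre-op data carries size, mtime, and ctime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the attributes a client would track. */
struct pre_op_attr {            /* subset of metadata before the operation */
    bool     present;
    uint64_t size;
    int64_t  mtime_ns;
    int64_t  ctime_ns;
};

struct post_op_attr {           /* attributes after the operation (trimmed) */
    bool     present;
    uint64_t size;
    int64_t  mtime_ns;
    int64_t  ctime_ns;
};

struct cached_attr {            /* what the client currently has cached */
    bool     valid;
    uint64_t size;
    int64_t  mtime_ns;
    int64_t  ctime_ns;
};

enum wcc_result { WCC_CACHE_UPDATED, WCC_CACHE_INVALIDATED };

/*
 * If the reply carries full WCC data and the pre-op data matches the
 * cache, nothing else touched the file during the request, so the
 * post-op data can be cached.  Otherwise the cache is invalidated and
 * the caller has to revalidate (for example with GETATTR).
 */
enum wcc_result apply_wcc(struct cached_attr *cache,
                          const struct pre_op_attr *pre,
                          const struct post_op_attr *post)
{
    assert(cache && pre && post);

    bool full = pre->present && post->present;
    bool matches = cache->valid &&
                   pre->size == cache->size &&
                   pre->mtime_ns == cache->mtime_ns &&
                   pre->ctime_ns == cache->ctime_ns;

    if (full && matches) {
        cache->size = post->size;
        cache->mtime_ns = post->mtime_ns;
        cache->ctime_ns = post->ctime_ns;
        return WCC_CACHE_UPDATED;
    }

    cache->valid = false;
    return WCC_CACHE_INVALIDATED;
}
```

In this model, a reply with only post-op data always takes the invalidation path, which is the extra round-trip cost described above.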
> In the Linux NFS server implementation, to ensure that WCC data
> reflects only changes made during the current file system operation,
> the file's inode mutex is held in order to serialize metadata-altering
> operations on that inode.  Our server saves pre-op data for a file
> handle just after the target inode's mutex is taken, and saves post-op
> data just before the inode's mutex is dropped (see fh_lock() and
> fh_unlock()).
> 
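A toy model of the snapshotting described above, for illustration only. The toy_* names and fields are hypothetical, and the inode mutex is reduced to a flag. The real logic lives in fh_lock() and fh_unlock().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_attrs { uint64_t size; int64_t mtime; int64_t ctime; };

struct toy_inode {
    struct toy_attrs attrs;
    bool mutex_held;            /* stand-in for the inode mutex */
};

struct toy_fh {
    struct toy_inode *inode;
    bool locked;
    bool pre_saved;
    bool post_saved;
    struct toy_attrs pre;       /* saved pre-op attributes */
    struct toy_attrs post;      /* saved post-op attributes */
};

/* Take the "mutex", then save pre-op attributes while holding it. */
void toy_fh_lock(struct toy_fh *fh)
{
    assert(!fh->inode->mutex_held);
    fh->inode->mutex_held = true;
    fh->pre = fh->inode->attrs;
    fh->pre_saved = true;
    fh->locked = true;
}

/* Save post-op attributes, then drop the "mutex". */
void toy_fh_unlock(struct toy_fh *fh)
{
    if (!fh->locked)
        return;                 /* nothing to do if never locked */
    fh->post = fh->inode->attrs;
    fh->post_saved = true;
    fh->inode->mutex_held = false;
    fh->locked = false;
}
```

Because both snapshots are taken while the lock is held, any difference between them can only come from the operation performed in between.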
> In order to return full WCC data to clients, our server must have both
> the saved pre-op and the saved post-op attribute data for a file
> handle filled in before it starts XDR encoding the reply.
> Unfortunately, for the NFSv3 MKDIR, MKNOD, REMOVE, and RMDIR procedures,
> our server does not unlock the parent directory inode until well after
> the reply has been XDR encoded.
> 
> In these cases, encode_wcc_data() does have saved pre-op WCC data
> available, since the fh is locked, but does not have saved post-op WCC
> data for the parent directory, since it hasn't yet been unlocked.  In
> this situation, encode_wcc_data() simply grabs the parent's current
> metadata, uses that as the post-op WCC data, and returns no pre-op
> WCC data to the client.
> 
> By instead unlocking the parent directory file handle immediately
> after the internal operations for each of these NFS procedures are
> complete, saved post-op WCC data for the file handle is filled in
> before XDR encoding starts, so full WCC data for that procedure can
> be returned to clients.
> 
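The ordering problem and the fix can be modeled in a few lines. This is a sketch under simplifying assumptions, not the patch itself: the toy_* names are hypothetical, and the encoder only reports which kind of WCC data it would emit.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_wcc { TOY_WCC_FULL, TOY_WCC_POST_ONLY };

struct toy_dir_fh { bool locked; bool pre_saved; bool post_saved; };

/* Locking saves pre-op data; unlocking saves post-op data. */
static void toy_lock(struct toy_dir_fh *fh)
{
    fh->locked = true;
    fh->pre_saved = true;
}

static void toy_unlock(struct toy_dir_fh *fh)
{
    if (fh->locked) {
        fh->post_saved = true;
        fh->locked = false;
    }
}

/*
 * Full WCC data needs both saved snapshots.  Without saved post-op
 * data, fall back to current attributes as post-op and no pre-op.
 */
static enum toy_wcc toy_encode_wcc(const struct toy_dir_fh *fh)
{
    if (fh->pre_saved && fh->post_saved)
        return TOY_WCC_FULL;
    return TOY_WCC_POST_ONLY;
}

/* Old ordering: the reply is encoded while the parent is still locked. */
enum toy_wcc toy_rmdir_late_unlock(void)
{
    struct toy_dir_fh parent = { false, false, false };
    toy_lock(&parent);
    /* ... remove the directory entry ... */
    enum toy_wcc reply = toy_encode_wcc(&parent);
    toy_unlock(&parent);
    return reply;
}

/* New ordering: unlock as soon as the operation is done, then encode. */
enum toy_wcc toy_rmdir_early_unlock(void)
{
    struct toy_dir_fh parent = { false, false, false };
    toy_lock(&parent);
    /* ... remove the directory entry ... */
    toy_unlock(&parent);
    return toy_encode_wcc(&parent);
}
```

The only difference between the two procedures is where the unlock sits relative to encoding, which is the whole of the change described above.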
> Note that the NFSv4 CREATE and REMOVE procedures already invoke
> fh_unlock() explicitly on the parent directory in order to fill in the
> NFSv4 post change attribute.
> 
> Note also that NFSv3 CREATE, RENAME, SETATTR, and SYMLINK already
> perform explicit file handle unlocking.
> 
> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
> ---
> 
> Bruce-
> 
> This patch is mechanically the same as the previous one, but the patch
> description has a more accurate and clearly stated rationale for the
> change.
> 
> Please use this one instead of the previous one.

Thanks.  The first is already committed, though.  I'm not sure if
there's a good place for the longer explanation in the docs, so it may
just have to live on in the list archives. (It's the test case, untarring
a Linux kernel from an OS X client, that I was mainly curious about.)

--b.