Re: Why must NFS access metadata in synchronous mode?

Trond Myklebust <trond.myklebust@xxxxxxxxxx> · Thu, 01 Jun 2006 13:26:18 -0400

On Thu, 2006-06-01 at 12:27 -0400, Xin Zhao wrote:
> Question 1: ...and how many NFS implementations have you seen based on
> that paper?
> I don't know. I have only read the NFS implementation distributed with
> the Linux kernel. But some papers mention that the soft updates
> mechanism suggested in that paper has been adopted by FreeBSD.

FreeBSD does not use soft updates for NFS afaik.

> Question 2: NFS permissions are checked by the _server_, not the client.
> That's true. But I was not saying that all metadata access must be
> asynchronous. Even for permission checking, the speculative execution
> mechanism proposed in Ed Nightingale's "speculative execution ...."
> paper published in SOSP 2005 can be used to avoid waiting. The basic
> idea is that an NFS client speculatively assumes the permission check
> returns "OK" and sets a checkpoint, then goes ahead and sends further
> requests. If the actual result turns out to be "OK", the client
> discards the checkpoint; otherwise, it rolls back to the checkpoint.
> This lets the waiting time overlap with the sending time of
> subsequent requests.

...and how does that help the user that has been told the operation
succeeded?
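
To make the objection concrete, here is a minimal sketch of the
checkpoint/rollback idea as described in the quoted paragraph. It is a
toy model, not NFS client code and not the mechanism from the paper:
the class SpeculativeClient and all of its methods are hypothetical
names invented for this illustration. It shows that a rollback can
restore client state, but cannot take back a success message the user
has already seen.

```python
import copy


class SpeculativeClient:
    """Toy client that speculates on a permission check (hypothetical)."""

    def __init__(self):
        self.state = {"pending_writes": []}
        self.checkpoint = None
        self.user_visible = []  # messages already shown to the user

    def begin_speculation(self):
        # Save a copy of the state before assuming the check returns "OK".
        self.checkpoint = copy.deepcopy(self.state)

    def queue_write(self, data):
        # Work done while the permission check is still in flight.
        self.state["pending_writes"].append(data)

    def report_success(self, msg):
        # Once the user has seen this, no rollback can undo it.
        self.user_visible.append(msg)

    def resolve(self, permission_ok):
        # Discard the checkpoint on "OK", otherwise roll back to it.
        if not permission_ok:
            self.state = self.checkpoint
        self.checkpoint = None


def demo(server_says_ok):
    client = SpeculativeClient()
    client.begin_speculation()
    client.queue_write("block 0")
    client.report_success("write succeeded")
    client.resolve(server_says_ok)
    return client.state["pending_writes"], client.user_visible
```

With demo(False) the queued write is rolled back, yet "write succeeded"
is still in user_visible. Any such scheme would have to hold back
user-visible results until the server has actually answered.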

> Question 3: Cache consistency requirements are _much_ more stringent
> for asynchronous operation.
> I agree. But I am not sure how a local file system like Ext3 handles
> this problem. I don't think Ext3 must write metadata synchronously (I
> will double check the ext3 code). If I remember correctly, when
> changing metadata, Ext3 just changes it in memory and marks the page
> dirty. The page is flushed to disk afterward. If the server exports
> an Ext3 file system, it should be able to do the same thing. When a
> client requests a metadata change, the server writes to the mmapped
> metadata page and then returns to the client instead of having to sync
> the change to disk. With this mechanism, at least the client does not
> have to wait for the disk flush. Does that make sense? To prevent
> interleaved changes to metadata before it is flushed to disk, the
> server can even mark the metadata page read-only until it is flushed.

'man 5 exports'. Read _carefully_ the entry on the "async" export
option, and see the NFS FAQ, the nfs mailing list archives, etc. for why
it is a bad idea.
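
As a rough illustration of the failure mode, the toy model below
contrasts a server that commits a metadata change to stable storage
before replying with one that replies as soon as the in-memory copy is
updated. It is a sketch under simplifying assumptions, not a model of
the real nfsd: ToyServer, setattr, and crash_and_reboot are hypothetical
names, and "disk" is just a dictionary standing in for stable storage.

```python
class ToyServer:
    """Toy file server with a page cache and a 'disk' (hypothetical)."""

    def __init__(self, sync):
        self.sync = sync
        self.disk = {"file": 0o644}          # stands in for stable storage
        self.page_cache = dict(self.disk)    # in-memory copy, lost on crash

    def setattr(self, name, mode):
        # Update the in-memory metadata first.
        self.page_cache[name] = mode
        if self.sync:
            # Commit to stable storage before replying to the client.
            self.disk[name] = mode
        # In async mode the reply goes out while the change is only in memory.
        return "OK"

    def crash_and_reboot(self):
        # Everything that was only in memory is gone after a crash.
        self.page_cache = dict(self.disk)


def acknowledged_change_survives(sync):
    server = ToyServer(sync)
    reply = server.setattr("file", 0o600)
    server.crash_and_reboot()
    return reply == "OK" and server.page_cache["file"] == 0o600
```

In the async case the client was told "OK", so it has no reason to
resend the request after the server comes back, and the change is
silently lost.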

Cheers,
  Trond
