Re: NFSv4/pNFS possible POSIX I/O API standards

On Fri, 1 Dec 2006, Trond Myklebust wrote:
>> Also, it's a tiring and trivial example, but even the 'ls -al' scenario
>> isn't ideally addressed by readdir()+statlite(), since statlite() might
>> return size/mtime from before 'ls -al' was executed by the user.
>
> stat() will do the same.

It does with NFS, but only because NFS doesn't follow POSIX in that regard. In general, stat() is supposed to return a value that's accurate at the time of the call.

(Although now I'm confused again. If you're assuming stat() can return cached results, why do you think statlite() is useful?)

> Currently, you will never get anything other than weak consistency with
> NFS whether you are talking about stat(), access(), getacl(),
> lseek(SEEK_END), or append(). Your 'permitting it' only in statlite() is
> irrelevant to the facts on the ground: I am not changing the NFS client
> caching model in any way that would affect existing applications.

Clearly, if you cache attributes on the client and provide only weak consistency, then readdirplus() doesn't change much. But _other_ non-NFS filesystems may elect to provide POSIX semantics and strong consistency, even though NFS doesn't. And the interface simply doesn't allow that to be done efficiently in distributed environments, because applications can't communicate their varying consistency needs. Instead, systems like NFS weaken attribute consistency globally. That works well enough for most people most of the time, but it's hardly ideal.

readdirplus() allows applications like 'ls -al' to distinguish themselves from applications that want individually accurate stat() results. That in turn allows distributed filesystems that are both strongly consistent _and_ efficient at scale. In most cases, it'll trivially turn into a readdir()+stat() in the VFS, but in some cases filesystems can exploit that information for (often enormous) performance gain, while still maintaining well-defined consistency semantics. readdir() already leaks some inode information into its result (via d_type)... I'm not sure I understand the resistance to providing more.

sage