Re: NFSv4/pNFS possible POSIX I/O API standards

On Fri, 1 Dec 2006, Trond Myklebust wrote:
> What exactly do you mean by an "atomic readdirplus"? Standard readdir is
> by its very nature weakly cached, and there is no guarantee whatsoever
> even that you will see all files in the directory. See the SuSv3
> definition, which explicitly states that there is no ordering w.r.t.
> file creation/deletion:
>
>        The type DIR, which is defined in the <dirent.h> header,
>        represents a directory stream, which is an ordered sequence of
>        all the directory entries in a particular directory. Directory
>        entries represent files; files may be removed from a directory
>        or added to a directory asynchronously to the operation of
>        readdir().

I mean atomic only in the sense that the stat result returned by readdirplus() would reflect the file state at some point during the time consumed by that system call. In contrast, when you call stat() separately, it's expected that the result you get back reflects the state at some time during the stat() call, and not the readdir() that may have preceded it. readdir() results may be weakly cached, but stat() results normally aren't (ignoring the usual NFS behavior for the moment).

It's the stat() part of readdir() + stat() that makes life unnecessarily difficult for a filesystem providing strong consistency. How can the filesystem know that 'ls' doesn't care if the stat() results are accurate at the time of the readdir() and not the subsequent stat()? Something like readdirplus() allows that to be explicitly communicated, without resorting to heuristics or weak metadata consistency (à la NFS attribute caching). For distributed or network filesystems that can be a big win. (Admittedly, there's probably little benefit for local filesystems beyond the possibility of better prefetching, if syscalls are as cheap as Christoph says.)

> Besides, why would your application care about atomicity of the
> attribute information unless you also have some form of locking to
> guarantee that said information remains valid until you are done
> processing it?

Something like 'ls' certainly doesn't care, but in general applications do care that stat() results aren't cached. They expect the stat results to reflect the file's state at a point in time _after_ they decide to call stat(). For example, process A may call stat() to learn how much data a just-finished process B wrote to a file.
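
The process A / process B expectation above can be sketched in C, assuming a POSIX system. size_after_writer() is a hypothetical helper: the forked child plays process B and writes n bytes, and the parent plays process A, waits for B to exit, and only then calls stat(). On a single local filesystem st_size always reflects B's writes here; the concern in this thread is a distributed filesystem where A and B run on different clients and a cached attribute from before B finished would give the wrong answer.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: child (process B) writes n bytes to path and exits;
 * parent (process A) waits, then stat()s the file. Returns st_size, or -1
 * on any error. */
long long size_after_writer(const char *path, size_t n)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Process B: truncate, write n bytes, exit. */
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (fd < 0)
            _exit(1);
        char buf[256];
        memset(buf, 'x', sizeof buf);
        size_t left = n;
        while (left > 0) {
            size_t chunk = left < sizeof buf ? left : sizeof buf;
            ssize_t w = write(fd, buf, chunk);
            if (w <= 0)
                _exit(1);
            left -= (size_t)w;
        }
        close(fd);
        _exit(0);
    }

    /* Process A: wait until B has finished before asking for the size. */
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0)
        return -1;

    /* The application expects this stat() to see B's writes, because it
     * was issued after B exited. */
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_size;
}
```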

sage

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
