On Fri, 1 Dec 2006, Trond Myklebust wrote:
> What exactly do you mean by an "atomic readdirplus"? Standard readdir is by its very nature weakly cached, and there is no guarantee whatsoever even that you will see all files in the directory. See the SUSv3 definition, which explicitly states that there is no ordering w.r.t. file creation/deletion:
>
>   The type DIR, which is defined in the <dirent.h> header, represents a directory stream, which is an ordered sequence of all the directory entries in a particular directory. Directory entries represent files; files may be removed from a directory or added to a directory asynchronously to the operation of readdir().
I mean atomic only in the sense that the stat result returned by readdirplus() would reflect the file state at some point during the time consumed by that system call. In contrast, when you call stat() separately, it's expected that the result you get back reflects the state at some time during the stat() call, and not the readdir() that may have preceded it. readdir() results may be weakly cached, but stat() results normally aren't (ignoring the usual NFS behavior for the moment).
It's the stat() part of readdir() + stat() that makes life unnecessarily difficult for a filesystem providing strong consistency. How can the filesystem know that 'ls' would be satisfied with stat() results that were accurate at the time of the readdir(), rather than at the time of the subsequent stat()? Something like readdirplus() allows that to be explicitly communicated, without resorting to heuristics or weak metadata consistency (à la NFS attribute caching). For distributed or network filesystems that can be a big win. (Admittedly, there's probably little benefit for local filesystems beyond the possibility of better prefetching, if syscalls are as cheap as Christoph says.)
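To make the readdir() + stat() pattern concrete, here is a rough sketch of the loop something like 'ls -l' ends up running today. The helper name list_with_stat() is made up for illustration. I'm deliberately not showing a readdirplus() call, since its signature is exactly what is still under discussion.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: walk a directory with readdir() and issue one
 * stat-family call per entry. Returns the number of entries that were
 * successfully stat'ed, or -1 if the directory could not be opened. */
long list_with_stat(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    long count = 0;
    struct dirent *de;
    while ((de = readdir(dir)) != NULL) {
        struct stat st;
        /* A separate call per entry: the filesystem cannot tell that
         * the caller would have accepted attributes as of the earlier
         * readdir(), so it has to return current state here. */
        if (fstatat(dirfd(dir), de->d_name, &st, AT_SYMLINK_NOFOLLOW) != 0)
            continue; /* entry went away between readdir() and the stat */
        printf("%10lld %s\n", (long long)st.st_size, de->d_name);
        count++;
    }
    closedir(dir);
    return count;
}
```

On a strongly consistent distributed filesystem, each of those per-entry calls may need its own round trip to whoever holds the authoritative attributes. A combined call would let the caller say up front that a single snapshot taken during the directory read is good enough.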
> Besides, why would your application care about atomicity of the attribute information unless you also have some form of locking to guarantee that said information remains valid until you are done processing it?
Something like 'ls' certainly doesn't care, but in general applications do care that stat() results aren't cached. They expect the stat results to reflect the file's state at a point in time _after_ they decide to call stat(). For example, for process A to see how much data a just-finished process B wrote to a file...
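A minimal sketch of that process A / process B case, assuming a plain fork()/waitpid() handoff. The helper name size_after_writer() is made up for illustration. The child plays B and writes len bytes. The parent plays A and calls stat() only after B has exited.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the file size process A observes via
 * stat() after process B has finished writing, or -1 on any failure. */
long long size_after_writer(const char *path, size_t len)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Process B: create the file and write len zero bytes. */
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (fd < 0)
            _exit(1);
        char buf[4096] = {0};
        size_t left = len;
        while (left > 0) {
            size_t chunk = left < sizeof buf ? left : sizeof buf;
            ssize_t n = write(fd, buf, chunk);
            if (n < 0)
                _exit(1);
            left -= (size_t)n;
        }
        close(fd);
        _exit(0);
    }

    /* Process A: decide to call stat() only after B is known to be done. */
    int status;
    if (waitpid(pid, &status, 0) < 0 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_size;
}
```

On a local filesystem the returned size matches len. The expectation I'm describing is that the same holds when A and B sit on different clients of a distributed filesystem, which is what rules out serving that stat() from attributes cached before B wrote.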
sage