On Fri, 1 Dec 2006, Trond Myklebust wrote:
'ls --color' and 'find' don't give a toss about most of the arguments
from 'stat()'. They just want to know what kind of filesystem object
they are dealing with. We already provide that information in the
readdir() syscall via the 'd_type' field. Adding all the other stat()
information is just going to add unnecessary synchronisation burdens.
'ls -al' cares about the stat() results, but does not care about the
relative timing accuracy wrt the preceding readdir(). I'm not sure why
'ls --color' still calls stat when it can get that from the readdir()
results, but either way it's asking more from the kernel/filesystem than
it needs.
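
For illustration, the d_type approach described above might look roughly
like the following C sketch. It is not taken from 'ls' or 'find'; the helper
name count_subdirs() and its stat_calls counter are made up for this example,
and it assumes a glibc-style struct dirent that exposes d_type. A stat-family
call is issued only when the filesystem reports DT_UNKNOWN.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* exposes d_type and the DT_* constants on glibc */
#endif
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: count the subdirectories of 'path'.
 * The object type comes from readdir()'s d_type; fstatat() is used only
 * when the filesystem cannot supply it (DT_UNKNOWN). If 'stat_calls' is
 * non-NULL, it is incremented once per fallback call.
 * Returns the count, or -1 if the directory cannot be opened.
 */
int count_subdirs(const char *path, int *stat_calls)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *de;
    while ((de = readdir(dir)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;

        if (de->d_type == DT_DIR) {
            count++;                      /* type known, no stat needed */
        } else if (de->d_type == DT_UNKNOWN) {
            struct stat st;
            if (stat_calls != NULL)
                (*stat_calls)++;
            /* Fallback: ask for the full attributes, without following symlinks. */
            if (fstatat(dirfd(dir), de->d_name, &st, AT_SYMLINK_NOFOLLOW) == 0 &&
                S_ISDIR(st.st_mode))
                count++;
        }
    }
    closedir(dir);
    return count;
}
```

On filesystems that fill in d_type, the counter stays at zero, which is the
"asking no more than it needs" behaviour the paragraph above argues for.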
Something like 'ls' certainly doesn't care, but in general applications do
care that stat() results aren't cached. They expect the stat results to
reflect the file's state at a point in time _after_ they decide to call
stat(). For example, for process A to see how much data a just-finished
process B wrote to a file...
AFAICS, it will not change any consistency semantics. The main
irritation it will introduce will be that the NFS client will suddenly
have to do things like synchronising readdirplus() and file write() in
order to provide the POSIX guarantees that you mentioned.
i.e: if someone has written data to one of the files in the directory,
then an NFS client will now have to flush that data out before calling
readdir so that the server returns the correct m/ctime or file size.
Previously, it could delay that until the stat() call.
It sounds like you're talking about a single (asynchronous) client in a
directory. In that case, the client need only flush if someone calls
readdirplus() instead of readdir(), and since readdirplus() is effectively
also a stat(), the situation isn't actually any different.
The more interesting case is multiple clients in the same directory. In
order to provide strong consistency, both stat() and readdir() have to
talk to the server (or more complicated leasing mechanisms are needed).
In that scenario, readdirplus() is asking for _less_
synchronization/consistency of results than readdir()+stat(), not more.
i.e. both the readdir() and stat() would require a server request in order
to achieve the standard POSIX semantics, while a readdirplus() would allow
a single request. The NFS client already provides weak consistency of
stat() results for clients. Extending the interface doesn't suddenly
require the NFS client to provide strong consistency, it just makes life
easier for the implementation if it (or some other filesystem) chooses to
do so.
Consider two use cases. Process A is 'ls -al', which doesn't really care
exactly when the size/mtime values are from (any time after opendir() is fine).
Process B waits for a process on another host to write to a file, and then
calls stat() locally to check the result. In order for B to get the
correct result, stat() _must_ return a value for size/mtime from _after_
the stat() call was initiated. That makes 'ls -al' slow, because it probably has
to talk to the server to make sure files haven't been modified between the
readdir() and stat(). In reality, 'ls -al' doesn't care, but the
filesystem has no way to know that without the presence of readdirplus().
Alternatively, an NFS (or other distributed filesystem) client can cache
file attributes to make 'ls -al' fast, and simply break process B (as NFS
currently does). readdirplus() makes it clear what 'ls -al' doesn't need,
allowing the client (if it so chooses) to avoid breaking B in the general
case. There is simply no way to communicate that explicitly with the
existing interface. How is that not a win?
I imagine that most of the time readdirplus() will hit something in the
VFS that simply calls readdir() and stat(). But a smart NFS (or other
network filesystem) client can opt to send a readdirplus over the wire
for readdirplus() without sacrificing stat() consistency in the general
case.
sage