On Fri, 2006-12-01 at 08:47 -0800, Sage Weil wrote:
> On Fri, 1 Dec 2006, Trond Myklebust wrote:
> > 'ls --color' and 'find' don't give a toss about most of the arguments
> > from 'stat()'. They just want to know what kind of filesystem object
> > they are dealing with. We already provide that information in the
> > readdir() syscall via the 'd_type' field. Adding all the other stat()
> > information is just going to add unnecessary synchronisation burdens.
>
> 'ls -al' cares about the stat() results, but does not care about the
> relative timing accuracy wrt the preceding readdir(). I'm not sure why
> 'ls --color' still calls stat when it can get that from the readdir()
> results, but either way it's asking more from the kernel/filesystem than
> it needs.
>
> >> Something like 'ls' certainly doesn't care, but in general applications do
> >> care that stat() results aren't cached. They expect the stat results to
> >> reflect the file's state at a point in time _after_ they decide to call
> >> stat(). For example, for process A to see how much data a just-finished
> >> process B wrote to a file...
> >
> > AFAICS, it will not change any consistency semantics. The main
> > irritation it will introduce will be that the NFS client will suddenly
> > have to do things like synchronising readdirplus() and file write() in
> > order to provide the POSIX guarantees that you mentioned.
> >
> > i.e. if someone has written data to one of the files in the directory,
> > then an NFS client will now have to flush that data out before calling
> > readdir so that the server returns the correct m/ctime or file size.
> > Previously, it could delay that until the stat() call.
>
> It sounds like you're talking about a single (asynchronous) client in a
> directory. In that case, the client need only flush if someone calls
> readdirplus() instead of readdir(), and since readdirplus() is effectively
> also a stat(), the situation isn't actually any different.
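To illustrate the d_type point quoted above: a minimal C sketch, assuming a Linux/glibc system, of how a tool like 'find -type d' can classify directory entries from readdir() alone, paying for an fstatat() call only when the filesystem reports DT_UNKNOWN (not every filesystem fills in d_type). The function name count_subdirs() and its stat_calls counter are hypothetical, for illustration only.

```c
/* Assumption: Linux/glibc. _GNU_SOURCE exposes d_type, DT_* and fstatat(). */
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: counts the subdirectories of `path`, skipping
 * "." and "..". The object type comes from readdir()'s d_type; an
 * fstatat() call is made only for entries reported as DT_UNKNOWN.
 * Returns -1 if the directory cannot be opened. If stat_calls is
 * non-NULL it receives the number of fallback fstatat() calls. */
long count_subdirs(const char *path, long *stat_calls)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    long dirs = 0, stats = 0;
    struct dirent *ent;
    /* Sketch: a NULL return is treated as end-of-directory; errno is
     * not checked to tell a read error apart from the end. */
    while ((ent = readdir(dir)) != NULL) {
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;

        unsigned char type = ent->d_type;
        if (type == DT_UNKNOWN) {
            /* The filesystem did not fill in d_type: fetch the mode. */
            struct stat st;
            stats++;
            if (fstatat(dirfd(dir), ent->d_name, &st, AT_SYMLINK_NOFOLLOW) != 0)
                continue; /* entry may have been removed meanwhile; skip it */
            if (S_ISDIR(st.st_mode))
                type = DT_DIR;
        }
        if (type == DT_DIR)
            dirs++;
    }
    closedir(dir);

    if (stat_calls != NULL)
        *stat_calls = stats;
    return dirs;
}
```

On a filesystem that fills in d_type, stat_calls stays at zero: no per-entry attribute request is needed just to learn what kind of object each entry is.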
>
> The more interesting case is multiple clients in the same directory. In
> order to provide strong consistency, both stat() and readdir() have to
> talk to the server (or more complicated leasing mechanisms are needed).

Why would that be interesting? What applications do you have that
require strong consistency in that scenario? I keep looking for uses for
strong cache consistency with no synchronisation, but I have yet to meet
someone who has an actual application that relies on it.

> In that scenario, readdirplus() is asking for _less_
> synchronization/consistency of results than readdir()+stat(), not more.
> i.e. both the readdir() and stat() would require a server request in order
> to achieve the standard POSIX semantics, while a readdirplus() would allow
> a single request. The NFS client already provides weak consistency of
> stat() results for clients. Extending the interface doesn't suddenly
> require the NFS client to provide strong consistency, it just makes life
> easier for the implementation if it (or some other filesystem) chooses to
> do so.

I'm quite happy with a proposal for a statlite(). I'm objecting to
readdirplus() because I can't see that it offers you anything useful.
You haven't provided an example of an application which would clearly
benefit from a readdirplus() interface instead of readdir()+statlite()
and possibly some tools for managing cache consistency.

> Consider two use cases. Process A is 'ls -al', who doesn't really care
> about when the size/mtime are from (i.e. sometime after opendir()).
> Process B waits for a process on another host to write to a file, and then
> calls stat() locally to check the result. In order for B to get the
> correct result, stat() _must_ return a value for size/mtime from _after_
> the stat() was initiated. That makes 'ls -al' slow, because it probably has
> to talk to the server to make sure files haven't been modified between the
> readdir() and stat().
> In reality, 'ls -al' doesn't care, but the
> filesystem has no way to know that without the presence of readdirplus().
> Alternatively, an NFS (or other distributed filesystem) client can cache
> file attributes to make 'ls -al' fast, and simply break process B (as NFS
> currently does). readdirplus() makes it clear what 'ls -al' doesn't need,
> allowing the client (if it so chooses) to avoid breaking B in the general
> case. That simply isn't possible to explicitly communicate with the
> existing interface. How is that not a win?

Using readdir() to monitor size/mtime on individual files is hardly a
case we want to optimise for. There are better tools, including
inotify() for applications that care.

I agree that an interface which allows a userland process to offer hints
to the kernel as to what kind of cache consistency it requires for file
metadata would be useful. We already have stuff like posix_fadvise() etc.
for file data, and perhaps it might be worth looking into how you could
devise something similar for metadata.
If what you really want is for applications to be able to manage network
filesystem cache consistency, then why not provide those tools instead?

Cheers,
  Trond