On Thu, 2006-11-30 at 23:08 -0800, Sage Weil wrote:
> I mean atomic only in the sense that the stat result returned by
> readdirplus() would reflect the file state at some point during the time
> consumed by that system call.  In contrast, when you call stat()
> separately, it's expected that the result you get back reflects the state
> at some time during the stat() call, and not the readdir() that may
> have preceded it.  readdir() results may be weakly cached, but stat()
> results normally aren't (ignoring the usual NFS behavior for the moment).
>
> It's the stat() part of readdir() + stat() that makes life unnecessarily
> difficult for a filesystem providing strong consistency.  How can the
> filesystem know that 'ls' doesn't care if the stat() results are accurate
> at the time of the readdir() and not the subsequent stat()?  Something
> like readdirplus() allows that to be explicitly communicated, without
> resorting to heuristics or weak metadata consistency (a la NFS attribute
> caching).  For distributed or network filesystems that can be a big win.
> (Admittedly, there's probably little benefit for local filesystems beyond
> the possibility of better prefetching, if syscalls are as cheap as
> Christoph says.)

'ls --color' and 'find' don't give a toss about most of the fields
returned by 'stat()'. They just want to know what kind of filesystem
object they are dealing with. We already provide that information in the
readdir() syscall via the 'd_type' field.
Adding all the other stat() information is just going to add unnecessary
synchronisation burdens.

> > Besides, why would your application care about atomicity of the
> > attribute information unless you also have some form of locking to
> > guarantee that said information remains valid until you are done
> > processing it?
>
> Something like 'ls' certainly doesn't care, but in general applications do
> care that stat() results aren't cached.  They expect the stat results to
> reflect the file's state at a point in time _after_ they decide to call
> stat().  For example, for process A to see how much data a just-finished
> process B wrote to a file...

AFAICS, it will not change any consistency semantics. The main
irritation it will introduce will be that the NFS client will suddenly
have to do things like synchronising readdirplus() and file write() in
order to provide the POSIX guarantees that you mentioned.

i.e., if someone has written data to one of the files in the directory,
then an NFS client will now have to flush that data out before calling
readdir so that the server returns the correct m/ctime or file size.
Previously, it could delay that until the stat() call.

Trond
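
For reference, the 'd_type' point above can be sketched roughly as
follows. This is only an illustration, not code from 'ls' or 'find':
the helper names (kind_from_mode, entry_kind, count_subdirs) are made
up for the example, and it assumes a glibc-style system where
_DEFAULT_SOURCE exposes 'd_type' and the DT_* macros. A filesystem may
return DT_UNKNOWN, in which case the sketch falls back to a single
lstat-style call for that entry.

```c
#define _DEFAULT_SOURCE   /* assumption: glibc; exposes d_type and DT_* */
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: map a mode from lstat() to a one-letter kind. */
static char kind_from_mode(mode_t mode)
{
    if (S_ISDIR(mode)) return 'd';
    if (S_ISREG(mode)) return 'f';
    if (S_ISLNK(mode)) return 'l';
    return '?';
}

/* Hypothetical helper: classify one entry from d_type alone, and only
 * stat it when the filesystem did not fill in the type. */
static char entry_kind(DIR *dir, const struct dirent *de)
{
    struct stat st;

    switch (de->d_type) {
    case DT_DIR: return 'd';
    case DT_REG: return 'f';
    case DT_LNK: return 'l';
    case DT_UNKNOWN:
        /* Fallback: one lstat-equivalent call for this entry only. */
        if (fstatat(dirfd(dir), de->d_name, &st, AT_SYMLINK_NOFOLLOW) == 0)
            return kind_from_mode(st.st_mode);
        return '?';
    default:
        return '?';   /* sockets, FIFOs, devices: not distinguished here */
    }
}

/* Count the subdirectories of 'path', skipping "." and "..".
 * Returns -1 if the directory cannot be opened. */
static int count_subdirs(const char *path)
{
    DIR *dir;
    struct dirent *de;
    int n = 0;

    assert(path != NULL);
    dir = opendir(path);
    if (dir == NULL)
        return -1;

    while ((de = readdir(dir)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        if (entry_kind(dir, de) == 'd')
            n++;
    }
    closedir(dir);
    return n;
}
```

Calling count_subdirs("/usr") walks the directory once; on filesystems
that fill in 'd_type' it issues no per-entry stat() at all, which is
the case the 'ls --color' and 'find' remark refers to.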