On Mon, 4 Dec 2006, Peter Staubach wrote:
> I think that there are several points which are missing here. First, readdirplus(), without any sort of caching, is going to be _very_ expensive, performance-wise, for _any_ size directory. You can see this by instrumenting any NFS server which already supports the NFSv3 READDIRPLUS semantics.
Are you referring to the work the server must do to gather stat information for each inode?
> Second, the NFS client side readdirplus() implementation is going to be _very_ expensive as well. The NFS client does write-behind and all this data _must_ be flushed to the server _before_ the over-the-wire READDIRPLUS can be issued. This means that the client will have to step through every inode which is associated with the directory inode being readdirplus()'d and ensure that all modified data has been successfully written out. This part of the operation, for a sufficiently large directory and a sufficiently large page cache, could take significant time in itself.
Why can't the client send the over-the-wire READDIRPLUS without flushing inode data, and then simply ignore the stat portion of the server's response in instances where its locally cached (and dirty) inode data is newer than the server's?
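
To make the question concrete, here is a toy sketch of the per-entry choice I have in mind. Everything in it is hypothetical: struct attrs, struct cached_inode and pick_attrs() are made-up names for illustration, not the NFS client's real data structures, and treating "dirty and at least as recent as the server's mtime" as "newer" is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical, simplified attribute set (not the kernel's struct). */
struct attrs {
    long long size;
    time_t    mtime;
};

/* Hypothetical view of what the client has cached for one inode. */
struct cached_inode {
    struct attrs attrs;
    bool         dirty;   /* unflushed local modifications exist */
};

/*
 * Pick the attributes to report for one directory entry.
 * If the client holds dirty data that is at least as recent as what the
 * server returned, keep the local view; otherwise trust the server's reply.
 */
struct attrs pick_attrs(const struct cached_inode *local,
                        const struct attrs *from_server)
{
    if (local != NULL && local->dirty &&
        local->attrs.mtime >= from_server->mtime)
        return local->attrs;
    return *from_server;
}
```

Whether the locally cached attributes are accurate enough to hand back without a flush is exactly the open question; the sketch only shows where the decision would sit.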
> These overheads may make this new operation expensive enough that no applications will end up using it.
If the application calls readdirplus() only when it would otherwise do readdir()+stat(), the flushing you mention would happen anyway (from the stat()). Wouldn't this at least allow that to happen in parallel for the whole directory?
sage