Trond Myklebust wrote:
On Tue, 2006-12-05 at 10:07 +0000, Christoph Hellwig wrote:
...and we have pointed out how nicely this ignores the realities of
current caching models. There is no need for a readdirplus() system
call. There may be a need for a caching barrier, but AFAICS that is all.
I think Andreas mentioned that it is useful for clustered filesystems
that can avoid additional roundtrips this way. That alone might not
be enough reason for API additions, though. Then again, statlite and
readdirplus really are the most sane bits of these proposals as they
fit nicely into the existing set of APIs. The filehandle idiocy on
the other hand is way off into crackpipe land.
They provide no benefits whatsoever for the two most commonly used
networked filesystems NFS and CIFS. As far as they are concerned, the
only new thing added by readdirplus() is the caching barrier semantics.
I don't see why you would want to add that into a generic syscall like
readdir() though: it is
a) networked filesystem specific. The mask stuff etc adds no
value whatsoever to actual "posix" filesystems. In fact it is
telling the kernel that it can violate posix semantics.
It isn't violating POSIX semantics if we get the calls passed as an
extension to POSIX :).
b) quite unnatural to impose caching semantics on all the
directory _entries_ using a syscall that refers to the directory
itself (see the explanations by both myself and Peter Staubach
of the synchronisation difficulties). Consider in particular
that it is quite possible for directory contents to change in
between readdirplus calls.
I want to make sure that I understand this correctly. NFS semantics
dictate that when someone stat()s a file, all changes from that client
need to be propagated to the server? And this call complicates that
semantic because now there's an operation on a different object (the
directory) that would cause this flush on the files?
Of course directory contents can change in between readdirplus() calls,
just as they can between readdir() calls. That's expected, and we do not
attempt to create consistency between calls.
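
For concreteness, here is a minimal sketch of the pattern we are talking
about: the readdir() loop with one stat call per entry that an 'ls -l'
style application issues today. The helper name list_with_stat() and its
counter argument are made up for illustration and are not part of any
proposal; the sketch only assumes the standard POSIX directory and stat
interfaces. On a distributed file system each counted call may turn into
a separate round trip, and an entry can vanish between the readdir() and
the stat, which is why no consistency between the two is assumed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (illustration only): list a directory the way an
 * "ls -l" style tool does with the existing APIs, i.e. one readdir()
 * pass plus one stat call per entry.
 *
 * Returns the number of entries successfully stat'ed, or -1 if the
 * directory could not be opened. *stat_calls counts the per-entry stat
 * calls issued; these are the calls a combined readdir+stat interface
 * would aim to fold into the directory read.
 */
long list_with_stat(const char *path, long *stat_calls)
{
    assert(path != NULL && stat_calls != NULL);

    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    long entries = 0;
    *stat_calls = 0;

    struct dirent *de;
    while ((de = readdir(dir)) != NULL) {
        /* Skip the self and parent entries. */
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;

        struct stat st;
        (*stat_calls)++;
        /* Stat relative to the open directory; do not follow symlinks. */
        if (fstatat(dirfd(dir), de->d_name, &st, AT_SYMLINK_NOFOLLOW) != 0)
            continue; /* entry went away between readdir() and the stat */

        printf("%10lld %s\n", (long long)st.st_size, de->d_name);
        entries++;
    }

    closedir(dir);
    return entries;
}
```

For a directory with N entries this issues N stat calls on top of the
directory reads, and that per-entry cost is what we are hoping to avoid.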
i.e. the "strict posix caching model" is pretty much impossible
to implement on something like NFS or CIFS using these
semantics. Why then even bother to have "masks" to tell you when
it is OK to violate said strict model?
We're trying to obtain improved performance for distributed file systems
with stronger consistency guarantees than these two.
c) Says nothing about what should happen to non-stat() metadata
such as ACL information and other extended attributes (for
example future selinux context info). You would think that the
'ls -l' application would care about this.
Honestly, we hadn't thought about other non-stat() metadata because we
didn't think it was part of the use case, and we were trying to stay
close to the flavor of POSIX. If you have ideas here, we'd like to hear
them.
Thanks for the comments,
Rob