On Tue, 5 Dec 2006, Christoph Hellwig wrote:
> Readdir plus is a little more involved. For one thing the actual kernel
> implementation will be a variant of getdents() call anyway while a
> readdirplus would only be a library level interface. At the actual
> C prototype level I would rename d_stat_err to d_stat_errno for consistency
> and maybe drop the readdirplus() entry point in favour of readdirplus_r
> only - there is no point in introducing new non-reentrant APIs today.
> Also we should not try to put in any of the synchronization or non-caching
> behaviours mentioned earlier in this thread (they're fortunately not in
> the pdf mentioned above either). If we ever want to implement these kinds
> of additional guarantees (which I doubt) that should happen using fadvise
> calls on the directory file descriptor.

Can you explain what the struct stat result portion of readdirplus()
should mean in this case?
My suggestion was that its consistency follow that of the directory entry
(i.e. mimic the readdir() specification), which (as far as the POSIX
description goes) means it is at least as recent as opendir(). That model
seems to work pretty well for readdir() on both local and network
filesystems, as it allows buffering and so forth. This is evident from
the fact that its semantics haven't been relaxed by NFS et al (AFAIK).

Alternatively, one might specify that the result be valid at the time of
the readdirplus() call, but I think everyone agrees that is unnecessary,
and more importantly, semantically indistinguishable from a
readdir()+stat().
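
To make that equivalence concrete, here is a rough userspace sketch of
what a "valid at the time of the call" readdirplus() would amount to.
The names struct dirent_plus and emulated_readdirplus() are made up for
illustration and are not taken from the proposal; only d_stat_errno
follows the naming suggested above. It assumes a POSIX.1-2008 libc that
provides dirfd() and fstatat().

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical result record; not an existing libc or kernel type. */
struct dirent_plus {
    char        d_name[256];   /* entry name, truncated if longer */
    struct stat d_stat;        /* meaningful only when d_stat_errno == 0 */
    int         d_stat_errno;  /* errno from the stat attempt, or 0 */
};

/*
 * Hypothetical emulation: read the next entry from dirp and stat it at
 * call time. Returns 1 when *out was filled in, 0 at end of directory,
 * and -1 if readdir() itself failed (errno is set).
 */
static int emulated_readdirplus(DIR *dirp, struct dirent_plus *out)
{
    struct dirent *de;

    errno = 0;                 /* lets us tell end-of-directory from error */
    de = readdir(dirp);
    if (de == NULL)
        return errno ? -1 : 0;

    snprintf(out->d_name, sizeof out->d_name, "%s", de->d_name);

    /* Stat relative to the directory itself; do not follow symlinks. */
    if (fstatat(dirfd(dirp), de->d_name, &out->d_stat,
                AT_SYMLINK_NOFOLLOW) == 0)
        out->d_stat_errno = 0;
    else
        out->d_stat_errno = errno;  /* e.g. ENOENT if the entry vanished */

    return 1;
}
```

Since this costs one stat per entry, a call specified this way gives the
caller nothing that the loop above doesn't already provide.
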
The only other option I've heard seems to be that the validity of stat()
not be specified at all. This strikes me as utterly pointless--why create
a call whose result has no definition? It's also semantically
indistinguishable from a readdir()+statlite(null mask).

The fact that NFS and maybe others return cached results for stat()
doesn't seem relevant to how the call is _defined_. If the definition of
stat() followed NFS, then it might read something like "returns
information about a file that was accurate at some point in the last 30
seconds or so." On the other hand, if readdirplus()'s stat consistency is
defined the same way as the dirent, NFS et al are still free to ignore
that specification and return cached results, as they already do for
stat(). (A 'lite' version of readdirplus() might even let users pick and
choose, should the fs support both behaviors, just like statlite().) I
don't really care what NFS does, but if readdirplus() is going to be
specified at all, it should be defined in a way that makes sense and has
some added semantic value.

Also, one note about the fadvise() suggestion. I think there's a key
distinction between what fadvise() currently does (provide hints to the
filesystem for performance optimization) and your proposal, which would
allow it to change the consistency semantics of other calls. That might
be fine, but it strikes me as a slightly strange thing to specify new
functionality that redefines previously defined semantics--even to realign
with popular implementations.

sage

P.S. I should probably mention that I'm not part of the group working on
this proposal. I've just been following their progress as it relates to
my own distributed filesystem research.