On Thu, 30 Nov 2006, Christoph Hellwig wrote:
> On Wed, Nov 29, 2006 at 12:26:22AM -0800, Brad Boyer wrote:
> > For a more extreme case, hfs and hfsplus don't even have a separation
> > between directory entries and inode information. The code creates this
> > separation synthetically to match the expectations of the kernel. During
> > a readdir(), the full catalog record is loaded from disk, but all that
> > is used is the information passed back to the filldir callback. The only
> > thing that would be needed to return extra information would be code to
> > copy information from the internal structure to whatever the system call
> > used to return data to the program.
>
> In this case you can in fact already instantiate inodes from readdir.
> Take a look at the NFS code.

Sure. And having readdirplus over the wire is a great performance win for
NFS, but it works only because NFS metadata consistency is already weak.
Giving applications an atomic readdirplus makes things considerably
simpler for distributed filesystems that want to provide strong
consistency (and a reasonable interpretation of what POSIX semantics mean
for a distributed filesystem). In particular, it allows the application
(e.g. ls --color or -al) to communicate to the kernel and filesystem that
it doesn't care about the relative ordering of each subsequent stat() with
respect to other writers (possibly on different hosts, with whom
synchronization can incur a heavy performance penalty), but rather only
wants a snapshot of dentry+inode state.
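
For concreteness, here is a minimal userspace sketch of the pattern in
question: one readdir() pass with a separate stat-family call per entry,
roughly what ls -al does today. The helper name list_with_stat() is
hypothetical and only for illustration; it uses nothing beyond the
standard POSIX opendir/readdir/fstatat calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: list a directory with one readdir() pass and a
 * separate fstatat() per entry. Returns the number of entries that were
 * successfully stat'ed, or -1 if the directory could not be opened. */
static long list_with_stat(const char *path, int verbose)
{
    DIR *dir = opendir(path);
    if (!dir)
        return -1;

    long n = 0;
    struct dirent *de;
    while ((de = readdir(dir)) != NULL) {
        struct stat st;
        /* Each lookup is its own call, so it may observe changes made
         * by other writers after readdir() returned this name. */
        if (fstatat(dirfd(dir), de->d_name, &st, AT_SYMLINK_NOFOLLOW) != 0)
            continue;   /* entry vanished between readdir and stat */
        if (verbose)
            printf("%10lld  %s\n", (long long)st.st_size, de->d_name);
        n++;
    }
    closedir(dir);
    return n;
}
```

Nothing in this loop tells the filesystem that the caller would be happy
with a single snapshot, so a strongly consistent distributed filesystem
has to treat every fstatat() as an independent, ordered operation.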
As Andreas already mentioned, detecting this (exceedingly common) case may
be possible with heuristics (e.g. watching the ordering of stat() calls vs
the filldir results), but that's hardly ideal when a cleaner interface can
explicitly capture the application's requirements.
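
To illustrate what such a heuristic might look like, here is a rough,
self-contained sketch that tracks whether stat() calls arrive in the
same order as the names handed back through filldir. Everything in it
(struct stat_seq_detector, SEQ_THRESHOLD, the function names) is made up
for illustration and is not taken from any existing filesystem.

```c
#include <string.h>

#define SEQ_THRESHOLD 3   /* assumed: in-order stats needed to trigger */

/* Hypothetical per-directory state; not from any real filesystem. */
struct stat_seq_detector {
    const char **names;   /* names in the order filldir returned them */
    size_t count;         /* number of names */
    size_t next;          /* index of the entry expected to be stat'ed next */
    unsigned run;         /* length of the current in-order run */
};

static void detector_init(struct stat_seq_detector *d,
                          const char **names, size_t count)
{
    d->names = names;
    d->count = count;
    d->next = 0;
    d->run = 0;
}

/* Record a stat() on `name`. Returns 1 once the caller looks like it is
 * walking the readdir results in order, 0 otherwise. */
static int detector_note_stat(struct stat_seq_detector *d, const char *name)
{
    if (d->next < d->count && strcmp(d->names[d->next], name) == 0) {
        d->next++;
        d->run++;
    } else {
        /* Out-of-order access: resynchronise and start the run over. */
        d->run = 0;
        for (size_t i = 0; i < d->count; i++) {
            if (strcmp(d->names[i], name) == 0) {
                d->next = i + 1;
                d->run = 1;
                break;
            }
        }
    }
    return d->run >= SEQ_THRESHOLD;
}
```

Even this toy version needs per-directory state and a guessed threshold,
and it can only react after several stat() calls have already been
issued.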
sage