Re: NFSv4/pNFS possible POSIX I/O API standards

Steven Whitehouse <swhiteho@xxxxxxxxxx> · Wed, 29 Nov 2006 10:25:07 +0000

Hi,

On Wed, 2006-11-29 at 01:48 -0800, Andreas Dilger wrote:
> On Nov 29, 2006  09:04 +0000, Christoph Hellwig wrote:
> >  - readdirplus
> > 
> > 	This one is completely unneeded as a kernel API.  Doing readdir
> > 	plus calls on the wire makes a lot of sense and we already do
> > 	that for NFSv3+.  Doing this at the syscall layer just means
> > 	kernel bloat - syscalls are very cheap.
> 
> The question is how does the filesystem know that the application is
> going to do readdir + stat every file?  It has to do this as a heuristic
> implemented in the filesystem to determine if the ->getattr() calls match
> the ->readdir() order.  If the application knows that it is going to be
> doing this (e.g. ls, GNU rm, find, etc) then why not let the filesystem
> take advantage of this information?  If combined with the statlite
> interface, it can make a huge difference for clustered filesystems.
> 
> Cheers, Andreas

I agree that this is a good plan, but I've been looking at this idea from
a different direction recently. The in-kernel NFS server calls
vfs_getattr from its filldir routine for readdirplus. This means not
only that we are unable to optimise performance by (for example) sorting
groups of getattr calls so that we read the inodes in disk block order,
but also that it is effectively enforcing a locking order of the inodes
on us. Since we can have async locking in GFS2, we should be able to do
"lockahead" with readdirplus too.
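
To illustrate the sorting idea only, here is a rough user-space sketch in Python. It is not the kernel code path and not GFS2's method. It assumes that inode number is a usable proxy for on-disk inode order, which depends on the filesystem, and the function name stat_in_inode_order is hypothetical. The point is that the directory is read in full first, so the attribute lookups can be reordered, which a per-entry getattr issued from inside the filldir callback does not allow.

```python
import os


def stat_in_inode_order(path):
    """Return (name, stat_result) pairs, stat'ing entries in inode-number order.

    Assumption: ascending inode number roughly follows on-disk inode layout.
    That holds for some filesystems and not for others.
    """
    # Pass 1: read the whole directory, keeping only inode number and name.
    with os.scandir(path) as it:
        entries = [(entry.inode(), entry.name) for entry in it]

    # Pass 2: sort the batch by inode number, then fetch attributes in that
    # order rather than in the order readdir happened to return the names.
    entries.sort()
    return [(name, os.lstat(os.path.join(path, name))) for _, name in entries]
```

A caller would use it as `for name, st in stat_in_inode_order("/some/dir"): ...`. The results come back in inode order, not directory order, so a caller that needs the original order has to re-sort.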

I had been considering proposing a readdirplus export operation, but
since this thread has come up, perhaps a file operation would be
preferable as it could solve two problems with one operation?

Steve.
