At 07:57 PM 12/17/2006, Ragnar Kjørstad wrote:
On Sun, Dec 17, 2006 at 01:51:38PM -0800, Ulrich Drepper wrote:
> Matthew Wilcox wrote:
> >I know that the rsync load is a major factor on kernel.org right now.
>
> That should be quite easy to quantify then. Move the readdir and stat
> call next to each other in the sources, pass the struct stat around if
> necessary, and then count the stat calls which do not originate from the
> stat following the readdir call. Of course we'll also need the actual
> improvement which can be achieved by combining the calls. Given the
> inodes are cached, is there more overhead than finding the right inode?
> Note that if rsync doesn't already use fstatat() it should do so and
> this means then that there is no long file path to follow, all file
> names are local to the directory opened with opendir().
>
> My gut feeling is that the improvements are minimal for normal (not
> cluster etc) filesystems and hence the improvements for kernel.org would
> be minimal.
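[As an aside, the fstatat() pattern Ulrich describes (stat each name relative to the opened directory, so the kernel never re-walks a long path per file) can be sketched in Python, whose os.stat() exposes fstatat(2) through its dir_fd parameter. This is only an illustration of the pattern, not rsync's actual code:]

```python
import os

def stat_dir_entries(path):
    """Return (name, stat_result) pairs for one directory, stat'ing each
    entry relative to the open directory fd -- the fstatat() pattern, so
    every lookup is local to the directory opened with opendir()."""
    dfd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    try:
        # os.listdir(fd) reads entries via the fd (fdopendir/readdir);
        # os.stat(name, dir_fd=dfd) issues fstatat(dfd, name, ...).
        return [(name, os.stat(name, dir_fd=dfd, follow_symlinks=False))
                for name in os.listdir(dfd)]
    finally:
        os.close(dfd)
```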
I don't think the overhead of finding the right inode or the system
calls themselves makes a difference at all. E.g. the rsync numbers I
listed show more than 0.3ms per stat syscall. That kind of time is not
spent looking up kernel data structures - it's spent doing IO.
The part that I think is important (and please correct me if I've
gotten it all wrong) is doing the IO in parallel. This applies both to
local filesystems and clustered filesystems, although it would probably
be much more significant for clustered filesystems since they would
typically have longer latency for each roundtrip. Today there is no good
way for an application to stat many files in parallel. You could do it
through threading, but with significant overhead and complexity.
I'm curious what results one would get by comparing performance of:
* application doing readdir and then stat on every single file
* application doing readdirplus
* application doing readdir and then stat on every file using a lot of
threads or an asynchronous stat interface
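[For what it's worth, absent a readdirplus or async-stat syscall, the third variant above can be approximated from userspace with a thread pool; a minimal sketch, with thread count and interface being my own choices rather than a proposal:]

```python
import os
from concurrent.futures import ThreadPoolExecutor

def parallel_stat(path, workers=32):
    """readdir once, then issue the stat calls from a thread pool so the
    underlying metadata IO can proceed in parallel.  Each os.stat()
    blocks in the kernel with the GIL released, so the threads overlap
    per-file IO latency rather than CPU work."""
    dfd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    try:
        names = os.listdir(dfd)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # Each worker does fstatat(dfd, name, ...) independently.
            stats = list(pool.map(
                lambda n: os.stat(n, dir_fd=dfd, follow_symlinks=False),
                names))
        return dict(zip(names, stats))
    finally:
        os.close(dfd)
```

[This is exactly the "significant overhead and complexity" case mentioned above: the pool, the fd lifetime, and the error handling all land on the application.]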
We have done something similar to what you suggest.
We wrote a parallel file tree walker to run on clustered file systems
that spread the file system's metadata out over multiple disks. The
program parallelizes the stat operations across multiple nodes (via
MPI). We needed to walk a tree with about a hundred million files in a
reasonable amount of time. We cut the time from dozens of hours to less
than an hour, and we were able to keep all the metadata RAIDs/disks
much busier doing the work for the stat operations. We have used this
on two different clustered file systems with similar results. In both
cases it scaled with the number of disks the metadata was spread over -
not quite linearly, but it was a huge win for both file systems.
Gary
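[Gary's walker itself isn't shown here, but the core idea (shard the stat work across independent workers so several metadata disks stay busy at once) can be illustrated with a process pool standing in for MPI ranks. The function names and the round-robin partitioning are mine, purely for illustration:]

```python
import os
from multiprocessing import Pool

def _stat_shard(args):
    """Worker ("rank"): stat its slice of the file list, return sizes."""
    root, names = args
    return [(n, os.stat(os.path.join(root, n), follow_symlinks=False).st_size)
            for n in names]

def walk_parallel(root, ranks=4):
    """Round-robin a directory's entries across worker processes, the
    way an MPI tree walker shards stat work across nodes, then merge
    the per-rank results."""
    names = os.listdir(root)
    shards = [(root, names[r::ranks]) for r in range(ranks)]
    with Pool(ranks) as pool:
        results = pool.map(_stat_shard, shards)
    return dict(pair for shard in results for pair in shard)
```

[A real MPI version would also need to hand off newly discovered subdirectories between ranks to balance the tree walk, which is where most of the complexity lives.]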
As far as parallel IO goes, I would think that async stat would be
nearly as fast as readdirplus?
For the clustered filesystem case there may be locking issues that make
readdirplus faster?
--
Ragnar Kjørstad
Software Engineer
Scali - http://www.scali.com
Scaling the Linux Datacenter
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html