Re: NFSv4/pNFS possible POSIX I/O API standards

Hi all,

The use model for openg() and openfh() (formerly sutoc()) is n processes spread across a large cluster simultaneously opening a file. The challenge is to avoid incurring O(n) FS interactions, to the greatest extent possible. To do that, we need to allow the actions of one process to be reused by processes on other OS instances.

The openg() call allows one process to perform name resolution, which is often the most expensive part of this use model. Because permission checking is also performed as part of openg(), some file systems do not require any additional communication between OS and FS at openfh() time. External communication channels (e.g., MPI_Bcast()) are used to pass the handle resulting from the openg() call to processes on other nodes.

dup(), openat(), and UNIX sockets are not viable options in this model, because there are many OS instances, not just one.

All the calls proposed as part of the HEC extensions are being discussed in this context of multiple OS instances and cluster file systems.

Regarding the lifetime of the handle, there has been quite a bit of discussion about this. I believe our most recent thinking was that the handle has an undefined lifetime, allowing servers to "forget" these values (as happens when a server is restarted). A client that tries to use an outdated handle would need to perform the openg() again, or simply fall back to a regular open(). This is not a problem in our use model.

I've attached a graph showing the time to use individual open() calls vs. the openg()/MPI_Bcast()/openfh() combination; it's a clear win for any significant number of processes. These results are from our colleagues at Sandia (Ruth Klundt et al.) with PVFS underneath, but I expect the trend to be similar for many cluster file systems.

Regarding trying to "force APIs using standardization" on you (Christoph's 11/29/2006 message), you've got us all wrong. The standardization process is going to take some time, so we're starting on it at the same time that we're working with prototypes, so that we don't have to wait any longer than necessary to have these things be part of POSIX. The whole reason we're presenting this on this list is to try to describe why we think these calls are important and get feedback on how we can make these calls work well in the context of Linux. I'm glad to see so many people taking interest.

I look forward to further constructive discussion. Thanks,

Rob
---
Rob Ross
Mathematics and Computer Science Division
Argonne National Laboratory

Christoph Hellwig wrote:
> On Wed, Nov 29, 2006 at 05:23:13AM -0700, Matthew Wilcox wrote:
> > Is this for people who don't know about dup(), or do they need
> > independent file offsets?  If the latter, I think an xdup() would be
> > preferable (would there be a security issue for OSes with revoke()?)
> > Either that, or make the key be useful for something else.
>
> Not sharing the file offset means we need a separate file struct, at
> which point the only thing saved is doing a lookup at the time of
> opening the file.  While a full pathname traversal can be quite costly,
> an open is not something you do all that often anyway.  And if you really
> need to open/close files very often you can speed it up nicely by keeping
> a file descriptor on the parent directory open and using openat().
>
> Anyway, enough of talking here.  We really need a very good description
> of the use case people want this for, and the specific performance problems
> they see to find a solution.  And the solution definitely does not involve
> a second half-assed file handle type with unspecified lifetime rules :-)

Attachment: openg-compare.pdf
Description: Adobe PDF document

