hi Bob and others
I found the following GFS1/GFS2 design document on the Red Hat 108 Developer Portal; among other things, it details some of the issues with NFS on GFS:
https://rpeterso.108.redhat.com/servlets/ProjectDocumentView?documentID=99
(I see it was sent to this list over a year ago, but I never found it while searching through the archives. It has a lot of good information in it.)
It carries a disclaimer: "Some of the comments are no longer applicable due to design changes."
My question to you, or to anyone familiar with NFS on GFS or with GFS in general, is: which of the following are still valid issues for the current (6.1u4) version of GFS? If all or most of them still apply, I can use this as motivation for my customer to strongly consider moving off NFS on GFS. Removing NFS from our GFS cluster has been on the cards for quite a while, but it has not gained momentum due to a lack of information on the performance gains of such a move (very difficult to gauge) and on the architectural problems/limitations of NFS on GFS (for which the following extract is spot-on).
Note - can you consider adding a link to this document from your FAQ?
+++++++++
o NFS Support

A GFS filesystem can be exported through NFS to other nodes. There are a number of issues with NFS on top of a cluster filesystem, though.
1) Filehandle misses

When an NFS request comes into the server, it's up to the filesystem (and a few Linux helper routines) to map the NFS filehandle to the correct inode. Doing that is easy if the inode is already in the node's cache. The tricky part is when the filesystem must read the inode in from disk. There is nothing in the filehandle that anchors the inode into the filesystem (such as a glock on a directory that contains an entry pointing to the inode), so a lot more care has to be taken to make sure the block really contains a valid inode. (See the section on the proposed new RG formats.)

It's also non-trivial to handle inode migration in GFS when an NFS server is running. There is no centralized data structure that can map filehandles to inodes (such a structure would be a scalability/performance bottleneck), and it's difficult to find a representation of the inode that could be used to quickly find it even in the face of the inode changing blocks.

Another problem is that filehandle requests can come in at random times for inodes that don't exist anymore or are in the process of being recreated. This can break optimizations based on ideas like "since this node is in the process of creating this inode, it is the only one that knows about its locks". GFS has suffered from these mis-optimizations in the past. From what I've seen, I believe OCFS2 currently has problems like this, too.
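
To make issue 1 a bit more concrete for our own discussion: below is a rough user-space sketch of the kind of filehandle-to-inode validation being described. It is my own simplification, not GFS code; the structures and names (fake_fh, fake_dinode, fh_to_inode, the magic value) are made up for illustration. The point is that the handle carries little more than a block number and a generation, so on a cache miss the server has to read the block and prove it still holds a valid, matching dinode before trusting it.

/*
 * Simplified, user-space illustration of mapping an NFS-style
 * filehandle to an inode.  Nothing anchors the inode, so the block
 * has to be read and validated every time it is not in cache.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DINODE_MAGIC 0x19661030u    /* made-up magic value for the sketch */

struct fake_fh {                    /* what the filehandle might carry */
    uint64_t block;                 /* disk block supposedly holding the dinode */
    uint32_t generation;            /* bumped every time the inode is recreated */
};

struct fake_dinode {                /* on-disk inode header, much simplified */
    uint32_t magic;
    uint32_t generation;
    uint64_t size;
};

/* Pretend to read a block from disk; here it just copies from an array. */
static int read_block(const struct fake_dinode *disk, size_t nblocks,
                      uint64_t block, struct fake_dinode *out)
{
    if (block >= nblocks)
        return -1;
    memcpy(out, &disk[block], sizeof(*out));
    return 0;
}

/*
 * Map a filehandle to an inode.  The expensive part is the validation:
 * the block may have been freed and reused since the client got the
 * handle, so every check below is needed before the inode is trusted.
 */
static int fh_to_inode(const struct fake_dinode *disk, size_t nblocks,
                       const struct fake_fh *fh, struct fake_dinode *out)
{
    if (read_block(disk, nblocks, fh->block, out))
        return -1;                          /* block doesn't even exist */
    if (out->magic != DINODE_MAGIC)
        return -1;                          /* block no longer holds an inode */
    if (out->generation != fh->generation)
        return -1;                          /* inode was deleted and recreated */
    return 0;                               /* stale-handle checks passed */
}

int main(void)
{
    struct fake_dinode disk[4] = {
        [2] = { .magic = DINODE_MAGIC, .generation = 7, .size = 4096 },
    };
    struct fake_fh good  = { .block = 2, .generation = 7 };
    struct fake_fh stale = { .block = 2, .generation = 6 };
    struct fake_dinode ino;

    printf("good handle:  %s\n", fh_to_inode(disk, 4, &good, &ino)  ? "ESTALE" : "ok");
    printf("stale handle: %s\n", fh_to_inode(disk, 4, &stale, &ino) ? "ESTALE" : "ok");
    return 0;
}

On a single-node filesystem the generation check is usually enough; on a cluster filesystem another node can reuse or migrate the block at any time, which, as I read it, is where the extra care described above comes in.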
2) Readdir

Linux has an interesting mechanism to handle readdir() requests. The VFS (or NFSD) passes the filesystem a request containing not only the directory and offset to be read, but a filldir function to call for each entry found. So the filesystem doesn't directly fill in a buffer of entries, but calls an arbitrary routine that can either put the entries into a buffer or do some other type of processing on them. This is a powerful concept, but it can be easily misused.

I believe that NFSD's use of it is problematic at best. The filldir routine used by NFSD for the readdirplus NFS procedure calls back into the filesystem to do a lookup and stat() on the inode pointed to by the entry. This call is painful because of GFS' locking. gfs_readdir() must call filldir with the directory lock held so that it doesn't lose its place in the directory. The stat() that the filldir routine does causes the inode's lock to be acquired. Because concurrent inode locks must always be acquired in ascending numerical order and the filldir routine forces an ordering that might be something other than that, there is a deadlock potential.

GFS detects when NFSD calls its readdir and switches to a routine that avoids calling the filldir routine with the lock held. It's not as efficient, but it avoids the deadlock. It'd be nice if there was a better way to do the detection, though. (The code currently looks at the program's name.)
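
For anyone not familiar with the filldir mechanism, here is a stripped-down user-space illustration. The names (filldir_t, fake_readdir, fill_buffer) are my own stand-ins, not the real VFS interfaces. The point is that the per-entry callback runs while the directory lock is still held, so anything the callback locks in turn is acquired out of order relative to the ascending-number rule described above.

/*
 * Stripped-down model of the filldir callback pattern: readdir does not
 * fill a buffer itself, it calls the caller-supplied callback once per
 * entry while still holding the directory lock.
 */
#include <stdio.h>
#include <pthread.h>

typedef int (*filldir_t)(void *ctx, const char *name, unsigned long ino);

struct dirent_entry { const char *name; unsigned long ino; };

static pthread_mutex_t dir_lock = PTHREAD_MUTEX_INITIALIZER;

/* Walk the directory and hand each entry to filldir with dir_lock held. */
static int fake_readdir(const struct dirent_entry *ents, int n,
                        filldir_t filldir, void *ctx)
{
    int i, err = 0;

    pthread_mutex_lock(&dir_lock);
    for (i = 0; i < n && !err; i++)
        err = filldir(ctx, ents[i].name, ents[i].ino);
    pthread_mutex_unlock(&dir_lock);
    return err;
}

/* A plain readdir-style callback: just records the entry.  Harmless. */
static int fill_buffer(void *ctx, const char *name, unsigned long ino)
{
    (void)ctx;
    printf("entry %lu: %s\n", ino, name);
    return 0;
}

/*
 * A readdirplus-style callback would additionally stat() each entry
 * right here, i.e. take that inode's lock while the directory lock is
 * still held.  If the cluster's lock order is "ascending inode number"
 * and an entry's inode number is lower than the directory's, that is
 * exactly the ordering inversion with deadlock potential.
 */

int main(void)
{
    struct dirent_entry ents[] = { { "a", 12 }, { "b", 5 }, { "c", 42 } };
    return fake_readdir(ents, 3, fill_buffer, NULL);
}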
3) FCNTL locking

There are a huge number of issues with acquiring and failing over fcntl()-style locks when there are multiple GFS heads exporting NFS. GFS pretty much ignores them right now. A good place to start would be to change NFSD so it actually passes fcntl calls down into the filesystem.
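
For reference, this is the kind of lock being talked about: a plain POSIX fcntl() byte-range lock, shown here on a local file. The open()/fcntl(F_SETLKW) calls below are the standard API; the problem described above is that, as I understand it, with several GFS heads exporting the same filesystem over NFS nothing passes these locks down into GFS, so clients talking to different heads don't see each other's locks.

/* Minimal local example of an fcntl()-style (POSIX) byte-range lock. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/tmp/lock-demo", O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct flock fl = {
        .l_type   = F_WRLCK,   /* exclusive write lock */
        .l_whence = SEEK_SET,
        .l_start  = 0,
        .l_len    = 0,         /* 0 = lock to end of file */
    };

    /* Take the lock (blocking), pretend to work, then release it. */
    if (fcntl(fd, F_SETLKW, &fl) < 0) {
        perror("fcntl(F_SETLKW)");
        return 1;
    }
    printf("got write lock on /tmp/lock-demo\n");

    fl.l_type = F_UNLCK;
    fcntl(fd, F_SETLK, &fl);
    close(fd);
    return 0;
}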
4) NFSv4

NFSv4 requires all sorts of changes to GFS in order for them to work together. Op locks being one I can remember at the moment. I think I've repressed my memories of the others.
++++++++
Riaan van Niekerk
Systems Architect
Obsidian Systems / Obsidian Red Hat Consulting
riaan@xxxxxxxxxxxxxx
tel: +27 11 792 6500 | fax: +27 11 792 6522 | cell: +27 82 921 8768
http://www.obsidian.co.za