Re: Asking information about NFS /CS4 Cookbook

Wendy Cheng <wcheng@xxxxxxxxxx> · Thu, 24 May 2007 10:00:43 -0400

Fajar A. Nugraha wrote:

Hi Wendy,

Please help me go through this summary from the bugzilla

Before we complete the work, for NFS v2/V3, RHEL 4.4 has the following
restrictions:

==> Is this still valid for RHEL 4.5 and RHEL5?

NFS failover will most likely work, except in the documented corner cases. 
Our customers normally find the restrictions workable. One thing that has 
to be made clear is that these are all inherent Linux kernel issues. RHCS 
has been doing a good job of working around a large portion of them. 
Occasionally you'll still see ESTALE or EPERM, though. The fixes didn't 
make it into RHEL 4.5 or RHEL 5.

B-1: Unless NFS client applications can tolerate ESTALE and/or EPERM errors,
    IO activities on the failover ip interface must be temporarily quiesced
    until active-active failover transition completes. This is to avoid
    non-idempotent NFS operation failure on the new server. (check out
    "Why NFS Sucks" by Olaf Kirch, placed as "kirch-reprint.pdf" in 2006
    OLS proceeding).

==> What does this mean, exactly? For example, does this mean that I
should not use RHCS-nfs-mounted storage for
busy-accessed-all-the-time-web-servers because I'd likely get
ESTALE/EPERM during failover?

NFS V2/V3 failover has been a difficult subject regardless of which 
platform you're on. Assuming a flawless failover is naive. 
NFS V4 (where the NFS client is required to play a helping role) was 
developed to remedy these issues.
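
For client applications that can tolerate ESTALE/EPERM, one common pattern is to reopen the file and retry the read. The following is only a minimal sketch of that idea, not anything RHCS provides: the helper name `read_with_estale_retry`, its parameters, and the retry counts are all hypothetical, and it is only reasonable for idempotent operations such as reads. Non-idempotent operations are exactly the ones B-1 warns about and should not be retried blindly.

```python
import errno
import time

# Errors that B-1 says a client may see during the failover transition.
RETRYABLE = (errno.ESTALE, errno.EPERM)


def read_with_estale_retry(path, retries=3, delay=1.0, opener=open):
    """Hypothetical helper: read a whole file, reopening it on ESTALE/EPERM.

    `opener` defaults to the built-in open() and is a parameter only so the
    behaviour can be exercised without a real NFS mount.
    """
    for attempt in range(retries + 1):
        try:
            # Reopening by path obtains a fresh file handle from the server.
            with opener(path, "rb") as f:
                return f.read()
        except OSError as e:
            # Give up on unrelated errors, or once the retry budget is spent.
            if e.errno not in RETRYABLE or attempt == retries:
                raise
            time.sleep(delay)
```

Whether retrying EPERM is appropriate depends on the application; outside of a failover window it usually signals a real permission problem.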

B-2: With various possible base kernel bugs outside RHCS' control, there
    are possibilities that local filesystem (such as ext3) umount could
    fail. To ensure data integrity, RHCS will abort the failover. Admin
    could specify the self-fence (reboot taken-over server) option
    to force failover (via cluster.conf file).

==> In short, it'd be better using GFS, right?

GFS certainly works better in this arena.

B-3: If nfs client invokes NLM locking call, the subject nfs servers (both
    taken-over and take-over) will enter a global 90-second (tunable)
    locking grace period for every nfs service on the servers.

==> What does "locking grace" mean? Does it mean read-write access
allowed but no locks, or no access at all?

If it is a new lock request, the lock call will hang until the grace 
period is over. This is to allow existing lock holders to reclaim their 
locks. This is part of the NFS-NLM protocol. Reads and writes can keep 
going without restrictions.

B-4: If NFS-TCP is involved, failover should not be issued on the same pair
    of machines multiple times within 30-minute period; for example,
    failing over from node A to B, then immediately failing from B back to
    A would hang the connection. This is to avoid TCP TIME_WAIT issue.

==> So what does this mean currently in TCP vs UDP world? Does it mean
nfs v3 UDP is the preferred method?

No. TCP is definitely the better protocol. Read the sentence carefully: 
"failing over from node A to B, then immediately failing from B back to 
A would hang the connection".
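
An admin script could enforce the B-4 window before triggering a manual failover. The sketch below is purely illustrative: `FailoverGuard` and its methods are hypothetical names, not part of RHCS, and it assumes the script itself records every failover it issues. It only matters for NFS over TCP.

```python
import time

MIN_INTERVAL_SECONDS = 30 * 60  # the 30-minute window from restriction B-4


class FailoverGuard:
    """Hypothetical helper: remembers the last failover per pair of nodes."""

    def __init__(self, min_interval=MIN_INTERVAL_SECONDS, clock=time.time):
        self.min_interval = min_interval
        self.clock = clock
        self._last = {}  # frozenset({node_a, node_b}) -> time of last failover

    def seconds_to_wait(self, node_a, node_b):
        """Return how long to wait before failing over between these nodes."""
        # The pair is unordered: A -> B and B -> A share the same window.
        last = self._last.get(frozenset((node_a, node_b)))
        if last is None:
            return 0.0
        remaining = self.min_interval - (self.clock() - last)
        return max(0.0, remaining)

    def record(self, node_a, node_b):
        """Note that a failover between these two nodes has just been issued."""
        self._last[frozenset((node_a, node_b))] = self.clock()
```

A caller would check `seconds_to_wait("A", "B")` and only proceed (then call `record`) when it returns 0.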

-- Wendy

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster