Re: Read/Write NFS I/O performance degraded by FLUSH_STABLE page flushing

Chuck Lever <chuck.lever@xxxxxxxxxx> · Thu, 30 Apr 2009 16:28:24 -0400

On Apr 30, 2009, at 4:12 PM, Brian R Cowan wrote:

Hello all,

This is my first post, so please be gentle.... I have been working with a
customer who is attempting to build their product in ClearCase dynamic
views on Linux. When they went from Red Hat Enterprise Linux 4 (Update 5)
to Red Hat Enterprise Linux 5 (Update 2), their build performance degraded
dramatically. When troubleshooting the issue, we noticed that links on
RHEL 5 caused an incredible number of "STABLE" 4 KB NFS writes even though
the storage we were writing to was EXPLICITLY mounted async. (This made
RHEL 5 nearly 5x slower than RHEL 4.5 in this area...)

On consultation with some internal resources, we found this change in the
2.6 kernel:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=ab0a3dbedc51037f3d2e22ef67717a987b3d15e2

In here it looks like the NFS client is forcing sync writes any time a
write of less than the NFS write size occurs. We tested this hypothesis by
setting the write size to 2 KB. The "STABLE" writes went away and link
times came back down out of the stratosphere. We built a modified kernel
based on the RHEL 5.2 kernel (that ONLY backed out this change) and we
got a 33% improvement in overall build speeds. In my case, I see almost
identical build times between the two OSes when we use this modified
kernel on RHEL 5.
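
The hypothesis and the 2 KB test above can be sketched as a toy model. This is only an illustration of the rule as described in this message, not the kernel's actual logic: the function name `sent_as_stable` is hypothetical, and 32768 is an assumed "large" write size chosen for contrast with the 2048 used in the test.

```python
# Toy model of the hypothesis in this message, NOT the kernel's logic.
# Assumption: a flush smaller than the NFS write size (wsize) is sent
# as a stable write; anything else is sent unstable.

PAGE_SIZE = 4096  # the 4 KB writes observed during the link step


def sent_as_stable(flush_bytes: int, wsize: int) -> bool:
    """Return True if the hypothesised rule would mark this flush stable."""
    return flush_bytes < wsize


if __name__ == "__main__":
    # 32768 is an illustrative large wsize; 2048 is the value from the test.
    for wsize in (32768, 2048):
        mode = "STABLE" if sent_as_stable(PAGE_SIZE, wsize) else "UNSTABLE"
        print(f"wsize={wsize}: 4 KB flush sent as {mode}")
```

Under this model a 4 KB flush is below a large write size and goes out stable, while with the write size set to 2 KB it no longer qualifies, which matches the observation that the "STABLE" writes went away.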

Now, why am I posing this to the list? I need to understand *why* that
change was made. On the face of it, simply backing out that patch would be
perfect. I'm paranoid. I want to make sure that this is the ONLY reason:
"/* For single writes, FLUSH_STABLE is more efficient */"

It seems more accurate to say that they *aren't* more efficient, but
rather are "safer, but slower."

They are more efficient from the point of view that only a single RPC is
needed for a complete write.  The WRITE and COMMIT are done in a single
request.
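
As a rough illustration of the RPC-count argument above, the sketch below lists the requests a client would issue for one small write in each mode. The function name and the string labels are hypothetical; this models only the count of round trips, not server-side commit cost.

```python
from typing import List


def rpcs_for_single_write(stable: bool) -> List[str]:
    """List the RPCs needed to get one small write safely onto the server.

    Simplified model: a stable WRITE carries the commit with it, while an
    unstable WRITE must be followed by a separate COMMIT.
    """
    if stable:
        return ["WRITE(stable)"]
    return ["WRITE(unstable)", "COMMIT"]


if __name__ == "__main__":
    for stable in (True, False):
        rpcs = rpcs_for_single_write(stable)
        print(f"stable={stable}: {len(rpcs)} RPC(s): {' + '.join(rpcs)}")
```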

I don't think the issue here is whether the write is stable, but whether
the NFS client has to block the application for it.  A stable write that
is asynchronous to the application is faster than WRITE+COMMIT.

So it's not "stable" that is holding you up, it's "synchronous."  Those
are orthogonal concepts.
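
One way to picture the distinction is as two independent flags, giving four combinations. The sketch below is only a conceptual model with hypothetical names; it does not correspond to any kernel data structure or mount option.

```python
from itertools import product


def describe(stable: bool, synchronous: bool) -> str:
    """Describe one combination of the two independent properties."""
    # "stable" is about what the server does before it replies.
    durability = (
        "server commits data before replying"
        if stable
        else "server may reply first, COMMIT needed later"
    )
    # "synchronous" is about whether the application waits for that reply.
    blocking = (
        "application waits for the reply"
        if synchronous
        else "application continues without waiting"
    )
    return f"{durability}; {blocking}"


if __name__ == "__main__":
    for stable, synchronous in product((True, False), repeat=2):
        print(f"stable={stable!s:5} synchronous={synchronous!s:5} -> "
              f"{describe(stable, synchronous)}")
```

In this picture the slowdown belongs to the rows where the application waits, regardless of which durability mode is used.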

I know that this is a 3+ year old update, but RHEL 4 is based on a 2.4
kernel,

Nope, RHEL 4 is 2.6.9.  RHEL 3 is 2.4.20-ish.

and SLES 9 is based on something in the same ballpark. And our
customers see problems when they go to SLES 10/RHEL 5 from the prior major
distro version.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com