Re: Read/Write NFS I/O performance degraded by FLUSH_STABLE page flushing

On Apr 30, 2009, at 4:41 PM, Peter Staubach wrote:

Chuck Lever wrote:

On Apr 30, 2009, at 4:12 PM, Brian R Cowan wrote:

Hello all,

This is my first post, so please be gentle... I have been working with a
customer who is attempting to build their product in ClearCase dynamic
views on Linux. When they went from Red Hat Enterprise Linux 4 (Update 5)
to Red Hat Enterprise Linux 5 (Update 2), their build performance degraded
dramatically. When troubleshooting the issue, we noticed that links on
RHEL 5 caused an incredible number of "STABLE" 4 KB NFS writes even though
the storage we were writing to was EXPLICITLY mounted async. (This made
RHEL 5 nearly 5x slower than RHEL 4.5 in this area.)

On consultation with some internal resources, we found this change in the
2.6 kernel:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=ab0a3dbedc51037f3d2e22ef67717a987b3d15e2


In here it looks like the NFS client is forcing sync writes any time a
write of less than the NFS write size occurs. We tested this hypothesis by
setting the write size to 2 KB. The "STABLE" writes went away and link
times came back down out of the stratosphere. We built a modified kernel
based on the RHEL 5.2 kernel (that ONLY backed out this change) and we
got a 33% improvement in overall build speeds. In my case, I see almost
identical build times between the two OSes when we use this modified
kernel on RHEL 5.
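
[Illustration, not part of the original message.] The hypothesis above
can be put as a toy model. This is only a sketch of the behavior as
described in this message, not the kernel's code: the function names and
the 32 KB write size are made up for the example, and the rule "one WRITE
means FILE_SYNC" is an assumption taken from the description.

```python
# Toy model of the hypothesis: if the dirty data fits in a single
# WRITE RPC, the client asks for a stable (FILE_SYNC) write.
# All names here are hypothetical; this is not kernel code.

def writes_needed(dirty_bytes: int, wsize: int) -> int:
    """Number of WRITE RPCs needed to send dirty_bytes with a given wsize."""
    return -(-dirty_bytes // wsize)  # ceiling division


def pick_stable_how(dirty_bytes: int, wsize: int) -> str:
    """Assumed heuristic: a single WRITE is sent FILE_SYNC, else UNSTABLE."""
    if writes_needed(dirty_bytes, wsize) == 1:
        return "FILE_SYNC"
    return "UNSTABLE"


# One dirty 4 KB page with an assumed 32 KB wsize fits in one WRITE.
print(pick_stable_how(4096, 32768))
# The same page with wsize=2048 needs two WRITEs.
print(pick_stable_how(4096, 2048))
```

Under this model a 4 KB page stops qualifying as a "single write" once
the write size drops to 2 KB, which would match the test result reported
above.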

Now, why am I posing this to the list? I need to understand *why* that
change was made. On the face of it, simply backing out that patch would be
perfect. I'm paranoid. I want to make sure that this is the ONLY reason:
"/* For single writes, FLUSH_STABLE is more efficient */ "

It seems more accurate to say that they *aren't* more efficient, but
rather are "safer, but slower."

They are more efficient from the point of view that only a single RPC
is needed for a complete write.  The WRITE and COMMIT are done in a
single request.
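
[Illustration, not part of the original message.] A minimal sketch of the
RPC counting behind that statement, assuming NFSv3 semantics where
UNSTABLE WRITEs must be followed by a COMMIT and FILE_SYNC WRITEs need
none. The function name is hypothetical, and the model counts round trips
only; it says nothing about how long the server takes to service each
request.

```python
# Count the RPCs needed to get data onto stable storage.
# Hypothetical helper for illustration; it models request counts only.

def rpcs_for_flush(num_writes: int, stable: bool) -> int:
    """RPCs needed to make num_writes WRITE payloads durable on the server."""
    if num_writes <= 0:
        return 0
    if stable:
        # Each WRITE is sent FILE_SYNC, so no separate COMMIT is required.
        return num_writes
    # UNSTABLE WRITEs are followed by one COMMIT covering all of them.
    return num_writes + 1


print(rpcs_for_flush(1, stable=True))    # single stable WRITE
print(rpcs_for_flush(1, stable=False))   # WRITE + COMMIT
print(rpcs_for_flush(8, stable=False))   # 8 WRITEs + 1 COMMIT
```

For a single write the stable path saves one of two RPCs; for a batch of
writes the one COMMIT is spread over all of them, so the saving shrinks.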

I don't think the issue here is whether the write is stable, but it is
whether the NFS client has to block the application for it.  A stable
write that is asynchronous to the application is faster than
WRITE+COMMIT.

So it's not "stable" that is holding you up, it's "synchronous."
Those are orthogonal concepts.


Actually, the "stable" part can be a killer.  It depends upon
why and when nfs_flush_inode() is invoked.

I did quite a bit of work on this aspect of RHEL-5 and discovered
that this particular code was leading to some serious slowdowns.
The server would end up doing a very slow FILE_SYNC write when
all that was really required was an UNSTABLE write at the time.

If the client is asking for FILE_SYNC when it doesn't need the COMMIT,
then yes, that would hurt performance.

Did anyone actually measure this optimization and if so, what
were the numbers?

   Thanx...

      ps

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com