Re: Read/Write NFS I/O performance degraded by FLUSH_STABLE page flushing

Peter Staubach <staubach@xxxxxxxxxx> · Thu, 30 Apr 2009 16:41:12 -0400

Chuck Lever wrote:
>
> On Apr 30, 2009, at 4:12 PM, Brian R Cowan wrote:
>
>> Hello all,
>>
>> This is my first post, so please be gentle... I have been working with a
>> customer who is attempting to build their product in ClearCase dynamic
>> views on Linux. When they went from Red Hat Enterprise Linux 4 (Update 5)
>> to Red Hat Enterprise Linux 5 (Update 2), their build performance
>> degraded dramatically. When troubleshooting the issue, we noticed that
>> link steps on RHEL 5 caused an incredible number of "STABLE" 4KB NFS
>> writes even though the storage we were writing to was EXPLICITLY mounted
>> async. (This made RHEL 5 nearly 5x slower than RHEL 4.5 in this area...)
>>
>> On consultation with some internal resources, we found this change in the
>> 2.6 kernel:
>>
>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=ab0a3dbedc51037f3d2e22ef67717a987b3d15e2
>>
>> From this, it looks like the NFS client is forcing sync writes any time a
>> write of less than the NFS write size occurs. We tested this hypothesis
>> by setting the write size to 2KB. The "STABLE" writes went away and link
>> times came back down out of the stratosphere. We built a modified kernel
>> based on the RHEL 5.2 kernel (that ONLY backed out this change) and we
>> got a 33% improvement in overall build speeds. In my case, I see almost
>> identical build times between the two OSes when we use this modified
>> kernel on RHEL 5.
>>
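Here is a minimal C sketch of the policy described above. It assumes the client compares the size of the pending flush, in bytes, against the mount's wsize. The enum and function names are hypothetical, and this is not the code from the referenced commit. Under that assumption, the sketch shows why the wsize=2048 experiment made the STABLE writes disappear: a 4KB page no longer fits in one WRITE, so the client falls back to UNSTABLE WRITEs followed by a COMMIT.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; not the kernel's actual flags or helpers. */
enum flush_mode {
    FLUSH_UNSTABLE_THEN_COMMIT, /* UNSTABLE WRITE(s), then one COMMIT */
    FLUSH_ONE_STABLE_WRITE      /* a single FILE_SYNC WRITE, no COMMIT */
};

/* Assumed policy: if the dirty range fits in one WRITE of at most
 * 'wsize' bytes, ask the server to make that WRITE stable. */
static enum flush_mode choose_flush(size_t nbytes, size_t wsize)
{
    assert(wsize > 0);
    return nbytes <= wsize ? FLUSH_ONE_STABLE_WRITE
                           : FLUSH_UNSTABLE_THEN_COMMIT;
}

/* RPCs needed to get 'nbytes' onto the server's stable storage. */
static unsigned rpcs_needed(size_t nbytes, size_t wsize)
{
    enum flush_mode mode = choose_flush(nbytes, wsize); /* checks wsize */
    size_t writes = (nbytes + wsize - 1) / wsize;       /* ceil(nbytes/wsize) */
    /* The unstable path needs one extra COMMIT after its WRITEs. */
    return (unsigned)writes + (mode == FLUSH_UNSTABLE_THEN_COMMIT ? 1u : 0u);
}
```

For example, under this assumed policy, flushing one 4KB page with wsize=32768 costs a single stable WRITE. With wsize=2048, the same page costs two UNSTABLE WRITEs plus a COMMIT. Counting RPCs alone therefore favours the stable path. The sketch says nothing about how long the server takes to service each request.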
>> Now, why am I posing this to the list? I need to understand *why* that
>> change was made. On the face of it, simply backing out that patch would
>> be perfect. I'm paranoid. I want to make sure that this is the ONLY
>> reason:
>> "/* For single writes, FLUSH_STABLE is more efficient */"
>>
>> It seems more accurate to say that they *aren't* more efficient, but
>> rather are "safer, but slower."
>
> They are more efficient from the point of view that only a single RPC
> is needed for a complete write.  The WRITE and COMMIT are done in a
> single request.
>
> I don't think the issue here is whether the write is stable, but it is
> whether the NFS client has to block the application for it.  A stable
> write that is asynchronous to the application is faster than
> WRITE+COMMIT.
>
> So it's not "stable" that is holding you up, it's "synchronous." 
> Those are orthogonal concepts.
>
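The distinction between "stable" and "synchronous" can be modelled as two independent settings. The following is a hypothetical C model; the type and function names are invented for illustration. It assumes a single chunk no larger than wsize and an application that wants the data durable. It counts the RPCs needed to make that chunk durable and how many of them the application itself has to wait on.

```c
/* Axis 1: what the client asks the server to do for each WRITE. */
enum stability { WRITE_UNSTABLE, WRITE_STABLE };

/* Axis 2: whether the application blocks until the RPCs complete. */
enum synchrony { CALLER_ASYNC, CALLER_SYNC };

/* Hypothetical: one flush decision is a point on both axes. */
struct flush_plan {
    enum stability stable;
    enum synchrony sync;
};

/* RPCs to make one chunk of at most wsize bytes durable on the server. */
static unsigned rpcs_to_durable(enum stability s)
{
    return s == WRITE_STABLE ? 1u   /* one WRITE with FILE_SYNC */
                             : 2u;  /* UNSTABLE WRITE + COMMIT */
}

/* RPCs the application itself waits for; an async flush costs it none,
 * whichever stability level the WRITEs use. */
static unsigned rpcs_caller_waits(struct flush_plan p)
{
    return p.sync == CALLER_SYNC ? rpcs_to_durable(p.stable) : 0u;
}
```

In this model, a stable write issued asynchronously blocks the application for nothing. A synchronous UNSTABLE WRITE plus COMMIT blocks it for two round trips. The model deliberately ignores how much work the server does to satisfy each request, which is the cost Peter raises next.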

Actually, the "stable" part can be a killer.  It depends upon
why and when nfs_flush_inode() is invoked.

I did quite a bit of work on this aspect of RHEL-5 and discovered
that this particular code was leading to some serious slowdowns.
The server would end up doing a very slow FILE_SYNC write when
all that was really required was an UNSTABLE write at the time.
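For context, NFSv3 WRITE carries a stable_how argument (UNSTABLE, DATA_SYNC, FILE_SYNC, as defined in RFC 1813). It tells the server whether it may reply before the data reaches stable storage. The toy C model below uses invented names and contains no timing data. It only counts forced flushes on the server for ten small FILE_SYNC writes versus ten UNSTABLE writes followed by one COMMIT.

```c
#include <stddef.h>

/* stable_how values as defined for NFSv3 WRITE in RFC 1813. */
enum stable_how { UNSTABLE = 0, DATA_SYNC = 1, FILE_SYNC = 2 };

/* Hypothetical toy server: tracks cached data and forced flushes. */
struct toy_server {
    size_t cached_bytes;   /* acknowledged but not yet on stable storage */
    unsigned disk_syncs;   /* expensive flushes to stable storage */
};

static void toy_sync(struct toy_server *s)
{
    s->cached_bytes = 0;
    s->disk_syncs++;
}

/* UNSTABLE may be acknowledged from cache. DATA_SYNC and FILE_SYNC must
 * reach stable storage before the reply; this toy treats them alike,
 * although FILE_SYNC also covers all file metadata. */
static void toy_write(struct toy_server *s, size_t len, enum stable_how how)
{
    s->cached_bytes += len;
    if (how != UNSTABLE)
        toy_sync(s);
}

/* COMMIT: flush whatever earlier UNSTABLE writes left in cache. */
static void toy_commit(struct toy_server *s)
{
    if (s->cached_bytes > 0)
        toy_sync(s);
}
```

Real servers batch and reorder I/O, so the 10:1 ratio here is not a performance prediction. It only illustrates why a per-write FILE_SYNC can cost the server more than UNSTABLE writes whose data is committed later in one go.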

Did anyone actually measure this optimization and if so, what
were the numbers?

    Thanx...

       ps
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html