Re: [RFC][Resend] Make NFS-Client readahead tunable

Peter Staubach <staubach@xxxxxxxxxx> · Thu, 18 Sep 2008 15:03:01 -0400

Chuck Lever wrote:
On Thu, Sep 18, 2008 at 6:53 AM, Martin Knoblauch <knobi@xxxxxxxxxxxx> wrote:

----- Original Message ----

From: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
To: Martin Knoblauch <knobi@xxxxxxxxxxxx>
Cc: Greg Banks <gnb@xxxxxxxxxxxxxxxxx>; linux-nfs list <linux-nfs@xxxxxxxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx; Peter zijlstra <a.p.zijlstra@xxxxxxxxx>
Sent: Thursday, September 18, 2008 10:47:33 AM
Subject: Re: [RFC][Resend] Make NFS-Client readahead tunable

On Thu, 18 Sep 2008 01:38:57 -0700 (PDT) Martin Knoblauch
wrote:

No.  mount(8) will pass unrecognised options straight down into the
filesystem driver.

Has that always been the case, or is it a recent change? I have to support
RHEL4 userland, which is not really new.

It's been that way for ever and ever.  It's how all these guys:

y:/usr/src/25> grep Opt_ fs/*/super.c|wc
    781    2626   33703

get handled.

While that does not seem too complicated, I seem to have a problem
passing the mount options to the kernel. They come down as mount data
version "6". Apparently mount(8) or mount.nfs(8) is doing the parsing
and sending down the legacy data block. So, what is the minimum version
of mount or mount.nfs that passes the options down unaltered?

The mount command has passed a string of options to the kernel for
particular file systems for a while, but the facility for the NFS
client to parse a string of mount options in the kernel was added only
recently -- at least 2.6.23 or 2.6.24 is required to support this.
Before this, the mount command parsed these options.

For RHEL 4, based on 2.6.9, you are stuck.  It uses a binary structure
whose fields must match between the kernel and user space.  For RH
enterprise kernels, the ABI cannot change in a given release, so RH
wouldn't take a patch to change the data structure that mount uses.
You would have to maintain such a change yourself, and build your own
kernels and mount command after each RHEL 4 update is released.
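
To make the contrast concrete, here is a minimal sketch of the two
hand-off styles described above. Everything in it is an illustrative
assumption: the fixed layouts are hypothetical and are NOT the real NFS
mount data structure, the "readahead" option name is made up, and the
size check is a simplification of how a reader might reject a block it
was not built for.

```python
import struct

# Hypothetical fixed layouts (NOT the real NFS mount data structure):
# three 32-bit integers, and the same three plus one new field.
OLD_LAYOUT = struct.Struct("<iii")    # version, rsize, wsize
NEW_LAYOUT = struct.Struct("<iiii")   # ... plus a readahead value


def parse_option_string(data):
    """Split a comma-separated option string into a dict.

    Bare flags such as "rw" map to True; "key=value" pairs keep the
    value as a string. A reader can simply reject keys it does not
    know, so new options need no change to the hand-off format.
    """
    opts = {}
    for token in data.split(","):
        if not token:
            continue
        key, sep, value = token.partition("=")
        opts[key] = value if sep else True
    return opts


def old_reader(blob):
    """Decode a binary block the way a reader built for OLD_LAYOUT would."""
    if len(blob) != OLD_LAYOUT.size:
        raise ValueError("mount data does not match the expected layout")
    return OLD_LAYOUT.unpack(blob)
```

With the string form, a new option is just another token. With the
binary form, adding a field changes the block size and field offsets, so
both sides have to be rebuilt together, which is the ABI problem
described above.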

I agree that a mount option would allow more fine-grained control over
readahead.  A system wide parameter controlling readahead has always
been a weakness.  Readahead, as implemented in the VFS, has a
*per-file descriptor* context, however, which operates automatically
(and can be tuned at run-time by an application with [mf]advise(2)).
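
As an illustration of the per-descriptor tuning mentioned above, the
sketch below hints sequential access on one open file before reading
it. It uses Python's os.posix_fadvise wrapper around posix_fadvise(2);
the helper name and chunk size are assumptions made for the example,
and the hint is advisory only, so the kernel is free to ignore it.

```python
import os


def read_sequentially(path, chunk_size=64 * 1024):
    """Read a file front to back and return the number of bytes read.

    Before reading, advise the kernel that access on this descriptor
    will be sequential. The advice applies to this open file only, not
    to the whole mount or system.
    """
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Not every platform exposes posix_fadvise; skip the hint if absent.
        if hasattr(os, "posix_fadvise"):
            # Offset 0 with length 0 covers the file through its end.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            total += len(chunk)
    finally:
        os.close(fd)
    return total
```

An application with a random access pattern could pass
os.POSIX_FADV_RANDOM instead, without affecting any other descriptor.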

As a future feature, this might work in better combination with the
per-mount bdi changes proposed by Peter to provide maximal flexibility
without exposing yet another confusing knob that could help some
workloads but hurt others.

And perhaps add some dynamic tuning capabilities to the NFS client
code to just make it do "the right thing".  This would be better
than any tunables and would help to serve in other situations, such
as high bandwidth/latency networks, overloaded servers that don't
need more read-ahead READ requests piled on, etc...

      ps