Re: [PATCH 2/2] NFSv4: Allow per-mount tuning of READDIR attrs

Benjamin Coddington <bcodding@xxxxxxxxxx> · Wed, 18 Oct 2023 10:24:18 -0400

On 18 Oct 2023, at 9:33, Jeff Layton wrote:

> On Wed, 2023-10-18 at 08:56 -0400, Chuck Lever wrote:
>> On Tue, Oct 17, 2023 at 05:30:44PM -0400, Benjamin Coddington wrote:
>>> Expose a per-mount knob in sysfs to set the READDIR requested attributes
>>> for a non-plus READDIR request.
>>>
>>> For example:
>>>
>>>   echo 0x800 0x800000 0x0 > /sys/fs/nfs/0\:57/v4_readdir_attrs
>>>
>>> .. will revert the client to only request rdattr_error and
>>> mounted_on_fileid for any non "plus" READDIR, as before the patch
>>> preceding this one in this series.  This provides existing installations
>>> an option to fix a potential performance regression that may occur after
>>> NFS clients update to request additional default READDIR attributes.
>>>
>>> Signed-off-by: Benjamin Coddington <bcodding@xxxxxxxxxx>
>>> ---
>>>  fs/nfs/client.c           |  2 +
>>>  fs/nfs/nfs4client.c       |  4 ++
>>>  fs/nfs/nfs4proc.c         |  1 +
>>>  fs/nfs/nfs4xdr.c          |  7 ++--
>>>  fs/nfs/sysfs.c            | 81 +++++++++++++++++++++++++++++++++++++++
>>>  include/linux/nfs_fs_sb.h |  1 +
>>>  include/linux/nfs_xdr.h   |  1 +
>>>  7 files changed, 93 insertions(+), 4 deletions(-)
>>
>> Admittedly, it would be much easier for humans to use if the API was
>> based on the symbolic names of the bits rather than a triplet of raw
>> hexadecimal values.

This isn't aiming to be an ease-of-use interface.  This is tinkering with
the innards of the client.  If you're doing this, you'd better know how to
convert between bases, because you're going to need that and more.

If we want to make it nice, patches to nfsctl can follow.
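
For reference, the triplet in the example above follows from the NFSv4
attribute numbering: each attribute number N maps to bit (N % 32) of bitmap
word (N / 32).  Below is a minimal sketch of that conversion in both
directions.  It assumes rdattr_error is attribute 11 and mounted_on_fileid is
attribute 55, as in the NFSv4 specification.  The helper names are
hypothetical and are not part of the patch or of nfs-utils.

```python
# Attribute numbers as defined by the NFSv4 specification.
FATTR4_RDATTR_ERROR = 11
FATTR4_MOUNTED_ON_FILEID = 55


def attrs_to_words(attrs, nwords=3):
    """Pack attribute numbers into 32-bit bitmap words (hypothetical helper)."""
    words = [0] * nwords
    for attr in attrs:
        # Attribute N lives in word N // 32, at bit N % 32.
        words[attr // 32] |= 1 << (attr % 32)
    return words


def words_to_attrs(words):
    """Unpack bitmap words back into a sorted list of attribute numbers."""
    return [
        index * 32 + bit
        for index, word in enumerate(words)
        for bit in range(32)
        if word & (1 << bit)
    ]


def format_triplet(words):
    """Render the words as space-separated hex, as in the echo example."""
    return " ".join(f"0x{word:x}" for word in words)


words = attrs_to_words([FATTR4_RDATTR_ERROR, FATTR4_MOUNTED_ON_FILEID])
print(format_triplet(words))
print(words_to_attrs(words))
```

Decoding in the other direction is what anyone debugging a report would need
in order to check which attributes a given mount has been told to request.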

> I think there are some significant footguns with this interface. It'd be
> very easy to set this wrong and get weird behavior.  OTOH, we could push
> that complexity into userland and provide some sort of script in nfs-
> utils for tuning this.
>
> That said...
>
> When we look at interfaces like this, we have to consider that they may
> be around for a long, long time (decades, even), and people will come to
> rely on them to do strange things that are difficult for us to support.
> If we have someone saying that their READDIR performance slowed down, we
> now have to grab those settings from this sysfs file and validate them
> when trying to help them.
>
> Personally, I'd prefer a simple binary "make it work the old way"
> switch, if we're concerned about performance regressions here. I think
> that's the sort of thing that is simple to explain to admins that are
> suffering from this problem and (more importantly) the sort of setting
> we can later remove when it's no longer needed.
>
> Adding this sort of fine-grained knob will create more problems than it
> solves, as people will (inevitably) use it incorrectly.

I disagree that it will create more problems than it solves.

Also, sysfs isn't there for you to experiment with in production, and
sysadmins know this.  Sysfs is "_The_ filesystem for exporting kernel
objects".  There are plenty of ways to hose a system and corrupt data by
playing around with sysfs.

If we take the position that everything in NFS' sysfs must have a higher
standard of safety than even module parameters (see recover_lost_locks),
that means we'd better look at making every sysfs interface non-shoot-footy,
which is just insane.  Just take a look at a sampling of writeable files.
Here are a couple:

/sys/block/sda/device/delete
/sys/kernel/sunrpc/xprt-switches/switch-1/xprt-0-local/dstaddr

Ben