Re: RFC: new attributes

On Sat, Aug 05, 2023 at 03:05:59PM -0700, Rick Macklem wrote:
> On Sat, Aug 5, 2023 at 7:51 AM Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
> >
> > Hi Rick -
> >
> > > On Aug 4, 2023, at 6:18 PM, Rick Macklem <rick.macklem@xxxxxxxxx> wrote:
> > >
> > > Hi,
> > >
> > > I wrote an IETF draft proposing a few new attributes for NFSv4.2.
> > > Since there did not seem to be interest in them, I just
> > > let the draft expire.  However, David Noveck pinged
> > > me w.r.t. it, so I thought I'd ask here about it.
> > >
> > > All the attributes are meant to be "read only, per server file system":
> > > supported_ops - A bitmap of the operations supported.
> > >     The motivation was that NFS4ERR_NOTSUPP is supposed to
> > >      be "per server", although the rumour was that the Linux knfsd
> > >      uses it "per server file system".
> >
> > Before crafting new protocol, we should have a look at server
> > implementation behavior to see if it can be improved in this
> > area.
> >
> > Is Linux the only problematic implementation? Send email,
> > bug reports, or patches... we'll consider them.
> >
> This was discussed on the IETF working group mailing list some years
> ago.  I was asking if NFS4ERR_NOTSUPP could be used "per server
> file system" or "per server". Tom Haynes said his intent was "per server",
> but that was not clear in the RFC.  The only place in any RFC where it
> seemed to indicate "per server" was the definition for NFS4ERR_NOTSUPP
> as follows:
> 15.1.1.5.  NFS4ERR_NOTSUPP (Error Code 10004)
> 
>    Operation not supported, either because the operation is an OPTIONAL
>    one and is not supported by this server or because the operation MUST
>    NOT be implemented in the current minor version.
> 
> Bruce Fields noted that he thought the Linux knfsd was doing NFS4ERR_NOTSUPP
> w.r.t. optional 4.2 operations on a "per file system basis" and there was some
> mumbling to the effect that it should be applicable "per server file system".
> 
> In FreeBSD, certain 4.2 operations (such as Allocate) can only be done
> on certain file systems.
> Without a way to indicate to a client that this operation is supported on
> file system X but not file system Y, the server is forced to not support the
> operation. (It is currently controlled by a tunable that a sysadmin could
> set incorrectly and result in NFS4ERR_NOTSUPP for some of the file
> systems. As such, you could say that the FreeBSD server can do this.)
> 
> I do not have a Linux server with various types of file systems to confirm
> if what Bruce Fields thought was the case is actually the case.

I wondered if the subtle difference between "per-server" and "per-
server filesystem" would have implications for extensibility (RFC
8178) but I don't see anything specific there. It distinguishes
between

  NFS4ERR_ILLEGAL - operation is not valid for this minor version

and

  NFS4ERR_NOTSUPP - operation is valid for this minor version but is
  not supported by this implementation

Because the specs are not clear about this, a client could remember
the support status of an operation and not use that operation at all
when a server shares a mix of filesystems that support a feature and
filesystems that do not.

But it looks like we have a de facto deviation from what was intended
but not written in the protocol specs -- NFS4ERR_NOTSUPP is used by
some implementations on a per-filesystem basis (not that I've actually
audited the Linux server implementation yet).

I would find documenting this de facto interpretation more palatable
than adding a new bitmask attribute.


> > > dir_cookie_rising - Only useful for directory delegations, which no
> > >      one seems to be implementing.
> >
> > We've been talking privately and informally about implementing
> > directory delegation in the Linux NFS server, so this one
> > could be interesting. But there aren't enough details here to
> > know whether this new attribute would be useful to us.
> >
> I wrote a bunch of code implementing directory delegations on FreeBSD,
> but never completed the work to the point of testing.
> I found that, for FreeBSD, it was infeasible to implement the client side
> for server file systems where the directory offset cookie was not monotonically
> increasing. (Basically, maintaining the ordering of "directory chunks" became too
> difficult without being able to do so based on directory offset cookie ordering.)
> 
> So, my implementation, if even completed, would only work for the case of
> monotonically increasing directory offset cookies and detecting that that is
> not the case "on the fly" would have been messy.

I'm still not clear on what "monotonically increasing directory
offsets" means.

I'm familiar with only a couple of Linux filesystems, and they seem
to use distinct offset values that increase monotonically as new
entries are created in the directory. The offset values do not
confer any particular information about entry locality.


> > > seek_granularity - The smallest size of unallocated region reported
> > >      by the Seek operation.  FreeBSD has a pathconf(2) variable called
> > >      _PC_MIN_HOLE_SIZE that an application can use to decide if
> > >      lseek(SEEK_DATA/SEEK_HOLE) is useful.

I checked. On Linux, fpathconf(3) does not list a MIN_HOLE_SIZE
variable, fwiw.


> > I'm not aware of a scenario where the Linux server would provide
> > a value not equal to 1, so it would be easy for us to implement.
> A value of 1 is of limited use. If an application is going to use the
> information (btw, I think this pathconf name was in Solaris?), the
> size (such as 32K or 128K) can be more useful.
> --> No point in doing Seek if the data is not sufficiently sparse.

To provide an implementation, we just need a clear use case for it
and a clear explanation of the value's semantics in an open
specification.


> > What would clients do with this information, aside from filling
> > in a pathconf field? Might this value be of benefit for READ_PLUS?
> As proposed, it does not give indications of "sparseness" for individual
> files. It would, however, indicate if READ_PLUS can be useful.
> --> If the server returns 0, there is no point in performing READ_PLUS.
> (This was not explicitly stated in the draft, but should be.)

That's where I'm thinking the benefit might be for clients that
implement READ_PLUS but do not care about MIN_HOLE_SIZE.


> > > max_xattr_len - Allows the client to avoid attempting to Setxattr an
> > >     attribute that is larger than the server file system supports.
> >
> > Can you elaborate on the problem you are trying to solve? Why
> > isn't the current situation adequate?
> For FreeBSD, an application can attempt to set a very large extended
> attribute (I've actually done a 1Gbyte one on ZFS during testing).
> As such, for NFSv4.2 it can attempt one up to the maximum allowable
> size for a compound (a little over 1Mbyte).
> 
> Without this attribute, most servers will fail with NFS4ERR_XATTR2BIG,
> resulting in a fair amount of "on the wire" traffic for some application
> that insists on doing large ones a lot.
> 
> This attribute would allow the client to avoid putting Setxattr operations
> with too large an attribute "on the wire", but it is a minor optimization.

The Linux server restricts the maximum size of RPC messages during
CREATE_SESSION, so it wouldn't be possible to write more than about
a megabyte at maximum to an xattr residing on Linux. But I believe
many of our local filesystem implementations do not support xattrs
much larger than 64KB. This limit very likely varies depending on
the filesystem.

IMO large xattrs were not in the minds of the authors of RFC 8276.
If applications want to store large amounts of data in secondary
byte streams, they should use named attributes instead: a named
attribute, I believe, is written with WRITE, which allows the
attribute to be extended after it has been created.

I remember also there was a question of write atomicity when we
discussed this before. Atomicity and large writes generally do not
mix for the most common Linux filesystem implementations.


