Re: [PATCH 01/19] fs: new API for handling inode->i_version

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Dec 13 2017, Jeff Layton wrote:

> On Thu, 2017-12-14 at 09:04 +1100, NeilBrown wrote:
>> On Wed, Dec 13 2017, Jeff Layton wrote:
>> 
>> > +/*
>> > + * The change attribute (i_version) is mandated by NFSv4 and is mostly for
>> > + * knfsd, but is also used for other purposes (e.g. IMA). The i_version must
>> > + * appear different to observers if there was a change to the inode's data or
>> > + * metadata since it was last queried.
>> > + *
>> > + * It should be considered an opaque value by observers. If it remains the same
>> > + * since it was last checked, then nothing has changed in the inode. If it's
>> > + * different then something has changed. Observers cannot infer anything about
>> > + * the nature or magnitude of the changes from the value, only that the inode
>> > + * has changed in some fashion.
>> 
>> I agree that it "should be" considered opaque, but I have a suspicion
>> that NFSv4 doesn't consider it opaque.
>> There is something about write delegations and the server performing a
>> GETATTR callback to the delegated client so that it can answer GETATTR
>> from other clients without recalling the delegation.
>> 
>> Specifically section "10.4.3 Handling of CB_GETATTR" of RFC5661 contains
>> the text:
>> 
>>    o  The client will create a value greater than c that will be used
>>       for communicating that modified data is held at the client.  Let
>>       this value be represented by d.
>> 
>> "c" here is a 'change' attribute.
>> 
>> Then:
>> 
>>    While the change attribute is opaque to the client in the sense that
>>    it has no idea what units of time, if any, the server is counting
>>    change with, it is not opaque in that the client has to treat it as
>>    an unsigned integer, and the server has to be able to see the results
>>    of the client's changes to that integer.  Therefore, the server MUST
>>    encode the change attribute in network order when sending it to the
>>    client.  The client MUST decode it from network order to its native
>>    order when receiving it, and the client MUST encode it in network
>>    order when sending it to the server.  For this reason, change is
>>    defined as an unsigned integer rather than an opaque array of bytes.
>> 
>> This all suggests that nfsd needs to be certain that "incrementing" the
>> change id will produce a new changeid, which has not been used before,
>> and also suggests that nfsd needs to be able to control the changeid
>> stored after writes that result from a delegation being returned.
>> 
>> I'd just like to say that this is one of the most annoying dumb features
>> of NFSv4, because it is trivial to fix and I suggested a fix before
>> NFSv4.0 was finalized.  Grumble.
>> 
>> Otherwise the patch set looks good.  I haven't gone over the code
>> closely, the but approach is spot-on.
>
> I don't think we have to do that. There are really only two states with
> a client holding a write delegation, as far as the server is concerned.
> Either:
>
> a) the client has done no writes to the file, in which case it'll return
> the same i_version that the server has when issued a CB_GETATTR
>
> ...or...
>
> b) it has written to the file while holding the delegation, in which
> case it'll return a different CB_GETATTR to the server
>
> The simplest thing for the server to do is to just increment the change
> attribute _once_ when it gets back a CB_GETATTR with a different change
> attr than it has.
>
> That's sufficient to tell another client issuing a a GETATTR that the
> file has changed without needing to recall the delegation.
>
> Prior to the delegation being returned, the client will send at least
> one WRITE RPC, and that's enough to ensure that the the next stat will
> see the thing increase.

"increment" and "increase" are not words that mean anything for an
"opaque value".
NFSd is, presumably, an "observer" of i_version (as it isn't the
filesytem that controls it), so your text says it must treat i_version as
opaque.  That means it cannot detect an "increase" (only a change), and
it certainly cannot "increment" the value.

I think you need to allow observers to treat i_version as a 64 bit number
which will monotonically increase.  Any change to the file will result
in an increment of at least '1'.

Thanks,
NeilBrown

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux