Re: [PATCH v4 5/8] NFSD check stateids against copy stateids

Olga Kornievskaia <olga.kornievskaia@xxxxxxxxx> · Thu, 1 Aug 2019 10:12:11 -0400

On Wed, Jul 31, 2019 at 5:51 PM J. Bruce Fields <bfields@xxxxxxxxxx> wrote:
>
> On Wed, Jul 31, 2019 at 05:10:01PM -0400, Olga Kornievskaia wrote:
> > I'm having difficulty with this patch because there is no good way to
> > know when the copy_notify stateid can be freed. What I can propose is
> > to have the linux client send a FREE_STATEID with the copy_notify
> > stateid and use that as the trigger to free the state. In that case,
> > I'll keep a reference on the parent until the FREE_STATEID is
> > received.
> >
> > This is not in the spec (though seems like a good idea to tell the
> > source server it's ok to clean up) so other implementations might not
> > choose this approach so we'll have problems with stateids sticking
> > around.
>
> https://tools.ietf.org/html/rfc7862#page-71
>
>         "If the cnr_lease_time expires while the destination server is
>         still reading the source file, the destination server is allowed
>         to finish reading the file.  If the cnr_lease_time expires
>         before the destination server uses READ or READ_PLUS to begin
>         the transfer, the source server can use NFS4ERR_PARTNER_NO_AUTH
>         to inform the destination server that the cnr_lease_time has
>         expired."
>
> The spec doesn't really define what "is allowed to finish reading the
> file" means, but I think the source server should decide somehow whether
> the target's done.  And "hasn't sent a read in cnr_lease_time" seems
> like a pretty good conservative definition that would be easy to
> enforce.

"hasn't send a read in cnr_lease_time" is already enforced.

The problem is when the copy did start in normal time, it might take
unknown time to complete. If we limit copies to all be done with in a
cnr_lease_time or even some number of that, we'll get into problems
when files are large enough or network is slow enough that it will
make this method unusable.

> Worst case, if the network goes down for a couple minutes and
> the target tries to pick up a copy where it left off, it'll get
> PARTNER_NO_AUTH.  I assume that results in the same error being returned
> the client, at which point the client knows that the copy_notify stateid
> may have installed and can do what it chooses to recover (like send a
> new copy_notify).

Yes the client recovers but the cost of setting up the source server
to destination is huge so any retries would kill the performance.

>
> The FREE_STATEID might also be a good idea, but I guess we can't count
> on it.
>
> Maybe the spec could use some errata to clarify that FREE_STATEID is
> allowed on copy_notify stateids, that clients should send it when
> they're done, and that servers are allowed to expire copy_notify
> stateid's even after their first use.

FREE_STATEID is for a stateid which a copy_notify (or copy) stateid is
so I don't see anything that really needs any extra stating. I think
what's needed is specifying that for COPY_NOTIFY a client must do a
FREE_STATEID when its done with a stateid.

>
> --b.