Re: [RFC 4/5] NFSD: Defer copying

"Myklebust, Trond" <Trond.Myklebust@xxxxxxxxxx> · Mon, 5 Aug 2013 14:50:38 +0000

On Mon, 2013-08-05 at 10:41 -0400, J. Bruce Fields wrote:
> On Mon, Aug 05, 2013 at 09:38:20AM +0100, Ric Wheeler wrote:
> > On 07/22/2013 08:55 PM, J. Bruce Fields wrote:
> > >On Mon, Jul 22, 2013 at 03:54:00PM -0400, Bryan Schumaker wrote:
> > >>On 07/22/2013 03:43 PM, J. Bruce Fields wrote:
> > >>>On Mon, Jul 22, 2013 at 03:37:00PM -0400, Bryan Schumaker wrote:
> > >>>>On 07/22/2013 03:30 PM, J. Bruce Fields wrote:
> > >>>>>On Mon, Jul 22, 2013 at 03:17:29PM -0400, Bryan Schumaker wrote:
> > >>>>>>On 07/22/2013 02:50 PM, J. Bruce Fields wrote:
> > >>>>>>>On Fri, Jul 19, 2013 at 05:03:49PM -0400, bjschuma@xxxxxxxxxx wrote:
> > >>>>>>>>From: Bryan Schumaker <bjschuma@xxxxxxxxxx>
> > >>>>>>>>
> > >>>>>>>>Rather than performing the copy right away, schedule it to run later and
> > >>>>>>>>reply to the client.  Later, send a callback to notify the client that
> > >>>>>>>>the copy has finished.
> > >>>>>>>I believe you need to implement the referring triple support described
> > >>>>>>>in http://tools.ietf.org/html/rfc5661#section-2.10.6.3 to fix the race
> > >>>>>>>described in
> > >>>>>>>http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-19#section-15.1.3
> > >>>>>>>.
> > >>>>>>I'll re-read and re-write.
> > >>>>>>
> > >>>>>>>I see cb_delay initialized below, but not otherwise used.  Am I missing
> > >>>>>>>anything?
> > >>>>>>Whoops!  I was using that earlier to try to fake up a callback, but I eventually decided it's easier to just do the copy asynchronously.  I must have forgotten to take it out :(
> > >>>>>>
> > >>>>>>>What about OFFLOAD_STATUS and OFFLOAD_ABORT?
> > >>>>>>I haven't thought out those too much... I haven't thought about a use for them on the client yet.
> > >>>>>If it might be a long-running copy, I assume the client needs the
> > >>>>>ability to abort if the caller is killed.
> > >>>>>
> > >>>>>(Dumb question: what happens on the network partition?  Does the server
> > >>>>>abort the copy when it expires the client state?)
> > >>>>>
> > >>>>>In any case,
> > >>>>>http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-19#section-15.1.3
> > >>>>>says "If a server's COPY operation returns a stateid, then the server
> > >>>>>MUST also support these operations: CB_OFFLOAD, OFFLOAD_ABORT, and
> > >>>>>OFFLOAD_STATUS."
> > >>>>>
> > >>>>>So even if we've no use for them on the client then we still need to
> > >>>>>implement them (and probably just write a basic pynfs test).  Either
> > >>>>>that or update the spec.
> > >>>>Fair enough.  I'll think it out and do something!  Easy solution: save this patch for later and only support the sync version of copy for the final version of this patch series.
> > >>>I can't remember--does the spec give the server a clear way to bail out
> > >>>and tell the client to fall back on a normal copy in cases where the
> > >>>server knows the copy could take an unreasonable amount of time?
> > >>>
> > >>>--b.
> > >>I don't think so.  Is there ever a case where copying over the network would be slower than copying on the server?
> > >Maybe not, but if the copy will take a minute, then we don't want to tie
> > >up an rpc slot for a minute.
> > >
> > >--b.
> > 
> > I think that we need to be able to handle copies that would take a
> > lot longer than just a minute - this offload could take a very long
> > time, I assume, depending on the size of the data being copied and
> > the back end storage device....
> 
> Bryan suggested in offline discussion that one possibility might be to
> copy, say, at most a gigabyte at a time before returning and making the
> client continue the copy.
> 
> Where for "a gigabyte" read, "some amount that doesn't take too long to
> copy but is still enough to allow close to full bandwidth".  Hopefully
> that's an easy number to find.
> 
> But based on
> http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-19#section-14.1.2
> the COPY operation isn't designed for that--it doesn't give the option
> of returning bytes_copied in the successful case.

The reason is that the spec writers did not want to force the server to
copy the data in sequential order (or any other particular order for
that matter).

If the copy was short, then the client can't know which bytes were
copied; they could be at the beginning of the file, in the middle, or
even the very end. Basically, it needs to redo the entire copy in order
to be certain.
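
To make that concrete, here is a toy model in Python. It is not NFSD or
client code, and every name in it (server_copy, client_chunked_copy,
BLOCK) is made up for illustration. The "server" copies blocks in an
arbitrary order and may stop early, so a bare byte count does not say
which ranges landed. The second function sketches one variant of the
"copy a bounded amount at a time" idea, assuming each bounded request
either completes in full or is retried, so that the client tracks
progress by offset alone.

```python
import random

BLOCK = 4  # toy block size in bytes (hypothetical value)


def server_copy(src, dst, offset, length, budget_blocks, rng):
    """Toy server: copy blocks of [offset, offset + length) in an
    arbitrary order, stopping after budget_blocks blocks.
    Returns the number of bytes copied."""
    blocks = list(range(offset, offset + length, BLOCK))
    rng.shuffle(blocks)  # the server is free to pick any order
    copied = 0
    for start in blocks[:budget_blocks]:
        end = min(start + BLOCK, offset + length)
        dst[start:end] = src[start:end]
        copied += end - start
    return copied


def client_chunked_copy(src, dst, chunk, rng):
    """Client-driven loop: ask for one bounded range at a time and give
    the toy server enough budget to finish it, so the next offset is
    always known."""
    offset = 0
    while offset < len(src):
        length = min(chunk, len(src) - offset)
        n_blocks = -(-length // BLOCK)  # ceiling division
        copied = server_copy(src, dst, offset, length, n_blocks, rng)
        assert copied == length  # assumption: bounded requests complete
        offset += length
    return offset


src = bytes(range(1, 33))
short_dst = bytearray(len(src))
n = server_copy(src, short_dst, 0, len(src), 3, random.Random())
# n says how much was copied, but not where in the file it went.
print(n, short_dst.hex())
```

With the short copy, the only safe recovery is to redo the whole range.
With the bounded loop, the client pays one round trip per chunk, which
is the performance question raised under b) in the quoted text.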

> Maybe we should fix that in the spec, or maybe we just need to implement
> the asynchronous case.  I guess it depends on which is easier,
> 
> 	a) implementing the asynchronous case (and the referring-triple
> 	   support to fix the COPY/callback races), or
> 	b) implementing this sort of "short copy" loop in a way that gives
> 	   good performance.
> 
> On the client side it's clearly a) since you're forced to handle that
> case anyway.  (Unless we argue that *all* copies should work that way,
> and that the spec should ditch the asynchronous case.) On the server
> side, b) looks easier.
> 
> --b.
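
For option a), the COPY/callback race is the case where the CB_OFFLOAD
callback reaches the client before the reply to the COPY that started
the operation, so the client does not yet know the stateid. Below is a
rough Python sketch of how a client could use the referring triple
(session, slot, sequence) from RFC 5661 section 2.10.6.3 to detect that.
The class, method names, and status strings are hypothetical and are not
taken from the Linux client or from the protocol's error codes.

```python
from typing import Iterable, NamedTuple


class ReferringTriple(NamedTuple):
    """Identifies the forward-channel call a callback refers to."""
    session_id: bytes
    slot_id: int
    sequence_id: int


class ToyClient:
    def __init__(self):
        self.outstanding = set()   # calls sent, reply not yet processed
        self.known_copies = set()  # stateids learned from COPY replies

    def send_copy(self, triple: ReferringTriple) -> None:
        self.outstanding.add(triple)

    def receive_copy_reply(self, triple: ReferringTriple, stateid: str) -> None:
        self.outstanding.discard(triple)
        self.known_copies.add(stateid)

    def handle_cb_offload(self, stateid: str,
                          referring: Iterable[ReferringTriple]) -> str:
        # If the callback refers to a call whose reply we have not seen
        # yet, ask the server to retry later instead of rejecting it.
        if any(t in self.outstanding for t in referring):
            return "DELAY"
        if stateid not in self.known_copies:
            return "BAD_STATEID"
        self.known_copies.discard(stateid)
        return "OK"


client = ToyClient()
call = ReferringTriple(b"session-1", 0, 1)
client.send_copy(call)
early = client.handle_cb_offload("copy-stateid", [call])  # reply not seen yet
client.receive_copy_reply(call, "copy-stateid")
late = client.handle_cb_offload("copy-stateid", [call])
print(early, late)
```

Without the referring triple, the early callback in this model would be
rejected as naming an unknown stateid, which is the failure the quoted
text wants to avoid.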

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com