Re: [nfsv4] RFC 5661 write to DS clarification. (was: [nfsv4] RFC 5661 LAYOUTRETURN clarification)

BTW: 

I forgot to add that the fencing issues are also the reason why the
Linux client is unlikely to comply any time soon with RFC 5661 Section
13.9.1's request that we prefer use of the OPEN stateid over the LOCK
stateid when talking to the DS.
If the server revokes the lock, or if the client calls LOCKU, all WRITEs
that were made under that lock need to be fenced off. Unless mandatory
locking is in effect, that won't happen if the WRITE ops were sent using
the OPEN stateid.

This is also why I believe we should revisit the rule that the client
should only send stateids with a zero seqid to the DS.
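To make this concrete, here is a rough sketch of the DS-side check being
discussed. This is illustrative Python with hypothetical names, not the
Linux implementation or any real server's logic:

```python
# Hypothetical sketch of stateid-based fencing at a DS. A stateid's
# "other" field names the state; its seqid is bumped (or the state is
# freed) by operations such as LOCKU or CLOSE.

BAD_STATEID = "NFS4ERR_BAD_STATEID"
OK = "NFS4_OK"

class DataServer:
    def __init__(self):
        # stateid "other" -> current seqid known to the server
        self.state = {}

    def write(self, other, seqid, data):
        current = self.state.get(other)
        if current is None:
            return BAD_STATEID   # state freed (e.g. by LOCKU): WRITE fenced
        if seqid == 0:
            return OK            # seqid 0 means "most recent": never fenced
        if seqid != current:
            return BAD_STATEID   # stale seqid: WRITE fenced
        return OK

ds = DataServer()
ds.state["lock1"] = 1            # LOCK established, seqid 1

assert ds.write("lock1", 1, b"x") == OK           # WRITE under the lock
del ds.state["lock1"]                             # client sends LOCKU
assert ds.write("lock1", 1, b"x") == BAD_STATEID  # lingering WRITE fenced

# A WRITE sent under the OPEN stateid (still valid after LOCKU), or any
# WRITE carrying seqid 0, escapes the fence:
ds.state["open1"] = 3
assert ds.write("open1", 3, b"x") == OK
assert ds.write("open1", 0, b"x") == OK
```

The last two asserts are the problem: once the lock is gone, a WRITE sent
under the OPEN stateid (or with a zero seqid) is indistinguishable from a
legitimate one, so nothing fences it off.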

Cheers
  Trond

On Mon, 2012-06-11 at 15:43 -0400, Trond Myklebust wrote:
> The _only_ reason why a pNFS files client would ever want to send a
> LAYOUTRETURN is in order to have the MDS take action to fence off any
> outstanding writes to the DS.
> 
> The _only_ case where that is actually an important issue is when
> something happens to the DS which forces the client to fall back to
> writing through the MDS.
> 
> _ALL_ other cases are trivially covered by the existing NFSv4 state
> model in that when the client unlocks and/or closes the file, then the
> lock/open stateids that are used in the READ and WRITE operations will
> be updated, and will cause those operations to be rejected with a
> BAD_STATEID error. This fencing model applies irrespective of whether or
> not a layout is held, and irrespective of whether the READ/WRITE was sent
> to the MDS or the DS.
> 
> 
> IOW: if pNFS files servers don't want to do this kind of fencing, then I
> suggest we file an erratum that labels the LAYOUTRETURN operation as
> mandatory to not implement for those servers.
> 
> On Mon, 2012-06-11 at 15:02 -0400, david.noveck@xxxxxxx wrote:
> > > And again, please explain why you want this. What is wrong with the
> > > case we all agree on? i.e.: "The client cannot call LAYOUTRETURN until
> > > all in-flight RPCs return, with or without an error."
> > 
> > It's a recipe for data corruption.  If, as Andy explained, the client
> > starts doing I/Os (let's suppose WRITEs) to the MDS, then any lingering
> > WRITEs to the DS can cause data corruption, since they reflect an earlier
> > state of affairs.
> > 
> > There are three ways to prevent those lingering DS writes from corrupting 
> > data:
> > 
> > 1) Doing a LAYOUTRETURN.
> > 2) Waiting until the I/Os return.
> > 3) "Magically plugging the network interface."
> > 
> > 
> > Since there is no way to do 3), saying that you can only do 1) after
> > 2) is done essentially means:
> > 
> > a) that it may take a very long time;
> > b) that you will only do it when it is no longer useful.
> > 
> > If you do 1) asap, then the lingering DS write problem is gone sooner,
> > and that's a good thing. 
> > 
> > -----Original Message-----
> > From: nfsv4-bounces@xxxxxxxx [mailto:nfsv4-bounces@xxxxxxxx] On Behalf Of Boaz Harrosh
> > Sent: Monday, June 11, 2012 2:41 PM
> > To: Andy Adamson
> > Cc: Andy Adamson; NFS list; Trond Myklebust; NFSv4
> > Subject: Re: [nfsv4] RFC 5661 LAYOUTRETURN clarification.
> > 
> > On 06/11/2012 07:01 PM, Andy Adamson wrote:
> > 
> > > I'm coding file layout data server recovery for the Linux NFS client,
> > > and came across an issue with LAYOUTRETURN that
> > > could use some comment from the list.
> > > 
> > > The error case I'm handling is an RPC-layer disconnection error
> > > during heavy WRITE I/O to a file layout data server. Our response is
> > > to internally mark the deviceid as invalid, which prevents all pNFS
> > > calls using the deviceid (i.e. no new I/O using any layout that uses
> > > the invalid deviceid), and to redirect all I/O to the MDS (any queued
> > > RPC request that has not been sent is redirected to the MDS).
> > > 
> > > Plus - and here is where the clarification is needed - we immediately
> > > send a LAYOUTRETURN for any layout with in-flight requests to the
> > > disconnected data server.  By in-flight I mean transmitted with respect
> > > to the RPC layer.  The purpose of this LAYOUTRETURN is to notify the
> > > file layout MDS to fence the DS for the specified layouts, as the
> > > WRITEs will also be sent to the MDS.
> > > 
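To make Andy's recovery sequence concrete, here is an illustrative sketch.
All names here are hypothetical; this is not the actual Linux client code:

```python
# Hypothetical sketch of the described DS-disconnect handling: mark the
# deviceid invalid, redirect queued (untransmitted) I/O to the MDS, and
# LAYOUTRETURN any layout with RPCs already on the wire to that DS.

class FileLayoutClient:
    def __init__(self):
        self.invalid_deviceids = set()
        self.queued = []         # RPCs built but not yet transmitted
        self.in_flight = {}      # layout -> list of transmitted RPCs
        self.sent_to_mds = []
        self.layoutreturns = []

    def on_ds_disconnect(self, deviceid):
        # 1. Mark the deviceid invalid: no new pNFS I/O may use it.
        self.invalid_deviceids.add(deviceid)
        # 2. Redirect queued (not yet transmitted) RPCs to the MDS.
        for rpc in [r for r in self.queued if r["deviceid"] == deviceid]:
            self.queued.remove(rpc)
            self.sent_to_mds.append(rpc)
        # 3. Immediately LAYOUTRETURN every layout with in-flight RPCs
        #    to this DS, so the MDS can fence them.
        for layout, rpcs in self.in_flight.items():
            if any(r["deviceid"] == deviceid for r in rpcs):
                self.layoutreturns.append(layout)

client = FileLayoutClient()
client.queued = [{"deviceid": "ds1", "op": "WRITE"}]
client.in_flight = {"layoutA": [{"deviceid": "ds1", "op": "WRITE"}]}
client.on_ds_disconnect("ds1")
assert "ds1" in client.invalid_deviceids
assert client.queued == [] and client.layoutreturns == ["layoutA"]
```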
> > 
> > 
> > I do not entirely disagree with this. The point here is very
> > fine-grained and should be specified explicitly. I would like to see
> > text along the lines of the following.
> > 
> > There are three types of in-flight RPC/IO:
> > 
> > 1. The client has sent the RPC header plus all of the associated data
> >    and is waiting for the DS WRITE/READ_DONE reply.
> > 
> >    (For me this case is fine: the client may send LAYOUTRETURN, as you
> >     suggest.)
> > 
> > 2. The client has sent the RPC header but got stuck sending the rest
> >    of the RPC message, then received a network disconnect. This is the
> >    most common case. Putting aside the RPC that got the error for a
> >    second, the most important question is what to do with parallel
> >    RPC/IOs that are in this state. Are parallel RPCs allowed to continue
> >    sending network packets after the LAYOUTRETURN was sent?
> > 
> >    The specific RPC that got stuck is not interesting, because it is a
> >    kind of case 1.5: we are not going to send any more bytes on that
> >    channel. The interesting ones are the other DSs which are still
> >    streaming.
> > 
> > 3. The client has some internal RPC queue which, due to some client
> >    parallelism, will start sending RPC header + data after the
> >    LAYOUTRETURN was sent.
> > My point was that with the code you submitted we are clearly violating
> > case 2, and even case 3, because I do not see anything preventing this.
> > 
> > And if the standard allows you cases 2 and 3, then that is a big change
> > to the concept, not the minor one you make it seem.
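The three in-flight cases above could be sketched as follows; this is an
illustrative Python classification with made-up names, not a proposal for
real client code:

```python
# Hypothetical classification of the three in-flight RPC states and the
# rule being argued for: after LAYOUTRETURN, only RPCs already fully on
# the wire (case 1, plus the "stuck" 1.5 variant) are tolerable; cases 2
# and 3 must not put any further bytes on the wire.

from enum import Enum

class InFlight(Enum):
    SENT_AWAITING_REPLY = 1  # header + all data sent; waiting for the reply
    PARTIALLY_SENT = 2       # header sent, data transfer stalled/disconnected
    QUEUED_NOT_SENT = 3      # built, would start transmitting later

def may_send_bytes_after_layoutreturn(state: InFlight) -> bool:
    # Only a fully transmitted RPC has nothing left to send; everything
    # else would keep streaming after the layout was returned.
    return state is InFlight.SENT_AWAITING_REPLY

assert may_send_bytes_after_layoutreturn(InFlight.SENT_AWAITING_REPLY)
assert not may_send_bytes_after_layoutreturn(InFlight.PARTIALLY_SENT)
assert not may_send_bytes_after_layoutreturn(InFlight.QUEUED_NOT_SENT)
```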
> > 
> > > I contend that sending the LAYOUTRETURN in this error case does not
> > > violate the two sections of RFC 5661 below, as the client has stopped
> > > sending any I/O requests using the returned layout.
> > > 
> > 
> > 
> > I would not mind if this were true. That is, if the LAYOUTRETURN were
> > a very clear barrier where our client would "magically" completely
> > plug the network interface and not continue to send a single
> > byte on the wire to *any* DS involved with the layout, that would be
> > fine.
> > 
> > That is, only state 1 and 1.5 RPCs above would be allowed: some/all
> > bytes were presented on the wire until the LAYOUTRETURN, from which
> > point all RPCs are hard-aborted and not a single byte is sent.
> > 
> > 
> > > Others contend that since the in-flight RPCs reference the returned
> > > layout, the client is still 'using' the layout with these in-flight
> > > requests, and can not call LAYOUTRETURN until all in-flight RPCs
> > > return, with or without an error.
> > > 
> > 
> > 
> > With our client code I don't see how the guarantee for cases 2 and 3
> > above will be met without actually implementing it here.
> > 
> > So in principle I agree with your principle; I just do not agree
> > with your practice. Your new code violates cases 2 and 3, which
> > must not be allowed.
> > 
> > And again, please explain why you want this. What is wrong with the
> > case we all agree on? i.e.: "The client cannot call LAYOUTRETURN until
> > all in-flight RPCs return, with or without an error."
> > 
> > Thanks
> > Boaz
> > 
> > > 
> > > Section 18.44.3 - the description section of the LAYOUTRETURN operation:
> > > 
> > >    After this call,
> > >    the client MUST NOT use the returned layout(s) and the associated
> > >    storage protocol to access the file data.
> > > 
> > > Section 13.6 Operations Sent to NFSv4.1 Data Servers
> > > 
> > >   As described in Section 12.5.1, a client
> > >   MUST NOT send an I/O to a data server for which it does not hold a
> > >   valid layout; the data server MUST reject such an I/O.
> > > 
> > > 
> > > -->Andy
> > 
> > 
> > _______________________________________________
> > nfsv4 mailing list
> > nfsv4@xxxxxxxx
> > https://www.ietf.org/mailman/listinfo/nfsv4
> > 
> 

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com



