Re: [PATCH 1/9] Revert "pnfs-submit: wave2: remove forgotten layoutreturn struct definitions"

Trond Myklebust <Trond.Myklebust@xxxxxxxxxx> · Thu, 16 Dec 2010 13:14:41 -0500

On Thu, 2010-12-16 at 19:42 +0200, Benny Halevy wrote:
> On 2010-12-16 19:35, Trond Myklebust wrote:
> > On Thu, 2010-12-16 at 18:24 +0200, Benny Halevy wrote:
> >> On 2010-12-16 17:55, Trond Myklebust wrote:
> >>> OK, so why not just go the whole hog and do that for all rare cases,
> >>> including the one where the server recalls a layout segment that we
> >>> happen to be doing I/O to?
> >>>
> >>> The case we should be optimising for is the one where the layout is
> >>> recalled, and no I/O to that segment is in progress. For that case,
> >>> returning OK, then doing the LAYOUTRETURN instead of just returning
> >>> NOMATCHING_LAYOUT is clearly wrong: it adds a completely unnecessary
> >>> round trip to the server. Agreed?
> >>
> >> I agree that if the client can free the recalled layout synchronously
> >> and if it need not send a LAYOUTCOMMIT or LAYOUTRETURN (e.g. in the objects case)
> >> it can simply return NFS4ERR_NOMATCHING_LAYOUT.
> > 
> > Objects and blocks != wave 2. We can cross that bridge when we get to
> > it.
> > 
> 
> Right.  This patchset is destined as post wave2.

In that case it has a very confusing title (which certainly caught me by
surprise).

> 
> >>>
> >>> As for the much rarer case of a recall of a layout that is in use, how
> >>> does LAYOUTRETURN speed things up? As far as I can see, the MDS is still
> >>> going to return NFS4ERR_DELAY to the client that requested the
> >>> conflicting LAYOUTGET. That client then has to resend this LAYOUTGET
> >>> request, at a time when the first client may or may not have returned
> >>> its layout segment. So how is LAYOUTRETURN going to make all this a fast
> >>> and scalable process?
> >>>
> >>
> >> First, the server does not have to poll the client and waste cpu and network
> >> resources on that.
> > 
> > ...but this is a ____rare____ case. If you are seeing noticeable effects
> > on the network from this, then something is wrong. If that is ever the
> > case, then you should be writing through the MDS anyway.
> > 
> > Furthermore, the MDS does need to be able to cope with NFS4ERR_DELAY
> > anyway, so why add the extra complexity to the client?
> > 
> >> Second, for the competing client, with notifications, it too does not have
> >> to poll the server and can wait on getting the notification when the
> >> layout becomes available.
> > 
> > There is no notification of layout availability in RFC5661. Lock
> > notification is for byte range locks, and device id notification is for
> > device ids. The rest is for directory notifications.
> > 
> 
> Hmm, CB_RECALLABLE_OBJ_AVAIL in response to loga_signal_layout_avail...

Hmm indeed. Section 12.3 states:

"CB_RECALLABLE_OBJ_AVAIL  (Section 20.7) tells a client that a
recallable object that it was denied (in case of pNFS, a layout denied
by LAYOUTGET) due to resource exhaustion is now available."

and 18.43.3 states:

"If client sets loga_signal_layout_avail to TRUE, then it is registering
with the client a "want" for a layout in the event the layout cannot be
obtained due to resource exhaustion."

I can't see how that is relevant to the case where a specific LAYOUTGET
requires a layout recall from another client. That's not resource
exhaustion.
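
To spell out the distinction, here is a minimal sketch of how I read
those two passages. The enum and the helper name are hypothetical,
invented purely for illustration; they are not taken from the spec or
from any existing server code. The only point is that a "want" is
registered for the resource exhaustion case and not for a conflict that
needs a recall from another client.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reasons an MDS might not grant a LAYOUTGET right away. */
enum layoutget_denial {
	DENIED_RESOURCE_EXHAUSTION,	/* server has run out of layout resources */
	DENIED_RECALL_CONFLICT		/* another client holds a conflicting layout */
};

/*
 * Hypothetical helper: would the server register a "want" and later send
 * CB_RECALLABLE_OBJ_AVAIL?  Under the reading of 12.3 and 18.43.3 quoted
 * above, only when the client asked for it AND the denial was due to
 * resource exhaustion.
 */
bool registers_layout_want(enum layoutget_denial why,
			   bool loga_signal_layout_avail)
{
	if (!loga_signal_layout_avail)
		return false;
	return why == DENIED_RESOURCE_EXHAUSTION;
}

/* Usage example: the recall conflict case never yields a notification. */
void layout_want_examples(void)
{
	assert(registers_layout_want(DENIED_RESOURCE_EXHAUSTION, true));
	assert(!registers_layout_want(DENIED_RECALL_CONFLICT, true));
	assert(!registers_layout_want(DENIED_RESOURCE_EXHAUSTION, false));
}
```

Under that reading, the competing client in the recall case still has to
resend its LAYOUTGET.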

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com
