On Mon, 2018-06-25 at 11:39 -0400, Chuck Lever wrote:
> > On Jun 24, 2018, at 9:56 AM, Trond Myklebust <trondmy@hammerspace.com> wrote:
> >
> > On Sat, 2018-06-23 at 15:00 -0400, Chuck Lever wrote:
> > > > On Jun 22, 2018, at 6:31 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
> > > >
> > > > On Fri, 2018-06-22 at 17:49 -0400, Chuck Lever wrote:
> > > > > Hi Bruce-
> > > > >
> > > > >
> > > > > > On Jun 22, 2018, at 1:54 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > > > > >
> > > > > > On Thu, Jun 21, 2018 at 04:35:33PM +0000, Manjunath Patil wrote:
> > > > > > > Presently nfserr_jukebox is being returned by nfsd for
> > > > > > > create_session request if server is unable to allocate a
> > > > > > > session slot. This may be treated as NFS4ERR_DELAY by the
> > > > > > > clients and which may continue to re-try create_session in
> > > > > > > loop leading NFSv4.1+ mounts in hung state. nfsd should
> > > > > > > return nfserr_nospc in this case as per
> > > > > > > rfc5661(section-18.36.4 subpoint 4. Session creation).
> > > > > >
> > > > > > I don't think the spec actually gives us an error that we
> > > > > > can use to say a CREATE_SESSION failed permanently for lack
> > > > > > of resources.
> > > > >
> > > > > The current situation is that the server replies
> > > > > NFS4ERR_DELAY, and the client retries indefinitely. The goal
> > > > > is to let the client choose whether it wants to try the
> > > > > CREATE_SESSION again, try a different NFS version, or fail
> > > > > the mount request.
> > > > >
> > > > > Bill and I both looked at this section of RFC 5661.
> > > > > It seems to us that the use of NFS4ERR_NOSPC is appropriate
> > > > > and unambiguous in this situation, and it is an allowed
> > > > > status for the CREATE_SESSION operation. NFS4ERR_DELAY OTOH
> > > > > is not helpful.
> > > >
> > > > There are a range of errors which we may need to handle by
> > > > destroying the session, and then creating a new one (mainly the
> > > > ones where the client and server slot handling get out of
> > > > sync). That's why returning NFS4ERR_NOSPC in response to
> > > > CREATE_SESSION is unhelpful, and is why the only sane response
> > > > by the client will be to treat it as a temporary error.
> > > > IOW: these patches will not be acceptable, even with a rewrite,
> > > > as they are based on a flawed assumption.
> > >
> > > Fair enough. We're not attached to any particular solution/fix.
> > >
> > > So let's take "recovery of an active mount" out of the picture
> > > for a moment.
> > >
> > > The narrow problem is behavioral: during initial contact with an
> > > unfamiliar server, the server can hold off a client indefinitely
> > > by sending NFS4ERR_DELAY for example until another client
> > > unmounts.
> > > We want to find a way to allow clients to make progress when a
> > > server is short of resources.
> > >
> > > It appears that the mount(2) system call does not return as long
> > > as the server is still returning NFS4ERR_DELAY. Possibly user
> > > space is never given an opportunity to stop retrying, and thus
> > > mount.nfs gets stuck.
> > >
> > > It appears that DELAY is OK for EXCHANGE_ID too. So if a server
> > > decides to return DELAY to EXCHANGE_ID, I wonder if our client's
> > > trunking detection would be hamstrung by one bad server...
> >
> > The 'mount' program has the 'retry' option in order to set a
> > timeout for the mount operation itself. Is that option not working
> > correctly?
>
> Manjunath will need to confirm that, but my understanding is that
> mount.nfs is not regaining control when the server returns DELAY
> to CREATE_SESSION. My conclusion was that mount(2) is not returning.
>
>
> > If so, we should definitely fix that.
>
> My recollection is that mount.nfs polls, it does not set a timer
> signal. So it will call mount(2) repeatedly until either "retry"
> minutes has passed, or mount(2) succeeds. I don't think it will
> deal with mount(2) not returning, but I could be wrong about that.
>
> My preference would be to make the kernel more reliable (ie mount(2)
> fails immediately in this case). That gives mount.nfs some time to
> try other things (like, try the original mount again after a few
> moments, or fall back to NFSv4.0, or fail).

Falling back to NFSv4.0 is also wrong in this case. 4.0 relies on the
DRC for replay protection against all non-stateful nonidempotent
operations (i.e. mkdir(), unlink(), rename(), ...).

If you want to make ENOSPC a fatal error, then that means you need to
educate users about the remedies, and I can't see that we're agreeing
on what constitutes the right remedy here.

So I disagree that it is OK to expose this particular error to
userland for now. I'm OK with fixing 'retry=', but that's because it
is a well defined control mechanism.

> We don't want mount.nfs to wait for the full retry= while doing
> nothing else. That would make this particular failure mode behave
> differently than all the other modes we have had, historically, IIUC.
>
> Also, I agree with Bruce that the server should make CREATE_SESSION
> less likely to fail. That would also benefit state recovery.
>
>
> > We might also want to look into making it take values < 1 minute.
> > That could be accomplished either by extending the syntax of the
> > 'retry' option (e.g.: 'retry=<minutes>:<seconds>') or by adding a
> > new option (e.g. 'sretry=<seconds>').
> >
> > It would then be up to the caller of mount to decide the policy of
> > what to do after a timeout.
>
> I agree that the caller of mount(2) should be allowed to provide the
> policy.
>
>
> > Renegotiation downward to NFSv3 might be an option, but it's not
> > something that most people want to do in the case where there are
> > lots of clients competing for resources since that's precisely the
> > regime where the NFSv3 DRC scheme breaks down (lots of
> > disconnections, combined with a high turnover of DRC slots).
>
> --
> Chuck Lever
> chucklever@xxxxxxxxx
>
>
>
-- 
Trond Myklebust
CTO, Hammerspace Inc
4300 El Camino Real, Suite 105
Los Altos, CA 94022
www.hammer.space