Hi Chuck Lever:

>
> On Nov 18, 2009, at 4:50 AM, Mi Jinlong wrote:
>
>> Hi
>>
>> Chuck Lever:
>>>
>>> On Nov 17, 2009, at 4:47 AM, Mi Jinlong wrote:
>>>
>>>> When testing NLM, I found a bug.
>>>> After the server stops its nfslock service, the client can still get a lock.
>>>>
>>>> Test process:
>>>>
>>>> Step1: client opens a file on NFS.
>>>> Step2: client uses fcntl to get a lock.
>>>> Step3: client uses fcntl to release the lock.
>>>> Step4: server stops its nfslock service.
>>>> Step5: client uses fcntl to get the lock again.
>>>>
>>>> At step5, the client should fail to get the lock, but it succeeds.
>>>>
>>>> Reason:
>>>> When the server stops its nfslock service, the client's host struct is
>>>> not unmonitored at the server. When the client asks for a lock again,
>>>> the client's host struct is reused but is not monitored again.
>>>> So at step5 the client gets the lock successfully.
>>>
>>> Effectively, the client is still monitored, since it is still in statd's
>>> monitored list. Shutting down statd does not remove it from the monitor
>>> list. If the local host reboots, sm-notify will still send the remote
>>> an SM_NOTIFY request, which is correct.
>>>
>>> Additionally, new clients attempting to lock files when statd is down
>>> will fail, which is correct if statd is not available.
>>>
>>> Conversely, if a monitored remote reboots, there is no way to notify the
>>> local lockd of the reboot, since statd normally relays the SM_NOTIFY to
>>> lockd, but isn't running. That might be a problem.
>>
>> Yes, it does seem to be a problem.
>>
>> I haven't confirmed it, so I would like your opinion.
>
> Currently, there isn't a high degree of coordination between lockd and
> statd. This is to maintain good scalability when serving NFS lock
> requests. You offered a couple of alternatives for improving this
> specific situation, but my opinion is that there are larger, more
> general coordination issues here, and that what you observed is expected
> behavior for the current design.
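
The client side of the test process above can be reproduced with a short script. This is a minimal sketch, assuming Python 3 on a Linux client and a test file on an NFSv2/v3 mount that does not use local-only locking; the script name and the functions try_lock, unlock, run_test and prompt_operator are hypothetical, not part of any NFS tool. fcntl.lockf() is a wrapper around the fcntl() record-locking calls, so on such a mount each lock request should be forwarded to the server's NLM service.

```python
import fcntl
import os
import sys
from typing import Callable, Tuple


def try_lock(fd: int) -> bool:
    """Try to take an exclusive fcntl() lock on the whole file without blocking."""
    try:
        # lockf() wraps fcntl() record locking; on an NFSv2/v3 mount the
        # client kernel is expected to turn this into an NLM lock request.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError:
        # EACCES/EAGAIN if the lock is held elsewhere; other errnos if the
        # request is refused for another reason.
        return False


def unlock(fd: int) -> None:
    """Release the lock taken by try_lock()."""
    fcntl.lockf(fd, fcntl.LOCK_UN)


def run_test(path: str, wait_for_server: Callable[[], None]) -> Tuple[bool, bool]:
    """Run steps 1-5 and return (step 2 lock succeeded, step 5 lock succeeded)."""
    fd = os.open(path, os.O_RDWR)   # Step 1: open (write access needed for LOCK_EX)
    try:
        first = try_lock(fd)        # Step 2: get the lock
        if first:
            unlock(fd)              # Step 3: release it
        wait_for_server()           # Step 4: nfslock (statd) is stopped on the server
        second = try_lock(fd)       # Step 5: get the lock again
        if second:
            unlock(fd)
        return first, second
    finally:
        os.close(fd)


def prompt_operator() -> None:
    """Pause so the operator can stop statd on the server by hand."""
    input("Stop statd on the server ('service nfslock stop'), then press Enter: ")


if __name__ == "__main__":
    if len(sys.argv) == 2:
        first_ok, second_ok = run_test(sys.argv[1], prompt_operator)
        print(f"step 2 lock: {first_ok}, step 5 lock: {second_ok}")
    else:
        print("usage: nlm_lock_test.py FILE_ON_NFS_MOUNT", file=sys.stderr)
```

Run it as "python3 nlm_lock_test.py /mnt/nfs/testfile" (hypothetical path) and stop statd on the server when prompted. According to this thread, a Linux server still grants the step 5 lock because the client's host entry is reused.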
>
> This still seems to me like a case of "Patient: Doctor, it hurts when I
> do that." "Doctor: Well, then, don't do that." In other words, we
> assume that "service nfslock stop" won't be used under normal operating
> conditions, and we know that NLM will misbehave if you stop statd during
> normal operation.
>
>>> However, shutting down statd during normal operation is not a normal or
>>> supported thing to do.
>>>
>>>> Questions:
>>>> 1. Should the server unmonitor the client's host struct
>>>> when it stops the nfslock service?
>>>>
>>>> 2. Should rpc.statd tell the kernel its status (when it starts and stops)
>>>> by sending an SM_NOTIFY?
>>>
>>> There are a number of other coordination issues around statd start-up
>>> and shut down. The server's grace period, for instance, is not
>>> synchronized with sending reboot notifications. So, we do recognize
>>> this is a general problem.
>>>
>>> In this case, however, I would expect indeterminate behavior if statd is
>>> shut down during normal operation, and that's exactly what we get. I'm
>>> not sure it's even reasonable to support this use case. Why would
>>> someone shut down statd and expect reliable NFSv2/v3 locking behavior?
>>> In other words, with due respect, what problem would we solve by fixing
>>> this, other than making your test case work?
>>
>> When the server's nfslock service is stopped, the client can sometimes
>> get a lock and sometimes can't, which is puzzling.
>
> On Linux, the user space "nfslock" service is actually nothing more than
> statd. Linux's NLM service is handled in the kernel, and is started and
> stopped when either a) there are NFS mounts, or b) NFSD is started. The
> kernel's NLM service has nothing to do with "service nfslock start" any
> more. I think there used to be a user space NLM implementation.
>
>>> Out of curiosity, what happens if you try this on a Solaris server?
>>
>> I'm new to Solaris.
>> When Solaris's nlockmgr is stopped, the client can't get a lock immediately.
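
Why a stopped statd makes locking "sometimes succeed, sometimes fail" can be illustrated with a toy model. This is a simplified sketch of the behaviour described in this thread, not Linux lockd code: ToyNlmServer, Host and StatdDown are invented names, and one boolean per host stands in for statd's monitor list. A host that was monitored before statd stopped keeps its cached state, so its requests never consult statd; a host seen for the first time needs a monitor request, which fails while statd is down.

```python
from dataclasses import dataclass, field
from typing import Dict


class StatdDown(Exception):
    """Raised when the simulated statd cannot be reached."""


@dataclass
class Host:
    """Hypothetical cached per-client state on the NLM server."""
    name: str
    monitored: bool = False


@dataclass
class ToyNlmServer:
    statd_running: bool = True
    hosts: Dict[str, Host] = field(default_factory=dict)

    def _monitor(self, host: Host) -> None:
        # Stand-in for asking statd to monitor this client.
        if not self.statd_running:
            raise StatdDown(host.name)
        host.monitored = True

    def lock(self, client: str) -> bool:
        # Reuse the cached host entry if this client has been seen before.
        host = self.hosts.setdefault(client, Host(client))
        if not host.monitored:
            try:
                self._monitor(host)
            except StatdDown:
                return False  # new clients fail while statd is down
        return True  # already-monitored hosts never consult statd again


if __name__ == "__main__":
    srv = ToyNlmServer()
    print(srv.lock("client-a"))  # True: first lock, host gets monitored
    srv.statd_running = False    # "service nfslock stop"
    print(srv.lock("client-a"))  # True: cached host, statd not consulted
    print(srv.lock("client-b"))  # False: new host cannot be monitored
```

The model deliberately leaves out reboot notification, which is the part Chuck points out really breaks: with statd stopped, nothing relays an SM_NOTIFY from a rebooted client back to lockd.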
>
> I should have been more clear: if you stop Solaris' user space NSM
> daemon, can you lock files consistently? My bet is that Solaris will
> demonstrate a similar degree of inconsistent behavior if you try
> NFSv2/v3 locking while starting and stopping its NSM service daemon.

^_^
You are right: when I stop Solaris's NSM, the client can still get a lock.
Maybe it's the same as Linux.

--
Regards
Mi Jinlong