Re: [RFC] After server stop nfslock service, client still can get lock success

Hi

Chuck Lever:
> 
> On Nov 18, 2009, at 4:50 AM, Mi Jinlong wrote:
> 
>> Hi
>>
>> Chuck Lever:
>>>
>>> On Nov 17, 2009, at 4:47 AM, Mi Jinlong wrote:
>>>
>>>> When testing NLM, I found a bug: after the server stops its nfslock
>>>> service, a client can still successfully acquire a lock.
>>>>
>>>> Test process:
>>>>
>>>> Step 1: client opens an NFS file.
>>>> Step 2: client uses fcntl to get a lock.
>>>> Step 3: client uses fcntl to release the lock.
>>>> Step 4: server stops its nfslock service.
>>>> Step 5: client uses fcntl to get the lock again.
>>>>
>>>> At step 5, the client should fail to get the lock, but it succeeds.
>>>>
>>>> Reason:
>>>> When the server stops the nfslock service, the client's host struct is
>>>> not unmonitored on the server. When the client requests a lock again,
>>>> that host struct is reused but is not monitored again.
>>>> As a result, the client can get the lock at step 5.
>>>
>>> Effectively, the client is still monitored, since it is still in statd's
>>> monitored list.  Shutting down statd does not remove it from the monitor
>>> list.  If the local host reboots, sm-notify will still send the remote
>>> an SM_NOTIFY request, which is correct.
>>>
>>> Additionally, new clients attempting to lock files when statd is down
>>> will fail, which is correct if statd is not available.
>>>
>>> Conversely, if a monitored remote reboots, there is no way to notify the
>>> local lockd of the reboot, since statd normally relays the SM_NOTIFY to
>>> lockd, but isn't running.  That might be a problem.
>>
>>  Yes, that seems to be a problem.
>>
>>  I have not confirmed it, so I wanted to get your opinion.
> 
> Currently, there isn't a high degree of coordination between lockd and
> statd.  This is to maintain good scalability when serving NFS lock
> requests.  You offered a couple of alternatives for improving this
> specific situation, but my opinion is that there are larger, more
> general coordination issues here, and that what you observed is expected
> behavior for the current design.
> 
> This still seems to me like a case of "Patient: Doctor, it hurts when I
> do that." "Doctor: Well, then, don't do that."  In other words, we
> assume that "service nfslock stop" won't be used under normal operating
> conditions, and we know that NLM will misbehave if you stop statd during
> normal operation.
> 
>>> However, shutting down statd during normal operation is not a normal or
>>> supported thing to do.
>>>
>>>> Questions:
>>>> 1. Should the server unmonitor the client's host struct
>>>>    when the server stops the nfslock service?
>>>>
>>>> 2. Should rpc.statd tell the kernel its status (on start and stop)
>>>>    by sending an SM_NOTIFY?
>>>
>>> There are a number of other coordination issues around statd start-up
>>> and shut down.  The server's grace period, for instance, is not
>>> synchronized with sending reboot notifications.  So, we do recognize
>>> this is a general problem.
>>>
>>> In this case, however, I would expect indeterminate behavior if statd is
>>> shut down during normal operation, and that's exactly what we get.  I'm
>>> not sure it's even reasonable to support this use case.  Why would
>>> someone shut down statd and expect reliable NFSv2/v3 locking behavior?
>>> In other words, with due respect, what problem would we solve by fixing
>>> this, other than making your test case work?
>>
>>  When the server's nfslock service is stopped, the client sometimes gets
>>  the lock and sometimes does not, which is puzzling.
> 
> On Linux, the user space "nfslock" service is actually nothing more than
> statd.  Linux's NLM service is handled in the kernel, and is started and
> stopped when either a) there are NFS mounts, or b) NFSD is started.  The
> kernel's NLM service has nothing to do with "service nfslock start" any
> more.  I think there used to be a user space NLM implementation.
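
To make the distinction above concrete: statd registers with the portmapper as RPC program 100024 ("status") and the kernel's NLM service as program 100021 ("nlockmgr"), so `rpcinfo -p` output shows which of the two is still up. The following is only a rough sketch of that check. The helper names and the sample output (including the port numbers) are made up for illustration and are not taken from a real server.

```python
NSM_PROGRAM = 100024  # "status": the NSM service, provided by rpc.statd
NLM_PROGRAM = 100021  # "nlockmgr": the NLM service, provided by lockd


def registered_services(rpcinfo_output):
    """Map RPC program number -> service name from `rpcinfo -p` text."""
    services = {}
    for line in rpcinfo_output.splitlines():
        fields = line.split()
        # Data rows have the form: program vers proto port service.
        # The header row is skipped because its first field is not a number.
        if len(fields) >= 5 and fields[0].isdigit():
            services[int(fields[0])] = fields[4]
    return services


def describe(rpcinfo_output):
    """Report whether statd and lockd appear in the given rpcinfo listing."""
    progs = registered_services(rpcinfo_output)
    return {"statd": NSM_PROGRAM in progs, "lockd": NLM_PROGRAM in progs}


# Hypothetical listing for a server where statd has been stopped
# while the in-kernel NLM service is still registered.
SAMPLE = """\
   program vers proto   port  service
    100000    2   tcp    111  portmapper
    100021    1   udp  40000  nlockmgr
    100021    3   udp  40000  nlockmgr
"""

print(describe(SAMPLE))
```

Against a live machine, the text could come from running `rpcinfo -p <server>` and passing its output to `describe()`.
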
> 
>>> Out of curiosity, what happens if you try this on a Solaris server?
>>
>>  I'm new to Solaris.
>>  When Solaris's nlockmgr is stopped, the client can no longer get a lock,
>>  effective immediately.
> 
> I should have been more clear: if you stop Solaris' user space NSM
> daemon, can you lock files consistently?  My bet is that Solaris will
> demonstrate a similar degree of inconsistent behavior if you try
> NFSv2/v3 locking while starting and stopping its NSM service daemon.

  ^_^ 

  You are right: when I stop Solaris's NSM daemon, the client can still
  acquire a lock. It seems to behave the same as Linux.
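
  For anyone who wants to repeat the test, steps 1 to 5 of the test process quoted above can be sketched roughly as below. This is not the exact program used in the test. The function names and the `between` hook are made up for illustration, and the file is assumed to live on an NFSv2/v3 mount.

```python
import fcntl
import os


def try_lock(fd):
    """Try to take an exclusive fcntl lock on the whole file without blocking."""
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError:
        return False


def lock_cycle(path, between=lambda: None):
    """Run steps 1-5 against `path`; return (first_lock_ok, second_lock_ok)."""
    fd = os.open(path, os.O_RDWR)            # Step 1: open the file
    try:
        first = try_lock(fd)                 # Step 2: get the lock
        if first:
            fcntl.lockf(fd, fcntl.LOCK_UN)   # Step 3: release the lock
        between()                            # Step 4: hook for stopping nfslock on the server
        second = try_lock(fd)                # Step 5: get the lock again
        if second:
            fcntl.lockf(fd, fcntl.LOCK_UN)
        return first, second
    finally:
        os.close(fd)
```

  On a client, `between` could simply wait for Enter while "service nfslock stop" is run on the server, for example `lock_cycle("/mnt/nfs/testfile", between=lambda: input())`, where the path is only a placeholder.
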

-- 
Regards
Mi Jinlong
