Hi Chuck Lever:

> On Nov 17, 2009, at 4:47 AM, Mi Jinlong wrote:
>
>> When testing NLM, I found a bug.
>> After the server stops its nfslock service, the client can still get a lock successfully.
>>
>> Test process:
>>
>> Step 1: client opens an NFS file.
>> Step 2: client uses fcntl to get a lock.
>> Step 3: client uses fcntl to release the lock.
>> Step 4: server stops its nfslock service.
>> Step 5: client uses fcntl to get the lock again.
>>
>> At step 5 the client should fail to get the lock, but it succeeds.
>>
>> Reason:
>> When the server stops its nfslock service, the client's host struct is not
>> unmonitored on the server. When the client locks again, the client's
>> host struct is reused but is not monitored again.
>> As a result, at step 5 the client can get the lock successfully.
>
> Effectively, the client is still monitored, since it is still in statd's
> monitored list. Shutting down statd does not remove it from the monitor
> list. If the local host reboots, sm-notify will still send the remote
> an SM_NOTIFY request, which is correct.
>
> Additionally, new clients attempting to lock files when statd is down
> will fail, which is correct if statd is not available.
>
> Conversely, if a monitored remote reboots, there is no way to notify the
> local lockd of the reboot, since statd normally relays the SM_NOTIFY to
> lockd, but isn't running. That might be a problem.

Yes, it seems to be a problem. I haven't confirmed it, so I would like to get your opinion.

> However, shutting down statd during normal operation is not a normal or
> supported thing to do.
>
>> Questions:
>> 1. Should the client's host struct be unmonitored on the server
>>    when the server stops its nfslock service?
>>
>> 2. Should rpc.statd tell the kernel its status (when it starts and stops)
>>    by sending an SM_NOTIFY?
>
> There are a number of other coordination issues around statd start-up
> and shut down. The server's grace period, for instance, is not
> synchronized with sending reboot notifications. So, we do recognize
> this is a general problem.
>
> In this case, however, I would expect indeterminate behavior if statd is
> shut down during normal operation, and that's exactly what we get. I'm
> not sure it's even reasonable to support this use case. Why would
> someone shut down statd and expect reliable NFSv2/v3 locking behavior?
> In other words, with due respect, what problem would we solve by fixing
> this, other than making your test case work?

When the server's nfslock service is stopped, the client sometimes succeeds
in getting a lock and sometimes fails, which is confusing.

> Out of curiosity, what happens if you try this on a Solaris server?

I'm new to Solaris. When Solaris's nlockmgr is stopped, the client
immediately fails to get a lock.

thanks,
Mi Jinlong
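
P.S. For reference, below is a minimal sketch of the reproduction steps quoted
above. It assumes the test file lives on an NFS mount; the /mnt/nfs/testfile
path is only an example. The program acquires and releases a whole-file POSIX
lock with fcntl(F_SETLK), then pauses so the server's nfslock service can be
stopped by hand before it tries to lock again; with the bug, the second
F_SETLK still succeeds.

/*
 * Reproduce steps 1-5: open an NFS file, lock it, unlock it, wait while
 * the server's nfslock service is stopped, then try to lock it again.
 */
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

static int set_lock(int fd, short type)
{
	struct flock fl;

	memset(&fl, 0, sizeof(fl));
	fl.l_type = type;		/* F_WRLCK to lock, F_UNLCK to release */
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;
	fl.l_len = 0;			/* 0 means the whole file */
	return fcntl(fd, F_SETLK, &fl);
}

int main(void)
{
	/* Step 1: open a file on the NFS mount (path is an example). */
	int fd = open("/mnt/nfs/testfile", O_RDWR | O_CREAT, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Step 2: acquire a whole-file write lock. */
	if (set_lock(fd, F_WRLCK) < 0)
		perror("step 2: F_SETLK");

	/* Step 3: release the lock. */
	if (set_lock(fd, F_UNLCK) < 0)
		perror("step 3: F_UNLCK");

	/* Step 4: stop the nfslock service on the server by hand. */
	printf("Stop the server's nfslock service, then press Enter...\n");
	getchar();

	/* Step 5: try to lock again; with the bug this still succeeds. */
	if (set_lock(fd, F_WRLCK) < 0)
		perror("step 5: F_SETLK");
	else
		printf("step 5: lock acquired (bug reproduced)\n");

	close(fd);
	return 0;
}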