On Dec 16, 2009, at 5:27 AM, Mi Jinlong wrote:
Chuck Lever:
On Dec 15, 2009, at 5:02 AM, Mi Jinlong wrote:
Hi,
When testing NLM on the latest kernel (2.6.32), I found a bug.
When a client holds locks and the server restarts its nfslock service,
the server's statd will not stay in sync with lockd.
If the server restarts nfslock twice or more, the client's locks will be lost.
Test process:
Step 1: the client opens an NFS file.
Step 2: the client takes a lock with fcntl.
Step 3: the server restarts its nfslock service.
I'll assume here that you mean the equivalent of "service nfslock
restart". This restarts statd and possibly runs sm-notify, but it has
no effect on lockd.
Yes, I used "service nfslock restart".
It does affect lockd too: when the service stops, lockd receives a
KILL signal. Lockd then releases all of the clients' locks, enters the
grace period, and waits for the clients to reclaim their locks.
Again, this test seems artificial to me. Is there a real-world use case
where someone would deliberately restart statd while an NFS server is
serving files? I pose this question because I've worked on statd only
for a year or so, and I am quite likely ignorant of all the ways it can
be deployed.
^/^, but someone may well restart nfslock while an NFS server is
serving files. It is bound to happen eventually.
After step 3, the server's lockd records that the client holds locks,
but statd's /var/lib/nfs/statd/sm/ directory is empty. That means statd
and lockd are out of sync. If the server restarts nfslock again, the
client's locks will be lost.
The primary reason:
At step 3, when the client's lock reclaim request is sent to the server,
the client's host (the host struct) is reused but is not re-monitored by
the server's lockd. From that point on, statd and lockd are out of sync.
The kernel squashes SM_MON upcalls for hosts that it already believes
are monitored. This is a scalability feature.
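To illustrate how this squashing interacts with a statd restart, here is a minimal Python sketch. The class and attribute names are hypothetical (the real logic is kernel C in lockd and a user-space statd); it only models the "already monitored" flag described above:

```python
# Toy model of lockd squashing SM_MON upcalls for hosts it already
# believes are monitored.  All names here are hypothetical.

class Statd:
    """User-space statd: its monitor list lives in /var/lib/nfs/statd/sm/."""
    def __init__(self):
        self.sm_dir = set()          # file names under sm/

    def sm_mon(self, client):        # SM_MON request handler
        self.sm_dir.add(client)

    def restart(self):               # restart retires the on-disk list
        self.sm_dir = set()

class Lockd:
    """Kernel lockd: keeps a per-host 'monitored' flag."""
    def __init__(self, statd):
        self.statd = statd
        self.monitored = {}          # host -> flag

    def monitor(self, client):
        # Squash the upcall if the host already looks monitored --
        # this is the scalability feature.
        if self.monitored.get(client):
            return                   # squashed: no SM_MON sent
        self.statd.sm_mon(client)
        self.monitored[client] = True

statd = Statd()
lockd = Lockd(statd)

lockd.monitor("client")              # first lock: SM_MON reaches statd
assert "client" in statd.sm_dir

statd.restart()                      # nfslock restart empties sm/
lockd.monitor("client")              # reclaim reuses the host struct:
                                     # the upcall is squashed
assert "client" not in statd.sm_dir  # lockd and statd now disagree
```

The final assertion is the desync being reported: lockd believes the client is monitored, while statd's sm/ directory is empty.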
When statd starts, it moves files from /var/lib/nfs/statd/sm/ to
/var/lib/nfs/statd/sm.bak/.
Well, it's really sm-notify that does this. sm-notify is run by
rpc.statd when it starts up.
However, sm-notify should only retire the monitor list the first time
it is run after a reboot. Simply restarting statd should not change
the on-disk monitor list in the slightest. If it does, there's some
kind of problem with the way sm-notify's pid file is managed, or
perhaps with the nfslock script.
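The intended retire-once-per-boot behavior can be sketched in Python as follows. The boolean guard is an assumption standing in for sm-notify's pid-file check, and the force flag models a forced notification run; real sm-notify is a C program:

```python
# Toy model of sm-notify retiring the monitor list only once per boot.
# The pid-file check is modeled as a plain boolean (an assumption).

class SmNotify:
    def __init__(self):
        self.already_ran = False     # stands in for the pid file check

    def run(self, sm, sm_bak, force=False):
        if self.already_ran and not force:
            return                   # later run since boot: do nothing
        sm_bak.update(sm)            # retire sm/ into sm.bak/
        sm.clear()
        self.already_ran = True

sm, sm_bak = {"client"}, set()
notify = SmNotify()

notify.run(sm, sm_bak)               # first run after "boot" retires list
assert sm == set() and sm_bak == {"client"}

sm.add("client")                     # statd re-monitors the client
notify.run(sm, sm_bak)               # plain restart: guard stops the wipe
assert "client" in sm

notify.run(sm, sm_bak, force=True)   # a forced run, however,
assert sm == set()                   # wipes the list again
```

If the nfslock script forces a notification run on every restart, the guard never gets a chance to protect the list, which would match the behavior being reported.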
If lockd doesn't send an SM_MON to statd,
statd will not monitor the clients that were being monitored before
statd restarted.
Question:
In my opinion, if lockd is allowed to reuse the client's host, it
should send an SM_MON to statd when it does so. If that is not allowed,
the client's host should be destroyed immediately.
What should lockd do? Reuse? Destroy? Or some other action?
I don't immediately see why lockd should change its behavior. Perhaps
statd/sm-notify were incorrect to delete the monitor list when you
restarted the nfslock service?
Sorry, maybe I did not express myself clearly.
I mean that lockd reuses the host struct that was created before statd
restarted. The monitor list seems to have been deleted when nfslock
restarted.
lockd does not touch any user space files; the on-disk monitor list is
managed by statd and sm-notify. A remote peer rebooting does not
clear the "monitored" flag for that peer in the local kernel's lockd,
so it won't send another SM_MON request.
Now, it may be the case that "service nfslock start" uses a command
line option that forces a fresh sm-notify run, and that is what is
wiping the on-disk monitor list. That would be the bug in this case
-- sm-notify can and should be allowed to make its own determination
of whether the monitor list gets retired. Notification should not
normally be forced by command line options in the nfslock script.
Can you show exactly how statd's state (i.e. its on-disk monitor list
in /var/lib/nfs/statd/sm) changed across the restart? Did sm-notify run
when you restarted statd? If so, why didn't the sm-notify pid file stop
it?
The statd and lockd state on the server across the nfslock restart:

  lockd                   statd       | event
  ------------------------------------+----------------------------------
  host (monitored = 1)    /sm/client  | client gets locks successfully
  (locks)                             |   at first
                                      |
  host (monitored = 1)    /sm/client  | nfslock stop (lockd releases
  (no locks)                          |   the client's locks)
                                      |
  host (monitored = 1)    /sm/        | nfslock start (client reclaims
  (locks)                             |   its locks, but statd does not
                                      |   monitor it)

  note: host (monitored = 1) means the client's host struct is created
        and is marked as monitored.
        (locks) / (no locks) means the host struct holds locks, or not.
        /sm/client means there is a file for the client under the
        /var/lib/nfs/statd/sm directory.
        /sm/ means /var/lib/nfs/statd/sm is empty!
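To make the failure mode concrete, the sequence above can be played end to end in a toy Python model (all names hypothetical). It assumes that locks survive a restart only when sm-notify tells the client to reclaim, which is why the second restart silently loses them:

```python
# Toy end-to-end model of the reported sequence: restart nfslock twice
# and the client's lock is silently lost.  All names are hypothetical.

def restart_nfslock(server_locks, statd_sm):
    """Stop: lockd drops all locks; sm-notify retires statd's monitor
    list and notifies the peers on it.  Start: notified peers reclaim."""
    notified = set(statd_sm)         # sm-notify tells these peers
    statd_sm.clear()                 # the on-disk list is retired
    reclaimed = {c for c in server_locks if c in notified}
    server_locks.clear()
    server_locks.update(reclaimed)   # only notified clients reclaim
    # Bug under discussion: lockd reuses the host struct (monitored = 1),
    # so no SM_MON is sent and statd_sm stays empty after the reclaim.

locks = {"client"}                   # client holds a lock
sm = {"client"}                      # statd monitors the client

restart_nfslock(locks, sm)           # first restart: the lock survives...
assert locks == {"client"} and sm == set()   # ...but statd is out of sync

restart_nfslock(locks, sm)           # second restart: nobody is notified
assert locks == set()                # the client's lock is lost
```

The model matches the table: after the first restart the host is still marked monitored while sm/ is empty, so the second restart has no one to notify and the grace period passes without a reclaim.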
thanks,
Mi Jinlong
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html