Re: [RFC] After nfs restart, locks can't be recovered which record by lockd before

Chuck Lever <chuck.lever@xxxxxxxxxx> · Fri, 15 Jan 2010 11:12:14 -0500

On Jan 15, 2010, at 4:35 AM, Mi Jinlong wrote:
Hi Chuck,

Chuck Lever wrote:
On Jan 14, 2010, at 5:06 AM, Mi Jinlong wrote:
Hi Chuck,

Chuck Lever wrote:
On 01/13/2010 07:51 AM, Jeff Layton wrote:
On Wed, 13 Jan 2010 17:51:25 +0800
Mi Jinlong<mijinlong@xxxxxxxxxxxxxx>  wrote:

Assuming you're using a RH-derived distro like Fedora or RHEL, then no.
statd is controlled by a separate init script (nfslock) and when you
run "service nfs restart" you're not restarting it. NSM notifications
are not sent and clients generally won't reclaim their locks.

IOW, "you're doing it wrong". If you want locks to be reclaimed then
you probably need to restart the nfslock service too.

Mi Jinlong is exercising another case we know doesn't work right, but we
don't expect admins will ever perform this kind of "down-up" on a normal
production server.  In other words, we expect it to work this way, and
it's been good enough, so far.

As Jeff points out, the "nfs" and the "nfslock" services are separate.
This is because "nfslock" is required for both client and server side
NFS, but "nfs" is required only on the server.  This split also dictates
the way sm-notify works, since it has to behave differently on NFS
clients and servers.

Two other points:

+ lockd would not restart itself in this case if there happened to be
NFS mounts on that system

When testing, I found that an nfs restart causes lockd to restart.
I found the code that stops lockd when nfs stops.

At kernel 2.6.18, fs/lockd/svc.c
...
        if (nlmsvc_users) {
                if (--nlmsvc_users)
                        goto out;
        } else
                printk(KERN_WARNING "lockd_down: no users! pid=%d\n", nlmsvc_pid);
...

        kill_proc(nlmsvc_pid, SIGKILL, 1);
...

At kernel 2.6.18, fs/lockd/svc.c
...
        if (nlmsvc_users) {
                if (--nlmsvc_users)
                        goto out;
        } else {
                printk(KERN_ERR "lockd_down: no users! task=%p\n",
                        nlmsvc_task);
                BUG();
        }
....
        kthread_stop(nlmsvc_task);
        svc_exit_thread(nlmsvc_rqst);
...

As shown above, when nlmsvc_users <= 1 at the time lockd_down is called,
lockd is killed.
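
The user counting in the excerpts above can be illustrated with a small
userspace model. This is only a sketch: the model_* names are made up for
illustration, nothing here is kernel code, and the assumption that the
first user starts the thread is a simplification of what lockd_up does.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; the model_* names are hypothetical. */
static unsigned int model_users;  /* stands in for nlmsvc_users */
static bool model_running;        /* true while the modelled lockd thread exists */

/* Model of lockd_up(): assume the first user starts the thread. */
static void model_lockd_up(void)
{
    if (model_users++ == 0)
        model_running = true;
}

/* Model of lockd_down(): only the last user stops the thread. */
static void model_lockd_down(void)
{
    assert(model_users > 0);  /* the second excerpt calls BUG() in this case */
    if (--model_users)
        return;               /* other users remain: the "goto out" path */
    model_running = false;    /* kill_proc() / kthread_stop() in the excerpts */
}
```

In this model, if the NFS server is the only user, a single
model_lockd_down() call stops lockd; if a client mount also holds a
reference, the thread keeps running.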

+ lockd doesn't currently poke statd when it restarts to tell it to
send reboot notifications, but it probably should

Yes, I agree with you. But currently, when something causes lockd to
restart while statd does not restart, the locks held before the restart
are lost.

Maybe the kernel should fix this.

What did you have in mind?

 I think that when lockd restarts, statd should restart too and run
sm-notify to notify the other clients.

Sending notifications is likely the correct thing to do if lockd is  
restarted while there are active locks.  A statd restart isn't  
necessarily required to send reboot notifications, however.  You can  
do it with "sm-notify -f".

The problem with "sm-notify -f" is that it deletes the on-disk monitor  
list while statd is still running.  This means the on-disk monitor  
list and statd's in-memory monitor list will be out of sync.  I seem  
to recall that sm-notify is run by itself by cluster scripts, and that  
could be a real problem.

As implemented on RH, "service nfslock restart" will restart statd and  
force an sm-notify anyway, so no real harm done, but that's pretty  
heavyweight (and requires that admins do "service nfs stop; service  
nfslock restart; service nfs start" or something like that if they  
want to get proper lock recovery).

A simple restart of statd (outside of the nfslock script) probably  
won't be adequate, though.  It will respect the sm-notify pidfile, and  
not send notifications when started up.  I don't see a flag on statd  
to force it to send notifications on restart (-N only sends  
notifications; it doesn't also start the statd daemon).

In a perfect world, when lockd restarts, it would send up an  
SM_SIMU_CRASH, and statd would do the right thing:  if there are  
monitored peers, it would send reboot notifications, and adjust its  
monitor list accordingly;  if there were no monitored peers, it would  
do nothing.  Thus no statd restart would be needed.
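
As a rough illustration of that ideal behaviour, here is a userspace
sketch. It is not statd code: struct peer and simu_crash() are
hypothetical, "sending a notification" is reduced to setting a flag, and
"adjusting the monitor list" is assumed here to mean emptying it so that
peers must register again.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical monitor list entry; not statd's real data structure. */
struct peer {
    const char *name;
    int notified;  /* set once a reboot notification has been "sent" */
};

/*
 * Sketch of the reaction to a simulated crash: notify every monitored
 * peer, then empty the monitor list. With no monitored peers the loop
 * does nothing. Returns the number of notifications "sent".
 */
static size_t simu_crash(struct peer *peers, size_t *count)
{
    size_t sent = 0;

    for (size_t i = 0; i < *count; i++) {
        peers[i].notified = 1;  /* stands in for a reboot notification */
        sent++;
    }
    *count = 0;  /* assumption: peers re-register when they reclaim locks */
    return sent;
}
```

Calling simu_crash() a second time on the emptied list sends nothing,
which matches the "no monitored peers, do nothing" case.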

 But at present this is not implemented in either the kernel or nfs-utils.
 Given the way lockd and statd communicate, it is indeed not easy to
implement.

 So I think it would be easier to implement it through the mechanism you
 showed me before, which exposes the kernel's nlm_host cache via /sys.

I want to know: when "service nfslock restart" is used to restart the
nfslock service (meaning statd and lockd are restarted), will statd call
sm-notify to notify the other clients, or not?

Currently "service nfslock restart" always causes a notification to be
sent.  Since "service nfslock restart" causes lockd to drop its locks (I
assume that's what that "killproc lockd" does) I guess we need to force
reboot notifications here.  (I still argue that removing the pidfile in
the "start" case is not correct).

It appears that both the nfs and nfslock start up scripts do something
to lockd (as well as the case when the number of NFS mounts goes to
zero).  However, only the nfslock script forces sm-notify to send
notifications.

 But on RHEL5 and Fedora, when "service nfslock restart" is used to
 restart the nfslock service, lockd isn't shut down and rmmod'd.

 Is it a bug?

For the "no more NFS mounts case" and the server shutdown case, the  
NFS client or server, both being in the kernel, call lockd_down enough  
times to make the user count go to zero.  lockd.ko can be removed at  
that point.  I seem to recall there being some kind of automatic  
mechanism for module removal after a period of zero module refcount.   
In other words, lockd.ko is removed as a side effect, afaict.

The nfslock script doesn't stop either the kernel client or server  
code, so it doesn't really cause a lockd_down call.  But, nfslock does  
do a "killproc lockd".  My assumption is that causes all locks to be  
dropped.  So it's not a cold restart of lockd, but we still  
potentially lose a lot of lock state here.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
