Re: [RFC] After nfs restart, locks can't be recovered which record by lockd before

On Jan 15, 2010, at 4:35 AM, Mi Jinlong wrote:
Hi Chuck,

Chuck Lever wrote:
On Jan 14, 2010, at 5:06 AM, Mi Jinlong wrote:
Hi Chuck,

Chuck Lever wrote:
On 01/13/2010 07:51 AM, Jeff Layton wrote:
On Wed, 13 Jan 2010 17:51:25 +0800
Mi Jinlong <mijinlong@xxxxxxxxxxxxxx> wrote:

Assuming you're using a RH-derived distro like Fedora or RHEL, then no. statd is controlled by a separate init script (nfslock) and when you run "service nfs restart" you're not restarting it. NSM notifications
are not sent and clients generally won't reclaim their locks.

IOW, "you're doing it wrong". If you want locks to be reclaimed then
you probably need to restart the nfslock service too.

Mi Jinlong is exercising another case we know doesn't work right, but we don't expect admins will ever perform this kind of "down-up" on a normal production server. In other words, we expect it to work this way, and
it's been good enough, so far.

As Jeff points out, the "nfs" and the "nfslock" services are separate. This is because "nfslock" is required for both client and server side NFS, but "nfs" is required only on the server. This split also dictates
the way sm-notify works, since it has to behave differently on NFS
clients and servers.
Two other points:

+ lockd would not restart itself in this case if there happened to be
NFS mounts on that system

When testing, I find that an nfs restart causes lockd to restart.
I found the code that stops lockd when nfs stops.

At kernel 2.6.18, fs/lockd/svc.c
...
        if (nlmsvc_users) {
                if (--nlmsvc_users)
                        goto out;
        } else
                printk(KERN_WARNING "lockd_down: no users! pid=%d\n", nlmsvc_pid);
...

        kill_proc(nlmsvc_pid, SIGKILL, 1);
...

At kernel 2.6.18, fs/lockd/svc.c
...
        if (nlmsvc_users) {
                if (--nlmsvc_users)
                        goto out;
        } else {
                printk(KERN_ERR "lockd_down: no users! task=%p \n",
                        nlmsvc_task);
                BUG();
        }
...
        kthread_stop(nlmsvc_task);
        svc_exit_thread(nlmsvc_rqst);
...

As shown above, when nlmsvc_users <= 1 on entry to lockd_down, lockd will be killed.
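
The user-count behaviour in the excerpts above can be sketched as a small userspace model. This is only an illustration, not kernel code: struct lockd_model and the model_lockd_up()/model_lockd_down() helpers are hypothetical names, and the start-on-first-user logic is a simplifying assumption. Unlike the excerpts, which either warn and carry on or hit BUG(), the model simply returns when there are no users.

```c
#include <stdbool.h>
#include <stdio.h>

/* Userspace model only: these names mirror, but are not, the kernel symbols. */
struct lockd_model {
        unsigned int users;   /* stands in for nlmsvc_users */
        bool running;         /* true while the modelled lockd thread exists */
};

/* Assumed counterpart of lockd_down(): the first user starts the thread. */
static void model_lockd_up(struct lockd_model *m)
{
        if (m->users++ == 0)
                m->running = true;
}

/* Model of the quoted lockd_down() logic: the last user stops the thread. */
static void model_lockd_down(struct lockd_model *m)
{
        if (m->users == 0) {
                /* Simplification: the real code warns and continues, or BUG()s. */
                fprintf(stderr, "lockd_down: no users!\n");
                return;
        }
        if (--m->users)
                return;         /* other users remain; lockd keeps running */
        m->running = false;     /* stands in for kill_proc()/kthread_stop() */
}
```

With two users (say, the NFS server and one NFS mount), the first model_lockd_down() call leaves lockd running; only the second call, made when the count is 1, stops it.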


+ lockd doesn't currently poke statd when it restarts to tell it to
send reboot notifications, but it probably should

Yes, I agree with you. But currently, when something causes lockd to restart
while statd does not restart, the locks that were held before are lost.

Maybe the kernel should fix this.

What did you have in mind?

I think that when lockd restarts, statd should restart too and send sm-notify to the other clients.

Sending notifications is likely the correct thing to do if lockd is restarted while there are active locks. A statd restart isn't necessarily required to send reboot notifications, however. You can do it with "sm-notify -f".

The problem with "sm-notify -f" is that it deletes the on-disk monitor list while statd is still running. This means the on-disk monitor list and statd's in-memory monitor list will be out of sync. I seem to recall that sm-notify is run by itself by cluster scripts, and that could be a real problem.

As implemented on RH, "service nfslock restart" will restart statd and force an sm-notify anyway, so no real harm done, but that's pretty heavyweight (and requires that admins do "service nfs stop; service nfslock restart; service nfs start" or something like that if they want to get proper lock recovery).

A simple restart of statd (outside of the nfslock script) probably won't be adequate, though. It will respect the sm-notify pidfile, and not send notifications when started up. I don't see a flag on statd to force it to send notifications on restart (-N only sends notifications; it doesn't also start the statd daemon).

In a perfect world, when lockd restarts, it would send up an SM_SIMU_CRASH, and statd would do the right thing: if there are monitored peers, it would send reboot notifications, and adjust its monitor list accordingly; if there were no monitored peers, it would do nothing. Thus no statd restart would be needed.
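
A rough sketch of the behaviour described above, assuming a simple in-memory list of monitored peers. Everything here is hypothetical and not taken from statd or nfs-utils: struct monitor_list, notify_fn, count_notification() and handle_simu_crash() are illustrative names, and clearing the list after notifying is only one possible reading of "adjust its monitor list accordingly".

```c
#include <stddef.h>

#define MAX_PEERS 16

/* Hypothetical in-memory monitor list; not statd's real data structure. */
struct monitor_list {
        const char *peers[MAX_PEERS];
        size_t count;
};

/* Hypothetical callback that delivers one reboot notification to a peer. */
typedef void (*notify_fn)(const char *peer);

/* Example callback for experiments: it only counts the notifications. */
static size_t notified;
static void count_notification(const char *peer)
{
        (void)peer;
        notified++;
}

/*
 * Sketch of the desired reaction to SM_SIMU_CRASH: notify every monitored
 * peer, then drop the entries (an assumption about how the list would be
 * adjusted). With no monitored peers, nothing happens.
 * Returns the number of notifications sent.
 */
static size_t handle_simu_crash(struct monitor_list *list, notify_fn notify)
{
        size_t sent = 0;

        for (size_t i = 0; i < list->count; i++) {
                notify(list->peers[i]);
                list->peers[i] = NULL;
                sent++;
        }
        list->count = 0;
        return sent;
}
```

Calling handle_simu_crash() on a list with two peers would send two notifications and leave the list empty; calling it again on the now-empty list sends nothing, which matches the "no monitored peers, do nothing" case.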

But at present this isn't implemented in the kernel or in nfs-utils.
Given the style of communication between lockd and statd, it is indeed not easy to implement.

So I think it would be easier to implement it through the mechanism you showed me before,
which exposes the kernel's nlm_host cache via /sys.

I want to know: when the command "service nfslock restart" is used to restart
the nfslock service (meaning statd and lockd are restarted), will statd call
sm-notify to notify the other clients, or not?

Currently "service nfslock restart" always causes a notification to be sent. Since "service nfslock restart" causes lockd to drop its locks (I assume that's what that "killproc lockd" does) I guess we need to force reboot notifications here. (I still argue that removing the pidfile in
the "start" case is not correct).

It appears that both the nfs and nfslock start up scripts do something
to lockd (as well as the case when the number of NFS mounts goes to
zero).  However, only the nfslock script forces sm-notify to send
notifications.

But on RHEL5 and Fedora, when the command "service nfslock restart" is used to restart
the nfslock service, lockd isn't shut down and rmmod'd.

Is it a bug?

For the "no more NFS mounts case" and the server shutdown case, the NFS client or server, both being in the kernel, call lockd_down enough times to make the user count go to zero. lockd.ko can be removed at that point. I seem to recall there being some kind of automatic mechanism for module removal after a period of zero module refcount. In other words, lockd.ko is removed as a side effect, afaict.

The nfslock script doesn't stop either the kernel client or server code, so it doesn't really cause a lockd_down call. But, nfslock does do a "killproc lockd". My assumption is that causes all locks to be dropped. So it's not a cold restart of lockd, but we still potentially lose a lot of lock state here.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
