Herta Van den Eynde wrote:
Lon Hohberger wrote:
On Tue, 2005-10-11 at 17:48 +0200, Herta Van den Eynde wrote:
Bit of extra information: the system that was running the services
got STONITHed by the other cluster member shortly before midnight.
The services all failed over nicely, but the situation remains: if I
try to stop or relocate a service, I get a "device is busy".
I suppose that rules out an intermittent issue.
There are no mounts nested below other mounts.
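For anyone wanting to repeat the nested-mount check, here is a minimal sketch. It assumes the standard /proc/mounts layout, where the mount point is the second field; the function name and the example path are made up for illustration.

```shell
#!/bin/sh
# List mount points that sit below a given mount point.
# $1: parent mount point (e.g. /mnt/export, a hypothetical path)
# $2: mount table to read, defaulting to /proc/mounts
nested_mounts() {
    parent=${1%/}
    table=${2:-/proc/mounts}
    # Field 2 of each line is the mount point. Keep entries that start
    # with "$parent/" and are longer than it, so the parent itself and
    # siblings such as /mnt/exportfoo are not reported.
    awk -v p="$parent/" \
        'index($2, p) == 1 && length($2) > length(p) { print $2 }' "$table"
}

# Usage:
#   nested_mounts /mnt/export
```

Empty output means nothing is mounted below the given mount point, so a nested mount is not what keeps it busy.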
Drat.
Nfsd is the most likely candidate for holding the reference.
Unfortunately, this is not something I can track down; you will have to
file a support request, a Bugzilla report, or both. When you get a chance,
you should definitely try stopping nfsd and seeing if that clears the
mystery references (allowing you to unmount). If the problem comes from
nfsd, it should not be terribly difficult to track down.
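A rough sketch of that test follows. It assumes the stock Red Hat "nfs" init script and a hypothetical mount point; the function name and the RUN variable are illustrative only. Setting RUN=echo prints the commands instead of running them.

```shell
#!/bin/sh
# Set RUN=echo to print the commands instead of executing them.
RUN=${RUN:-}

# $1: mount point that refuses to unmount (hypothetical path in the usage below)
try_umount_without_nfsd() {
    mnt=$1
    # Stop nfsd; assumes the Red Hat "nfs" init script is in use.
    $RUN service nfs stop || return 1
    $RUN umount "$mnt"
    status=$?
    # Bring NFS back whether or not the unmount worked.
    $RUN service nfs start
    return $status
}

# Usage (as root, in a maintenance window):
#   try_umount_without_nfsd /mnt/export && echo "unmount worked with nfsd stopped"
```

If the unmount only succeeds while nfsd is stopped, that points at nfsd as the holder of the reference.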
Also, you should not need to recompile your kernel to probe all the LUNs
per device; just edit /etc/modules.conf:
options scsi_mod max_scsi_luns=128
... then run mkinitrd to rebuild the initrd image.
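As a sketch, the two steps might look like this. The helper name is made up, and the mkinitrd arguments shown in the usage comment are the common Red Hat form rather than something stated in this thread, so check them against your release.

```shell
#!/bin/sh
# Append the option line to a modules.conf-style file unless it is already there.
# $1: file to edit (defaults to /etc/modules.conf)
ensure_max_luns() {
    conf=${1:-/etc/modules.conf}
    line='options scsi_mod max_scsi_luns=128'
    # -x matches the whole line, -F treats the pattern as a fixed string.
    grep -qxF "$line" "$conf" 2>/dev/null || printf '%s\n' "$line" >> "$conf"
}

# Usage (as root); the mkinitrd invocation below is an assumption:
#   ensure_max_luns
#   mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)
```

The helper is idempotent, so running it twice does not duplicate the line.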
-- Lon
Next maintenance window is 4 weeks away, so I won't be able to test the
nfsd hypothesis anytime soon. In the meantime, I'll file a support
request. I'll keep you posted.
At least the unexpected STONITH confirms that the failover still works.
The /etc/modules.conf tip is a big time saver. Rebuilding the modules
takes forever.
Thanks, Lon.
Herta
Apologies for not updating this sooner. (Thanks for reminding me, Owen.)
During a later maintenance window, I shut down the cluster services, but
it wasn't until I stopped nfsd that the filesystems could actually
be unmounted, which seems to confirm Lon's theory about nfsd being the
likely candidate for holding the reference.
I found a note elsewhere on the web where someone worked around the
problem by stopping nfsd, stopping the service, restarting nfsd, and
relocating the service. The disadvantage is that all NFS services
experience a brief interruption while this happens.
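That sequence could be scripted roughly as below. This is only a sketch: the clusvcadm flags (-s to stop, -r with -m to relocate) are my assumption and should be checked against the clusvcadm man page for your clumanager version, and the service and member names are placeholders. Setting RUN=echo prints the commands instead of running them.

```shell
#!/bin/sh
# Set RUN=echo to print the commands instead of executing them.
RUN=${RUN:-}

# $1: cluster service name (placeholder)
# $2: member to relocate the service to (placeholder)
nfs_workaround_relocate() {
    svc=$1
    member=$2
    $RUN service nfs stop || return 1
    # Assumed flag: -s stops the cluster service.
    $RUN clusvcadm -s "$svc"
    stopped=$?
    # Restart NFS even if stopping the service failed.
    $RUN service nfs start
    [ "$stopped" -eq 0 ] || return 1
    # Assumed flags: -r relocates the service, -m names the target member.
    $RUN clusvcadm -r "$svc" -m "$member"
}

# Usage (as root):
#   nfs_workaround_relocate my_nfs_service node2
```

NFS is down between the first and third command, which is the brief interruption for all NFS services mentioned above.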
Anyway, my problem disappeared during the latest maintenance window.
Both nfs-utils and clumanager were updated (nfs-utils-1.0.6-42EL ->
nfs-utils-1.0.6-43EL, clumanager-1.2.28-1 -> clumanager-1.2.31-1), so
I'm not 100% sure which of the two fixed it, and curious though I am, I
simply don't have the time to start reading the code. If anyone has
further insights, I'd love to read about it, though.
Kind regards,
Herta