Re: umount failed - device is busy

Herta Van den Eynde wrote:

Lon Hohberger wrote:

On Tue, 2005-10-11 at 17:48 +0200, Herta Van den Eynde wrote:


Bit of extra information: the system that was running the services got STONITHed by the other cluster member shortly before midnight. The services all failed over nicely, but the situation remains: if I try to stop or relocate a service, I get a "device is busy".
I suppose that rules out an intermittent issue.

There are no mounts nested below these mounts.




Drat.

Nfsd is the most likely candidate for holding the reference.
Unfortunately, this is not something I can track down; you will have to
file a support request and/or a Bugzilla.  When you get a chance, you
should definitely try stopping nfsd and seeing whether that clears the
mystery references (allowing you to unmount).  If the problem comes from
nfsd, it should not be terribly difficult to track down.
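
Something along these lines should show whether nfsd (or anything else)
is pinning the mount, and whether stopping nfsd frees it.  The mount
point /mnt/clusterfs is just a placeholder for whatever your service
uses, and the init script name assumes the stock RHEL 3 nfs script:

   fuser -vm /mnt/clusterfs    # list processes (nfsd threads included) using the mount
   lsof +D /mnt/clusterfs      # alternative view of open files under it
   service nfs stop            # stop nfsd and the other NFS daemons
   umount /mnt/clusterfs       # retry the unmount
   service nfs start           # bring NFS back up afterwards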

Also, you should not need to recompile your kernel to probe all the LUNs
per device; just edit /etc/modules.conf:

options scsi_mod max_scsi_luns=128

... then run mkinitrd to rebuild the initrd image.
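
The mkinitrd invocation is roughly the following (the image name is just
the usual naming convention; adjust it to the kernel you actually boot):

   mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)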

-- Lon


Next maintenance window is 4 weeks away, so I won't be able to test the nfsd hypothesis anytime soon. In the meantime, I'll file a support request. I'll keep you posted.

At least the unexpected STONITH confirms that the failover still works.

The /etc/modules.conf tip is a big time saver. Rebuilding the modules takes forever.

Thanks, Lon.

Herta


Apologies for not updating this sooner.  (Thanks for reminding me, Owen.)

During a later maintenance window, I shut down the cluster services, but it wasn't until I stopped nfsd that the filesystems could actually be unmounted, which seems to confirm Lon's theory that nfsd was holding the reference.

I found a note elsewhere on the web where someone worked around the problem by stopping nfsd, stopping the service, restarting nfsd, and relocating the service. The disadvantage is that all NFS services experience a brief interruption at that point.
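
In clumanager terms that workaround presumably looks something like the
following; the service name "nfssvc" is a made-up example, and the exact
clusvcadm switches are from memory, so double-check them first:

   service nfs stop       # nfsd drops its references to the filesystem
   clusvcadm -d nfssvc    # disable the cluster service; the umount can now succeed
   service nfs start      # restart nfsd for the remaining exports
   clusvcadm -e nfssvc    # re-enable the service (or relocate it to the other member)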

Anyway, my problem disappeared during the latest maintenance window. Both nfs-utils and clumanager were updated (nfs-utils-1.0.6-42EL -> nfs-utils-1.0.6-43EL, clumanager-1.2.28-1 -> clumanager-1.2.31-1), so I'm not 100% sure which of the two fixed it. Curious though I am, I simply don't have the time to start reading the code. If anyone has further insights, I'd love to read about them.

Kind regards,

Herta

Someone reported off-list that they are experiencing the same problem while running the same versions we currently are.

So, just for completeness' sake: expecting problems, I also upped the clumanager log levels during the last maintenance window. They are now at:
   clumembd   loglevel="6"
   cluquorumd loglevel="6"
   clurmtabd  loglevel="7"
   clusvcmgrd loglevel="6"
   clulockd   loglevel="6"
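
(For anyone who wants to check their own settings: if I remember right,
these live as loglevel attributes in the clumanager configuration file
/etc/cluster.xml, so a quick grep shows the current values.)

   grep loglevel /etc/cluster.xml    # print the per-daemon loglevel settings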

Come to think of it, I probably loosened the log levels during the
maintenance window when our problems began (I wanted to reduce the size
of the logs).  Not sure how - or even if - this might affect things, though.

Kind regards,

Herta

Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm

--

Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
