On Wed, 2006-10-18 at 21:38 +0200, Katriel Traum wrote:
> The (ugly) workaround I've been using is killing the process manually
> and then manually removing /var/lock/subsys/rgmanager, which causes "rc"
> to skip it.
> Is there a better way to restart a failed node? Shouldn't a failed node
> be "hard booted" by cman?

Nodes don't "know" they're fenced with fabric-level fencing; it's a
deficiency in the model itself. The easiest thing to do is 'reboot -fn'.
A fenced node may have outstanding buffers which never get cleaned up, so
you can't "un-fence" it until it has been rebooted anyway.

Rgmanager's child processes are probably trying to umount a file system
that has been fenced and are stuck in disk-wait, which may be "forever",
depending on the storage configuration.

There's a patch outstanding for qdiskd which makes it trigger a reboot on
loss of score. However, I don't think this is your problem.

-- Lon

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster