----- "Alan A" <alan.zg@xxxxxxxxx> wrote:
| This is worse than I thought. The entire cluster is hanging upon
| restart command issued from the Conga - lucy box. I tried bringing
| the gfs service down on node2 (lucy) with: service gfs stop (we are
| not running rgmanager), and I got:
| FATAL: Module gfs is in use.

Hi Alan,

It sounds like Conga can't reboot the cluster because the GFS file
system is still mounted or in use. I don't know much about Conga, so
forgive my ignorance there. You may need to unmount the GFS file
system before you reboot.

The dmesg output you sent looked perfectly normal to me; those are
ordinary openais messages. I'm more interested in whether there were
any "file system withdrawn" messages, general protection faults,
kernel panics, or other serious kernel errors on any of the nodes in
the cluster around the time of the first failure.

This is just a wild guess, but I suspect some kind of error, such as
a kernel panic, occurred a while back. That caused the node to be
fenced, and perhaps the SCSI fencing locked up the device somehow so
that none of the nodes can use it.

If that's the case, you should be able to log in to each of the
nodes, manually unmount the GFS file systems that are mounted, and
then reboot them. If a file system won't unmount, it may be because
some process is still using it. For example, if you're exporting the
GFS file system over NFS, you probably need to run "service nfs stop"
before it will let you unmount the GFS file system, then reboot.

So I would comb through /var/log/messages on each node, looking for
messages about the node being fenced, file system withdrawals,
panics, SCSI errors, or any other serious errors that occurred around
the time you first had the problem.

Regards,

Bob Peterson
Red Hat Clustering & GFS

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
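P.S. The log-combing step above can be sketched as a single grep
filter. This is only a rough sketch: the pattern list and the sample
log lines below are made-up illustrations, not output from your
cluster, so adjust the pattern to taste and run it against the real
/var/log/messages on each node.

```shell
# Hedged sketch: filter a syslog stream for cluster-fatal events
# (fencing, GFS withdrawal, panics, SCSI errors, oopses).
# On a real node you would run something like:
#   grep -Ei 'fenc|withdraw|panic|scsi error|oops' /var/log/messages
# The three sample lines below are fabricated for illustration only.
printf '%s\n' \
  'Jan 10 03:12:01 node1 kernel: GFS: fsid=mycluster:gfs0.1: withdrawn' \
  'Jan 10 03:12:05 node1 fenced[2101]: fencing node "node2"' \
  'Jan 10 03:15:00 node1 sshd[999]: session opened for user root' \
  | grep -Ei 'fenc|withdraw|panic|scsi error|oops'
```

Only the first two sample lines match; routine noise like the sshd
line is filtered out, which is the point of narrowing the search to
the serious errors first.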