Hi all,

With the "iscsi doubt" thread in mind, I thought I'd share an experience I've now had twice with iSCSI and the RHEL 4 cluster manager.

What happens is that an iSCSI filesystem that is part of a resource group becomes unavailable (in dmesg, you see iscsid lose its connection, then attempt to reconnect over and over). However, rgmanager does not seem to detect that the filesystem has disappeared, even though the filesystem is configured in the resource group using the built-in "fs" resource agent.

When I try to fail the resource group over to another node, rgmanager gets all out of whack and starts reporting bogus information. During the most recent failure, rgmanager crashed on all but two of the six nodes. On the two nodes where it was still running, resource groups showed as starting, stopping, or running on nodes that I'd manually fenced five minutes earlier. I ended up rebooting all of the servers and bringing them up clean.

I also found that the rest of the resource group will start even if iscsid is not running. That is really weird, since everything else in the resource group depends on the iSCSI filesystem: all of the other resource agents/scripts are nested (indented) within the "fs" block in cluster.conf. If I understand correctly, that shouldn't be able to happen.

I'm going to try a few things:

  - Setting "Continuous=no" for all the iSCSI targets in iscsi.conf (disables continuous discovery)
  - Setting self_fence=1 in cluster.conf
  - Setting the recovery policy to "relocate"

Any recommendations from the experts? I have a support ticket open with Red Hat, but they are still combing through six nodes' worth of sosreport files.

Cheers
--
Charles Riley

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
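
P.S. For anyone trying to picture the layout, here is a rough sketch of the kind of cluster.conf nesting described above, with the self_fence and "relocate" settings applied. It is not the actual configuration: the service, filesystem, and script names, the device, and the paths are all placeholders made up for illustration.

```xml
<!-- Sketch only: every name, device, and path below is a placeholder. -->
<rm>
  <!-- recovery="relocate": on failure, move the service to another node
       instead of restarting it in place -->
  <service name="example_svc" autostart="1" recovery="relocate">
    <!-- self_fence="1": reboot the node if the filesystem cannot be unmounted -->
    <fs name="example_iscsi_fs" device="/dev/sdX1" mountpoint="/mnt/example"
        fstype="ext3" force_unmount="1" self_fence="1">
      <!-- Nested inside <fs>, so it should only start once the mount succeeds -->
      <script name="example_app" file="/etc/init.d/example_app"/>
    </fs>
  </service>
</rm>
```

With this nesting, the script resource is a child of the fs resource, which is why it is surprising to see it start when the iSCSI device is not available.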