We're running a 14-node cluster on CentOS 5.2 with GFS1. We've observed a problem in production that forced us to perform an unplanned cluster restart, and we reproduced similar behavior in a lab environment.

If one node loses its connection to shared storage, it can no longer perform any filesystem activity, and the GFS filesystem may decide to withdraw. That's expected.

The node that withdraws does not get fenced. The cluster itself depends on networking, not storage, and cluster services other than GFS may still be active on that node, so that's not surprising either.

However, when one node withdraws or otherwise fails on a GFS mount without being fenced, the other nodes freeze when they try to access the same filesystem. That's unexpected. For a high-availability cluster this is a serious problem: the failure isn't handled automatically, and it effectively causes a cluster-wide outage.

Does this sound right? How can we mitigate or prevent such outages? Are there relevant configuration settings I've missed?

Thanks for any insight.

Jeff

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster