Fencing and deadlocks

Jürgen Ladstätter <info@xxxxxxxxxxxxxxxxxx> · Tue, 2 Dec 2014 11:29:04 +0100

Hi guys,

We’re running a 9-node cluster with 5 GFS2 mounts. The cluster is mainly used for load balancing web-based applications. Fencing is done via IPMI and works.
Sometimes one server gets fenced, but after rebooting it isn’t able to rejoin the cluster. This triggers higher load and many open processes, which leads to another server being fenced. That server then isn’t able to rejoin either, and this continues until we lose quorum and have to manually restart the whole cluster.
Sadly this is not reproducible, but it seems to happen more often when there is more write I/O.
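
To put a number on "until we lose quorum", here is a rough sketch of the arithmetic. It assumes simple majority quorum, one vote per node, no quorum disk, and that expected votes stay at 9 while nodes are fenced. Those are assumptions for illustration, not something read from our cluster.conf, and the function names are made up.

```python
def quorum_threshold(total_votes: int) -> int:
    """Votes needed for quorum under simple majority: floor(total / 2) + 1."""
    return total_votes // 2 + 1


def failures_until_quorum_loss(nodes: int, votes_per_node: int = 1) -> int:
    """Smallest number of missing nodes at which the survivors lose quorum.

    Assumes every node carries the same number of votes and that expected
    votes are not recalculated as nodes drop out (hypothetical setup).
    """
    needed = quorum_threshold(nodes * votes_per_node)
    missing = 0
    # Keep removing nodes while the remaining votes still reach the threshold.
    while (nodes - missing) * votes_per_node >= needed:
        missing += 1
    return missing


if __name__ == "__main__":
    print("votes needed for quorum:", quorum_threshold(9))
    print("missing nodes that break quorum:", failures_until_quorum_loss(9))
```

Under those assumptions the cluster survives four nodes being stuck outside it, and the fifth one takes quorum away.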

Since a whole-cluster deadlock rather defeats the purpose of a cluster, we’d appreciate some input on what we could do or change.
We’re running CentOS 6.6, kernel 2.6.32-504.1.3.el6.x86_64.

Has any of you tested GFS2 on CentOS 7? Are there any known major bugs that could cause deadlocks?

Thanks in advance, Jürgen

-- 
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster