qdisk questions

denis <denisb+gmane@xxxxxxxxx> · Thu, 02 Oct 2008 10:16:08 +0200

Hi,

I have recently had a couple of situations with my cluster where both
nodes were restarted simultaneously. The reasons for this are a bit
beyond me so I was wondering if anyone could clarify / point me to
relevant documentation.

The following excerpts are from the logs of both nodes:

Oct  2 08:32:22 node1 qdiskd[3758]: <info> Heuristic: 'ping 10.X.X.X -c1 -t2' DOWN (3/3)
Oct  2 08:32:39 node1 qdiskd[3758]: <info> Heuristic: 'ping X.X.X.X -c1 -t2' DOWN (6/6)
Oct  2 08:32:55 node1 qdiskd[3758]: <info> Heuristic: 'ping X.X.X.X -c1 -t2' DOWN (6/6)
Oct  2 08:32:58 node1 qdiskd[3758]: <info> Heuristic: 'ping X.X.X.X -c1 -t1' DOWN (6/6)
Oct  2 08:33:01 node1 qdiskd[3758]: <notice> Score insufficient for master operation (0/4; required=1); downgrading
Oct  2 08:33:01 node1 kernel: md: stopping all md devices.

Oct  2 08:32:23 node2 qdiskd[3599]: <info> Heuristic: 'ping 10.X.X.X -c1 -t2' DOWN (3/3)
Oct  2 08:32:49 node2 qdiskd[3599]: <info> Heuristic: 'ping X.X.X.X -c1 -t2' DOWN (6/6)
Oct  2 08:32:56 node2 qdiskd[3599]: <info> Heuristic: 'ping X.X.X.X -c1 -t1' DOWN (6/6)
Oct  2 08:32:56 node2 qdiskd[3599]: <info> Heuristic: 'ping X.X.X.X -c1 -t2' DOWN (6/6)
Oct  2 08:33:03 node2 qdiskd[3599]: <notice> Score insufficient for master operation (0/4; required=1); downgrading
Oct  2 08:33:03 node2 kernel: md: stopping all md devices.
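
As far as I can tell, the "0/4; required=1" part of the score line is
plain arithmetic over the heuristics. The sketch below is only my reading
of the log output, not qdiskd's actual code. It assumes each of the four
ping heuristics has a score of 1 and that the minimum score is 1; the
names Heuristic, total_score, max_score and score_sufficient are
hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Heuristic:
    program: str  # command run as the heuristic
    score: int    # weight of this heuristic (assumed to be 1 here)
    up: bool      # True while the heuristic is passing


def total_score(heuristics):
    """Sum the scores of the heuristics that are currently passing."""
    return sum(h.score for h in heuristics if h.up)


def max_score(heuristics):
    """Sum the scores of all heuristics, passing or not."""
    return sum(h.score for h in heuristics)


def score_sufficient(heuristics, min_score):
    """A node keeps running only while its score reaches min_score."""
    return total_score(heuristics) >= min_score


# State at 08:33:01 on node1: all four ping heuristics are DOWN.
heuristics = [
    Heuristic("ping 10.X.X.X -c1 -t2", 1, False),
    Heuristic("ping X.X.X.X -c1 -t2", 1, False),
    Heuristic("ping X.X.X.X -c1 -t2", 1, False),
    Heuristic("ping X.X.X.X -c1 -t1", 1, False),
]

MIN_SCORE = 1  # "required=1" in the log line

print(f"{total_score(heuristics)}/{max_score(heuristics)}; "
      f"required={MIN_SCORE}; "
      f"sufficient={score_sufficient(heuristics, MIN_SCORE)}")
```

Under these assumptions, a single heuristic that does not depend on the
router would have been enough to keep the score at the required minimum.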

Does qdiskd reboot the node because these heuristics failed?

The upstream routers these nodes are connected to were unavailable for
at most 2 minutes, and all four ping tests require connectivity through
the router (I probably need to change that).

What kind of tests can I use for qdiskd that will prevent router outages
from killing my cluster completely?

Regards
--
Denis

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster