On Wed, 2008-05-28 at 12:57 +1100, Darrin De Groot wrote:
>
> Hi,
>
> I am running a 4 node cluster with a multipathed quorum disk,
> configured to use the path /dev/dm-1. The problem that I am having is
> that if I lose one path to the disk (am testing by pulling one fibre),
> the node is almost always fenced (one node, once, managed to stay up,
> out of more than 10 attempts). Is there some timeout that needs
> changing to give qdiskd the time to realise that a path is down? I
> have tried an interval of 3 seconds with a TKO of 10, with no
> success, and a token timeout set at 45000ms:
>
> <totem consensus="4800" join="60" token="45000"
> token_retransmits_before_loss_const="20"/>
> <quorumd device="/dev/dm-1" interval="3" min_score="1"
> tko="10" votes="3"/>
>

As a general rule, you want qdiskd's timeout to exceed the path failover
time, with some time left over for the I/Os to get out after a path
failover completes.

As a rule of thumb, totem's token timeout needs to be approximately
double the qdisk timeout. E.g.:

  <totem token="120000" ... />
  <quorumd device="/dev/dm-1" interval="3" min_score="1" tko="20" votes="3" />

[Note: Obviously, I think qdiskd should algorithmically determine fairly
optimal timings based on the totem token timeout in the future.]

-- Lon
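
To make the rule of thumb above concrete, here is a small Python sketch that checks a pair of settings against it. It assumes the qdisk timeout is roughly interval * tko seconds, which matches the example values in the reply (3 s * 20 = 60 s, token 120000 ms) but is an assumption rather than something stated in the thread. The function names and the `factor` parameter are hypothetical, and the path failover time still has to be measured separately on the actual hardware.

```python
def qdisk_timeout_ms(interval_s, tko):
    """Approximate qdiskd timeout in milliseconds.

    Assumption: a node is declared dead after `tko` missed cycles of
    `interval_s` seconds each, so the timeout is interval * tko.
    """
    return interval_s * tko * 1000


def suggested_token_ms(interval_s, tko, factor=2.0):
    """Token timeout suggested by the rule of thumb (about double the qdisk timeout)."""
    return int(qdisk_timeout_ms(interval_s, tko) * factor)


def token_is_sufficient(token_ms, interval_s, tko, factor=2.0):
    """Return True if the totem token timeout is at least `factor` times the qdisk timeout."""
    return token_ms >= suggested_token_ms(interval_s, tko, factor)


if __name__ == "__main__":
    # Settings from the original question: interval=3, tko=10, token=45000
    print(token_is_sufficient(45000, 3, 10))
    # Settings from the reply: interval=3, tko=20, token=120000
    print(token_is_sufficient(120000, 3, 20))
```

With the original settings the qdisk timeout comes to about 30 s, so a 45000 ms token falls short of the suggested 60000 ms. The values in the reply satisfy the rule exactly.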