Re: Halt nodes in cluster with cable disconnect

On 01/27/2012 03:20 PM, yvette hirth wrote:
> Digimer wrote:
> 
>> You can crash the machine with this;
>>
>> echo c > /proc/sysrq-trigger
> 
> will
> 
> ifconfig ethx down  (where "x" = heartbeat ethernet interface numbah)
> 
> do the same thing?
> 
> yvette

Nope. The scenario is caused by both nodes being alive but losing the
ability to talk to one another on the storage channel. Whether it is
because a given cable is unplugged or because of a bad firewall rule,
the result is the same: both nodes see a failure at the same time and
call their fence handlers at the same time. The one with the sleep will
delay, and thus will always lose (and be the fence victim).
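
To make the tie-break concrete, here is a toy model of that race. It is
only a sketch: fence_winner and the node-a/node-b labels are made up for
illustration, and this is not how a real fence agent is written. Each
argument is the number of seconds a node sleeps before it fences.

```shell
#!/bin/sh
# Toy model of the fence race; hypothetical names, not a real fence agent.
# Usage: fence_winner DELAY_A DELAY_B
# Each argument is how many seconds that node sleeps before fencing.
# The node with the shorter delay fires first, so it survives.
fence_winner() {
    delay_a=$1
    delay_b=$2
    if [ "$delay_a" -lt "$delay_b" ]; then
        echo "node-a"
    elif [ "$delay_b" -lt "$delay_a" ]; then
        echo "node-b"
    else
        # Equal delays give neither node an edge; the outcome is a race.
        echo "race"
    fi
}

# node-a has no delay, node-b sleeps 5 seconds: node-a wins.
fence_winner 0 5
```

With both nodes alive, the node carrying the sleep is the loser every
time, which is why pulling a cable never shows the delayed node fencing.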

The idea behind sending "c" to sysrq-trigger is that it crashes the
kernel outright. The hung node will not trigger its fence agent, or do
anything else for that matter. Meanwhile, the node with the sleep will
detect the fault, call the agent, sleep for a few seconds, then proceed
to fence the hung node. This more accurately simulates an actual fault
in the primary node and confirms that the node with the sleep will in
fact fence successfully.
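
If you want the crash wrapped in something you can dry-run first, a
minimal sketch might look like this. The crash_node name and its
optional path argument are my own invention for illustration; the only
real piece is writing "c" to /proc/sysrq-trigger as root.

```shell
#!/bin/sh
# Hypothetical helper; only the write of "c" to /proc/sysrq-trigger is real.
# Usage: crash_node [TRIGGER]
# TRIGGER defaults to /proc/sysrq-trigger. The argument exists only so the
# function can be dry-run against an ordinary file.
crash_node() {
    trigger=${1:-/proc/sysrq-trigger}
    if [ ! -w "$trigger" ]; then
        echo "cannot write to $trigger (are you root?)" >&2
        return 1
    fi
    # On the real trigger file this crashes the kernel immediately.
    echo c > "$trigger"
}
```

Calling crash_node /tmp/fake-trigger only writes "c" to that file.
Calling crash_node with no argument, as root, on the node without the
sleep takes that node down on the spot, so only do it on a test cluster.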

-- 
Digimer
E-Mail:              digimer@xxxxxxxxxxx
Papers and Projects: https://alteeve.com

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster

