Sorry I didn't see this earlier!

On Wed, 2006-08-02 at 15:50 +0000, danwest@xxxxxxxxxxx wrote:
> It seems like a significant problem to have fence_ipmilan issue a power-off followed by a power-on with a 2 node cluster.

Generally, the chances of this occurring are very, very small, though not impossible. However, it could very well be that IPMI hardware modules are slow enough at processing requests that this could pose a problem. What hardware has this happened on? Was ACPI disabled on boot in the host OS (it should be; see below)?

> This seems to make a 2-node cluster with ipmi fencing pointless.

I'm pretty sure that the 'both-nodes-off' problem can only occur if all of the following criteria are met:

(a) separate NICs are used for IPMI and cluster traffic (the recommended configuration),
(b) a network partition occurs such that the nodes cannot see each other, but each can still see the other's IPMI port, and
(c) both nodes send their power-off packets at or near the exact same time.

The time window for (c) increases significantly (5+ seconds) if the cluster nodes are enabling ACPI power events on boot. This is one of the reasons why booting with acpi=off is required when using IPMI, iLO, or other integrated power management solutions. If you boot with acpi=off, does the problem persist?

> It looks like fence_ipmilan needs to support sending a cycle instead of a poweroff then a poweron?

The reason fence_ipmilan functions this way (off, status, on) is that we require confirmation that the node has lost power. I am not sure that it is possible to confirm the node has rebooted using IPMI. Arguably, it also might not be necessary to make such a confirmation in this particular case.

> According to fence_ipmilan.c it looks like cycle is not an option although it is an option for ipmitool. (ipmitool -H <ipaddr> -U <userid> -P <password> chassis power cycle)

Looks like you're on the right track.
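
For illustration, the off, status, on sequence described above could be sketched roughly as follows. This is not the code in fence_ipmilan.c, just a minimal Python sketch that shells out to ipmitool using the same command form quoted above. The function names, the `run` and `sleep` parameters, and the retry count and delay are all assumptions made for the example, and it relies on `chassis power status` printing a line ending in "is off".

```python
import subprocess
import time


def ipmi_power(host, user, password, action, run=subprocess.run):
    """Run `ipmitool ... chassis power <action>` and return its stdout.

    `run` is a hypothetical injection point so the sketch can be
    exercised without real IPMI hardware.
    """
    cmd = ["ipmitool", "-H", host, "-U", user, "-P", password,
           "chassis", "power", action]
    result = run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def fence_off_status_on(host, user, password, run=subprocess.run,
                        retries=10, delay=1.0, sleep=time.sleep):
    """Power off, confirm the off state, then power back on.

    Returns True only if the 'off' state was confirmed. The retry
    count and delay are illustrative values, not taken from
    fence_ipmilan.
    """
    ipmi_power(host, user, password, "off", run)
    for _ in range(retries):
        status = ipmi_power(host, user, password, "status", run)
        # ipmitool reports a line such as "Chassis Power is off".
        if status.lower().endswith("is off"):
            ipmi_power(host, user, password, "on", run)
            return True
        sleep(delay)
    # Power loss was never confirmed, so report the fence as failed
    # and do not issue the power-on.
    return False
```

A caller would treat a False return as a failed fence operation. A `chassis power cycle` variant would replace all three steps with a single command, at the cost of the explicit power-loss confirmation.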

-- Lon

--
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster