I can pinpoint the problem with verbose logging. For some reason,
fence_apc sends the outlet selection twice. For an outlet number
greater than 2 this is harmless, but when the outlet number is 1 or 2
the repeated input selects an entry in the outlet's own menu and the
script fails. Here is the output of a working call to illustrate the
duplicate selection; "13" is entered twice:
^M------- Outlet Control/Configuration
------------------------------------------
1- Outlet 1 ON
2- build ON
3- www103 ON
4- www102 ON
5- Outlet 5 ON
6- Outlet 6 ON
7- Outlet 7 ON
8- fs102 ON
9- build ON
10- app102 ON
11- Outlet 11 ON
12- db103 ON
13- fs103 ON
14- Outlet 14 ON
15- Outlet 15 ON
16- Outlet 16 ON
17- Master Control/Configuration
<ESC>- Back, <ENTER>- Refresh, <CTRL-L>- Event Log
> 13
^M------- fs103
-----------------------------------------------------------------
Name : fs103
Outlet : 13
State : ON
1- Control Outlet
2- Configure Outlet
?- Help, <ESC>- Back, <ENTER>- Refresh, <CTRL-L>- Event Log
> 13
^M------- fs103
-----------------------------------------------------------------
Name : fs103
Outlet : 13
State : ON
1- Control Outlet
2- Configure Outlet
?- Help, <ESC>- Back, <ENTER>- Refresh, <CTRL-L>- Event Log
> 1
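To make the failure mode concrete, here is a minimal sketch that replays
keystrokes against a toy model of the two menus shown above. The function
name screen_after and the screen labels are hypothetical, chosen only for
illustration; this is not how fence_apc itself is written.

```python
# Hypothetical model of the two APC menu screens shown in the log above.
# This is not fence_apc's code; it only replays keystrokes against the menus.

def screen_after(keys, outlet_count=16):
    """Return the screen reached after sending each entry in `keys` in order."""
    screen = "outlet_list"
    for key in keys:
        if screen == "outlet_list":
            # A valid outlet number opens that outlet's own menu.
            if key.isdigit() and 1 <= int(key) <= outlet_count:
                screen = "outlet_menu"
        elif screen == "outlet_menu":
            # The outlet menu only offers entries 1 and 2.
            if key == "1":
                screen = "control_outlet"
            elif key == "2":
                screen = "configure_outlet"
            # Anything else (such as "13") just redraws the outlet menu.
    return screen


print(screen_after(["13", "13"]))  # outlet_menu
print(screen_after(["2", "2"]))    # configure_outlet
```

With outlet 13 the duplicate only redraws the outlet menu, so the next
input still works. With outlet 2 the duplicate lands on the Configure
Outlet screen, which is the screen reported in the traceback quoted below.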
Matt Harrington wrote:
I am encountering an "unknown screen" exception from fence_apc when
trying to fence a system in a 3-node cluster (CentOS 5.2,
cman-2.0.84-2.el5). Interestingly, I can fence the other two nodes
in my cluster. I believe the difference is that the problem node has
two power supplies, which means that fence_apc is called with off/on
instead of restart. This also requires connecting to two different
PDUs. It could also be that there is something wrong with the config,
which was taken from an older system and updated with luci. I am
unable to discern any differences between the menus of the two PDUs.
[root@fs102 ~]# /sbin/fence_node fs103
agent "fence_apc" reports: Traceback (most recent call last):
File "/sbin/fence_apc", line 829, in ?
main()
File "/sbin/fence_apc", line 303, in main
do_power_off(sock)
File "/sbin/fence_apc", line 813, in do_power_off
x = do_power_switch(sock, "off")
File "/sbi
agent "fence_apc" reports: n/fence_apc", line 611, in do_power_switch
result_code, response = power_off(txt + ndbuf)
File "/sbin/fence_apc", line 817, in power_off
x = power_switch(buffer, False, "2", "3");
File "/sbin/fence_apc", line 810, in power_switch
raise "un
agent "fence_apc" reports: known screen encountered in \n" +
str(lines) + "\n"
unknown screen encountered in
['', '> 2', '', '', '------- Configure Outlet
------------------------------------------------------', '', ' #
State Ph Name Pwr On Dly Pwr Off D
agent "fence_apc" reports: ly Reboot Dur.', '
----------------------------------------------------------------------------',
' 2 ON 1 fs103 0 sec 0 sec 5
sec', '', ' 1- Outlet Name : fs103', ' 2- Power On
Delay(sec) : 0',
agent "fence_apc" reports: ' 3- Power Off Delay(sec): 0', '
4- Reboot Duration(sec): 5', ' 5- Accept Changes : ', '',
' ?- Help, <ESC>- Back, <ENTER>- Refresh, <CTRL-L>- Event Log']
[root@fs102 ~]# /sbin/fence_apc -a 10.10.1.200 -l pdu -p pdu -n 13 -o status
Status check successful. Port 13 is OFF
[root@fs102 ~]# /sbin/fence_apc -a 10.10.1.201 -l pdu -p pdu -n 2 -o status
Status check successful. Port 2 is ON
[root@fs102 ~]# /sbin/fence_apc -a 10.10.1.201 -l pdu -p pdu -n 2 -o off
Traceback (most recent call last):
File "/sbin/fence_apc", line 829, in ?
main()
File "/sbin/fence_apc", line 303, in main
do_power_off(sock)
File "/sbin/fence_apc", line 813, in do_power_off
x = do_power_switch(sock, "off")
File "/sbin/fence_apc", line 611, in do_power_switch
result_code, response = power_off(txt + ndbuf)
File "/sbin/fence_apc", line 817, in power_off
x = power_switch(buffer, False, "2", "3");
File "/sbin/fence_apc", line 810, in power_switch
raise "unknown screen encountered in \n" + str(lines) + "\n"
unknown screen encountered in
['2', '', '', '------- Configure Outlet
------------------------------------------------------', '', ' #
State Ph Name Pwr On Dly Pwr Off Dly Reboot
Dur.', '
----------------------------------------------------------------------------',
' 2 ON 1 fs103 0 sec 0 sec 5
sec', '', ' 1- Outlet Name : fs103', ' 2- Power On
Delay(sec) : 0', ' 3- Power Off Delay(sec): 0', ' 4- Reboot
Duration(sec): 5', ' 5- Accept Changes : ', '', ' ?-
Help, <ESC>- Back, <ENTER>- Refresh, <CTRL-L>- Event Log']
<cluster config_version="143" name="gfs_cluster">
<fence_daemon clean_start="0" post_fail_delay="0"
post_join_delay="3"/>
<clusternodes>
<clusternode name="fs101" nodeid="1" votes="1">
<fence>
<method name="1">
<device name="pdu102.eons.dev" port="12"/>
</method>
</fence>
</clusternode>
<clusternode name="fs102" nodeid="2" votes="1">
<fence>
<method name="1">
<device name="pdu101.eons.dev" port="8"/>
</method>
</fence>
</clusternode>
<clusternode name="fs103" nodeid="3" votes="1">
<fence>
<method name="1">
<device name="pdu101.eons.dev" option="off"
port="13"/>
<device name="pdu102.eons.dev" option="off" port="2"/>
<device name="pdu101.eons.dev" option="on" port="13"/>
<device name="pdu102.eons.dev" option="on" port="2"/>
</method>
</fence>
</clusternode>
</clusternodes>
<fencedevices>
<fencedevice agent="fence_apc" ipaddr="10.10.1.200"
login="pdu" name="pdu101.eons.dev" passwd="pdu"/>
<fencedevice agent="fence_apc" ipaddr="10.10.1.201"
login="pdu" name="pdu102.eons.dev" passwd="pdu"/>
</fencedevices>
...
</cluster>
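For manual testing, the fs103 method above corresponds roughly to four
fence_apc calls in sequence: both outlets off, then both on. The sketch
below only builds those command lines from the values in the config; the
helper name build_fence_commands and the DEVICES/STEPS tables are
hypothetical, and fenced does not necessarily invoke the agent this way.

```python
# Sketch: manual equivalent of the dual-PDU off/on method for fs103.
# Addresses, ports and credentials are copied from the cluster.conf above;
# the helper and table names are made up for illustration.

DEVICES = {
    "pdu101.eons.dev": "10.10.1.200",
    "pdu102.eons.dev": "10.10.1.201",
}

# Same order as the <device> entries: both supplies off, then both on.
STEPS = [
    ("pdu101.eons.dev", "13", "off"),
    ("pdu102.eons.dev", "2", "off"),
    ("pdu101.eons.dev", "13", "on"),
    ("pdu102.eons.dev", "2", "on"),
]


def build_fence_commands(steps, devices, login="pdu", passwd="pdu"):
    """Return one fence_apc argument list per step, in order."""
    commands = []
    for name, port, action in steps:
        commands.append([
            "/sbin/fence_apc",
            "-a", devices[name],
            "-l", login,
            "-p", passwd,
            "-n", port,
            "-o", action,
        ])
    return commands


for cmd in build_fence_commands(STEPS, DEVICES):
    print(" ".join(cmd))
```

The second command printed is the "-n 2 -o off" call against 10.10.1.201
that fails in the traceback above, so it can be run by hand to reproduce
the problem without fencing through the cluster.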
[root@fs102 ~]# cat /etc/redhat-release
CentOS release 5.2 (Final)
[root@fs102 ~]# rpm -qf /sbin/fence_apc
cman-2.0.84-2.el5
[root@fs102 ~]# rpm -q luci
luci-0.12.0-7.el5.centos.3
pdu101:
American Power Conversion Network Management Card
AOS v3.5.9
(c) Copyright 2008 All Rights Reserved Rack PDU
APP v3.5.8
pdu102:
American Power Conversion Network Management Card
AOS v3.5.9
(c) Copyright 2008 All Rights Reserved Rack PDU
APP v3.5.8
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster