Hi Guido,

On Wed, 2009-07-01 at 13:57 +0200, Guido Günther wrote:
> > - Before starting any kind of service, such as rgmanager or gfs*, make
> > sure that the fencing configuration is correct. Test by using fence_node
> > $nodename.
> fence_node node1
>
> gives the segfaults at the same location as described above which seems
> to be the cause of the trouble. (However "fence_ilo -z -l user -p pass
> -a iloip" works as expected).
> The segfault happens in fence/libfence/agent.c's make_args where the
> second XPath lookup (FENCE_DEVICE_ARGS_PATH) returns a bogus (non NULL)
> str. Doing this xpath lookup by hand looks fine. So it seems
> ccs_get_list is returning corrupted pointers. I've attached the current
> cluster.conf.

I am having trouble reproducing this problem and I'll need your help.

First of all, I replicated your configuration:

<?xml version="1.0"?>
<cluster name="fabbione" config_version="1" alias="fabbione">
  <logging debug="on"/>
  <clusternodes>
    <clusternode name="node1.foo.bar" votes="1" nodeid="1">
      <fence>
        <method name="1">
          <device name="fence1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2.foo.bar" votes="1" nodeid="4">
      <fence>
        <method name="1">
          <device name="fence2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="node1" agent="fence_virsh" port="fedora-rh-node1"
                 ipaddr="daikengo.int.fabbione.net" login="root" secure="1"
                 identity_file="/root/.ssh/id_rsa"/>
    <fencedevice name="node2" agent="fence_virsh" port="fedora-rh-node4"
                 ipaddr="daikengo.int.fabbione.net" login="root" secure="1"
                 identity_file="/root/.ssh/id_rsa"/>
  </fencedevices>
</cluster>

As you can see, node names and fencing methods are the same. I don't have
iLO but it shouldn't matter.

Now my question is: did you manually mangle the configuration you sent me?
I ask because the devices referenced by the nodes (fence1, fence2) have no
matching entry in the fencedevices section (node1, node2), and I get:

[root@node2]# fence_node -vv node1
fence node1 dev 0.0 agent none result: error config agent
agent args:
fence node1 failed

Now if I change device name="fenceX" to name="nodeX" there is a match and:

[root@node2 cluster]# fence_node -vv node1
fence node1 dev 0.0 agent fence_virsh result: success
agent args: agent=fence_virsh port=fedora-rh-node1 ipaddr=daikengo.int.fabbione.net login=root secure=1 identity_file=/root/.ssh/id_rsa
fence node1 success

and I still don't see the segfault...

Since you can reproduce the problem regularly, I'd really like to see some
debugging output from libfence to start with. I'd really appreciate it if
you could help us.

test 1:

Please add a bunch of fprintf(stderr, ...) calls to agent.c to show the
XPath queries that are created and the results coming back from libccs.
Please collect the output and send it to me.

test 2:

Please find:

cd = ccs_connect(); (line 287 in agent.c)

and right before that add:

fullxpath=1;

That change will ask libccs to use a different XPath engine internally.
Then re-run test 1.

This should pretty much isolate the problem and give me enough information
to debug the issue.

The next question is: are you running on some fancy architecture? Maybe
something in that environment is not initialized properly (the garbage
string you get back from libccs sounds like that), whereas on more common
arches like x86/x86_64 gcc takes care of that for us... (really wild
guessing, but still something to fix!).

Thanks
Fabio