Re: Waiting for fenced to join the fence group

Wolfgang Hotwagner <listener@xxxxxxxxx> · Tue, 28 Jul 2009 10:54:11 +0200

No ideas?

Wolfgang Hotwagner wrote:
> Hello,
>
> I am not able to set up a GFS2 cluster on a DRBD device. I always run into
> the problem with joining the fence group. I am using a Debian stable (lenny)
> system. On eth0 there is also a ctdb service which enables 2 additional
> IPs. Maybe someone could help me get it working.
>
> Greetings
> Wolfgang
>
>
>
>
>
>
>
> dslin1:
> eth0: 172.30.50.83
> eth1: 10.13.13.2
>
> /etc/hosts:
> 127.0.0.1       localhost
> 172.30.50.83    dslin1
> 172.30.50.84    dslin2
> 10.13.13.2      node1
> 10.13.13.3      node2
>
>
> /proc/drbd:
> version: 8.0.14 (api:86/proto:86)
> GIT-hash: bb447522fc9a87d0069b7e14f0234911ebdab0f7 build by phil@fat-tyre, 2008-11-12 16:40:33
>  0: cs:Connected st:Primary/Primary ds:UpToDate/UpToDate C r---
>     ns:0 nr:12288 dw:12288 dr:0 al:0 bm:3 lo:0 pe:0 ua:0 ap:0
>         resync: used:0/61 hits:765 misses:3 starving:0 dirty:0 changed:3
>         act_log: used:0/127 hits:0 misses:0 starving:0 dirty:0 changed:0
>
>
>
> syslog:
> Jul 21 13:38:27 dslin1 ccsd[14975]: Starting ccsd 2.03.09:
> Jul 21 13:38:27 dslin1 ccsd[14975]:  Built: Nov  3 2008 18:22:21
> Jul 21 13:38:27 dslin1 ccsd[14975]:  Copyright (C) Red Hat, Inc. 2004-2008  All rights reserved.
> Jul 21 13:38:28 dslin1 ccsd[14975]: /etc/cluster/cluster.conf (cluster name = cluster, version = 1) found.
> Jul 21 13:38:31 dslin1 ccsd[14975]: Initial status:: Quorate
> Jul 21 13:38:35 dslin1 openais[14980]: cman killed by node 2 because we rejoined the cluster without a full restart
> Jul 21 13:38:35 dslin1 groupd[14984]: cman_get_nodes error -1 104
> Jul 21 13:38:35 dslin1 gfs_controld[14992]: cluster is down, exiting
> Jul 21 13:39:00 dslin1 ccsd[14975]: Unable to connect to cluster infrastructure after 30 seconds.
> Jul 21 13:39:30 dslin1 ccsd[14975]: Unable to connect to cluster infrastructure after 60 seconds.
> Jul 21 13:40:00 dslin1 ccsd[14975]: Unable to connect to cluster infrastructure after 90 seconds.
> Jul 21 13:40:30 dslin1 ccsd[14975]: Unable to connect to cluster infrastructure after 120 seconds.
> and so on..
>
>
>
>
>
> dslin2:
> eth0: 172.30.50.84
> eth1: 10.13.13.3
>
> /etc/hosts:
> 127.0.0.1       localhost
> 172.30.50.83    dslin1
> 172.30.50.84    dslin2
> 10.13.13.2      node1
> 10.13.13.3      node2
>
>
> /proc/drbd:
> version: 8.0.14 (api:86/proto:86)
> GIT-hash: bb447522fc9a87d0069b7e14f0234911ebdab0f7 build by phil@fat-tyre, 2008-11-12 16:40:33
>  0: cs:Connected st:Primary/Primary ds:UpToDate/UpToDate C r---
>     ns:12292 nr:0 dw:0 dr:12296 al:0 bm:6 lo:0 pe:0 ua:0 ap:0
>         resync: used:0/61 hits:765 misses:3 starving:0 dirty:0 changed:3
>         act_log: used:0/127 hits:0 misses:0 starving:0 dirty:0 changed:0
>
>
> syslog:
> Jul 21 13:38:27 dslin1 ccsd[14975]: Starting ccsd 2.03.09:
> Jul 21 13:38:27 dslin1 ccsd[14975]:  Built: Nov  3 2008 18:22:21
> Jul 21 13:38:27 dslin1 ccsd[14975]:  Copyright (C) Red Hat, Inc. 2004-2008  All rights reserved.
> Jul 21 13:38:28 dslin1 ccsd[14975]: /etc/cluster/cluster.conf (cluster name = cluster, version = 1) found.
> Jul 21 13:38:31 dslin1 ccsd[14975]: Initial status:: Quorate
> Jul 21 13:38:35 dslin1 openais[14980]: cman killed by node 2 because we rejoined the cluster without a full restart
> Jul 21 13:38:35 dslin1 groupd[14984]: cman_get_nodes error -1 104
> Jul 21 13:38:35 dslin1 gfs_controld[14992]: cluster is down, exiting
> Jul 21 13:39:00 dslin1 ccsd[14975]: Unable to connect to cluster infrastructure after 30 seconds.
> Jul 21 13:39:30 dslin1 ccsd[14975]: Unable to connect to cluster infrastructure after 60 seconds.
> Jul 21 13:40:00 dslin1 ccsd[14975]: Unable to connect to cluster infrastructure after 90 seconds.
> Jul 21 13:40:30 dslin1 ccsd[14975]: Unable to connect to cluster infrastructure after 120 seconds.
> Jul 21 13:41:00 dslin1 ccsd[14975]: Unable to connect to cluster infrastructure after 150 seconds.
> Jul 21 13:41:30 dslin1 ccsd[14975]: Unable to connect to cluster infrastructure after 180 seconds.
> Jul 21 13:42:00 dslin1 ccsd[14975]: Unable to connect to cluster infrastructure after 210 seconds.
> and so on..
>
>
>
>
>
>
>
> /etc/cluster/cluster.conf:
> <?xml version="1.0"?>
> <cluster name="cluster" config_version="1">
>   <!-- post_join_delay: number of seconds the daemon will wait before
>                         fencing any victims after a node joins the domain
>        post_fail_delay: number of seconds the daemon will wait before
>                         fencing any victims after a domain member fails
>        clean_start    : prevent any startup fencing the daemon might do.
>                         It indicates that the daemon should assume all nodes
>                         are in a clean state to start. -->
>   <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>   <clusternodes>
>     <clusternode name="dslin1" votes="1" nodeid="1">
>       <fence>
>         <!-- Handle fencing manually -->
>         <method name="human">
>           <device name="human" nodename="dslin1" ipaddr="10.13.13.2"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="dslin2" votes="1" nodeid="2">
>       <fence>
>         <!-- Handle fencing manually -->
>         <method name="human">
>           <device name="human" nodename="dslin2" ipaddr="10.13.13.3"/>
>         </method>
>       </fence>
>     </clusternode>
>   </clusternodes>
>   <!-- cman two nodes specification -->
>   <cman expected_votes="1" two_node="1"/>
>   <fencedevices>
>     <!-- Define manual fencing -->
>     <fencedevice name="human" agent="fence_manual"/>
>   </fencedevices>
> </cluster>
>
> --
> Linux-cluster mailing list
> Linux-cluster@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/linux-cluster
>   
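
One thing that can be cross-checked mechanically in the configuration quoted above is which address each clusternode name maps to in /etc/hosts, compared with the ipaddr given on that node's fence device. The following is only a minimal sketch in Python: the helper names parse_hosts and node_report are made up for illustration, and the two embedded strings are copies of the posted /etc/hosts and cluster.conf with the XML comments trimmed. Whether a mismatch here has anything to do with the fence-group problem is not established; the script only makes the mapping visible.

```python
import xml.etree.ElementTree as ET

# Copied from the post (XML comments trimmed). On a real node you would read
# /etc/hosts and /etc/cluster/cluster.conf from disk instead.
HOSTS = """\
127.0.0.1       localhost
172.30.50.83    dslin1
172.30.50.84    dslin2
10.13.13.2      node1
10.13.13.3      node2
"""

CLUSTER_CONF = """\
<?xml version="1.0"?>
<cluster name="cluster" config_version="1">
  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="dslin1" votes="1" nodeid="1">
      <fence>
        <method name="human">
          <device name="human" nodename="dslin1" ipaddr="10.13.13.2"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="dslin2" votes="1" nodeid="2">
      <fence>
        <method name="human">
          <device name="human" nodename="dslin2" ipaddr="10.13.13.3"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
    <fencedevice name="human" agent="fence_manual"/>
  </fencedevices>
</cluster>
"""


def parse_hosts(text):
    """Map every hostname or alias in hosts-file text to its IP address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name, ip)  # keep the first entry for a name
    return table


def node_report(conf_xml, hosts):
    """Return (name, hosts_ip, fence_ipaddr, same) for each clusternode."""
    rows = []
    root = ET.fromstring(conf_xml)
    for node in root.findall("clusternodes/clusternode"):
        name = node.get("name")
        device = node.find("fence/method/device")
        fence_ip = device.get("ipaddr") if device is not None else None
        hosts_ip = hosts.get(name)  # None if the name is not in the hosts text
        rows.append((name, hosts_ip, fence_ip, hosts_ip == fence_ip))
    return rows


if __name__ == "__main__":
    for name, hosts_ip, fence_ip, same in node_report(CLUSTER_CONF, parse_hosts(HOSTS)):
        print(f"{name}: hosts={hosts_ip} fence ipaddr={fence_ip} same={same}")
```

For the files as posted, both nodes come out with same=False: dslin1 and dslin2 map to the 172.30.50.x addresses on eth0, while the fence device entries carry the 10.13.13.x addresses on eth1 (which /etc/hosts calls node1 and node2).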
