On Wed, Jul 28, 2010 at 9:44 PM, Köppel Benedikt (LET)
<benedikt.koeppel@xxxxxxxxxxx> wrote:
> OK, with the help of Andrew, I tried it again.
>
> Some important logs from the problem:
>
> ~snip~
>
> 1317 Jul 28 00:46:31 pcmknode-1 corosync[2618]: [TOTEM ] A processor failed, forming new configuration.
> 1318 Jul 28 00:46:32 pcmknode-1 kernel: dlm: closing connection to node -1147763583
> 1319 Jul 28 00:46:32 pcmknode-1 corosync[2618]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 624: memb=1, new=0, lost=1
> 1320 Jul 28 00:46:32 pcmknode-1 corosync[2618]: [pcmk ] info: pcmk_peer_update: memb: pcmknode-1 3130426497
> 1321 Jul 28 00:46:32 pcmknode-1 corosync[2618]: [pcmk ] info: pcmk_peer_update: lost: pcmknode-2 3147203713
>
> ~snip~
>
> 1338 Jul 28 00:46:32 pcmknode-1 crmd: [2629]: info: crm_update_peer: Node pcmknode-2: id=3147203713 state=lost (new) addr=r(0) ip(192.168.150.187) votes=1 born=620 seen=620 proc=00000000000000000000000000111312
> 1339 Jul 28 00:46:32 pcmknode-1 crmd: [2629]: info: erase_node_from_join: Removed node pcmknode-2 from join calculations: welcomed=0 itegrated=0 finalized=0 confirmed=1
> 1340 Jul 28 00:46:32 pcmknode-1 crmd: [2629]: info: crm_update_quorum: Updating quorum status to false (call=45)
>
> ~snip~
>
> 1351 Jul 28 00:46:32 pcmknode-1 pengine: [2628]: WARN: pe_fence_node: Node pcmknode-2 will be fenced because it is un-expectedly down
> 1352 Jul 28 00:46:32 pcmknode-1 pengine: [2628]: info: determine_online_status_fencing: ha_state=active, ccm_state=false, crm_state=online, join_state=member, expected=member
> 1353 Jul 28 00:46:32 pcmknode-1 pengine: [2628]: WARN: determine_online_status: Node pcmknode-2 is unclean
>
> ~snip~
>
> I then removed the LVM from /dev/sdb2 and created the GFS2 directly on
> /dev/sdb2 (without LVM). That did not solve the problem.
>
> Starting corosync on only one node works fine, and the GFS2 disk can even
> be mounted. But as soon as the GFS2 disk is mounted on the second node,
> that node gets fenced immediately. I set WebFSClone's target-role to
> Stopped, and as soon as I manually started it again, the node got fenced.
> Manually mounting the GFS2 disk (with mount -t gfs2...) on the second node
> also triggers the STONITH.
>
> One word about my STONITH: it is SBD, running via /dev/sdb1. I got the
> cluster-glue SRPM from Clusterlabs and extracted it to compile SBD
> manually (only SBD, nothing else), then installed it. So my system runs
> packages from these repositories: RHEL 6 beta, EPEL, and Clusterlabs.
>
> I monitored the network traffic with tcpdump and analyzed it afterwards
> with Wireshark. The two DLM instances are communicating, but I don't know
> whether something goes wrong there. I see a packet going from pcmknode-2
> to pcmknode-1 with this content as decoded by Wireshark (some lines that
> I think are not interesting are omitted; I can provide them if needed):
>
> Command: message (1)
> Message Type: lookup message (11)
> External Flags: 0x08, Return the contents of the lock value block
> Status: Unknown (0)
> Granted Mode: invalid (-1)
> Request Mode: exclusive (5)
>
> And then the response from pcmknode-1 to pcmknode-2:
>
> Command: message (1)
> Message Type: request reply (5)
> External Flags: 0x08, Return the contents of the lock value block
> Status: granted (2)
> Granted Mode: exclusive (5)
> Request Mode: invalid (-1)
>
> I wonder why pcmknode-1 says "Granted: exclusive" to pcmknode-2.

No idea, I don't have much to do with the DLM.
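If you want to dig into that exchange further, it may help to capture only
the DLM traffic and to compare the lock state each node reports right
before the mount. Roughly something like the following should do it - the
interface, capture file and lockspace name are just placeholders, and this
assumes the dlm_tool utility from the cluster packages is installed:

  # capture only DLM traffic (21064 is the DLM's default TCP port;
  # eth0 and the output file are only examples)
  tcpdump -i eth0 -s 0 -w /tmp/dlm.pcap tcp port 21064

  # list the lockspaces and dump dlm_controld's debug buffer on each node
  dlm_tool ls
  dlm_tool dump

  # dump the lock state of a given lockspace (name taken from the
  # "dlm_tool ls" output; needs debugfs mounted under /sys/kernel/debug)
  dlm_tool lockdebug <lockspace>

Comparing that output from both nodes, taken just before the second mount,
should at least narrow down where things go wrong.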
> Immediately after the request reply, pcmknode-2 writes "Now mounting
> FS..." to the log and gets fenced and shut down.

As I explained on IRC yesterday, the node getting fenced is not the issue
here. For some reason mounting the GFS volume is causing the node to fail -
this is the root cause. Almost all the gfs and dlm code is shared between
cman and pacemaker - so it's quite possible that the dlm has an issue.
Perhaps file a bug against the dlm.

>
> So, is there perhaps something wrong with the DLM?
>
> Regards,
> Benedikt

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster