> On 1 Apr 2015, at 11:47 pm, Daniel Dehennin <daniel.dehennin@xxxxxxxxxxxx> wrote:
>
> Hello,
>
> On a 4-node OpenNebula cluster, running Ubuntu Trusty 14.04.2, with:
>
> - corosync 2.3.3-1ubuntu1
> - pacemaker 1.1.10+git20130802-1ubuntu2.3
> - dlm 4.0.1-0ubuntu1
>
> Here is the node list with their IDs, to follow the logs:
>
> - 1084811137 nebula1
> - 1084811138 nebula2
> - 1084811139 nebula3
> - 1084811140 nebula4 (the current DC)
>
> I have an issue where fencing works but dlm always waits for
> fencing; I needed to manually run “dlm_tool fence_ack 1084811138” this
> morning. Here are the logs:
>
> Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence status 1084811138 receive 1 from 1084811137 walltime 1427844569 local 50759
> Apr 1 01:29:29 nebula4 kernel: [50799.162381] dlm: closing connection to node 1084811138
> Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence status 1084811138 receive 1 from 1084811139 walltime 1427844569 local 50759
> Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence request 1084811138 pid 44527 nodedown time 1427844569 fence_all dlm_stonith
> Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence result 1084811138 pid 44527 result 1 exit status
> Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence status 1084811138 receive 1 from 1084811140 walltime 1427844569 local 50759
> Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence request 1084811138 no actor
> [...]
> Apr 1 01:30:25 nebula4 dlm_controld[6737]: 50815 datastores wait for fencing
> Apr 1 01:30:25 nebula4 dlm_controld[6737]: 50815 clvmd wait for fencing
>
> The stonith actually worked:
>
> Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: handle_request: Client crmd.6490.2707e557 wants to fence (reboot) 'nebula2' with device '(any)'
> Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for nebula2: 39eaf3a2-d7e0-417d-8a01-d2f373973d6b (0)
> Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: can_fence_host_with_device: stonith-nebula1-IPMILAN can not fence nebula2: static-list
> Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: can_fence_host_with_device: stonith-nebula2-IPMILAN can fence nebula2: static-list
> Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: can_fence_host_with_device: stonith-one-frontend can not fence nebula2: static-list
> Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: can_fence_host_with_device: stonith-nebula3-IPMILAN can not fence nebula2: static-list
> Apr 1 01:29:32 nebula4 stonith-ng[6486]: notice: remote_op_done: Operation reboot of nebula2 by nebula3 for crmd.6490@nebula4.39eaf3a2: OK
>
> I attach the logs of the DC nebula4 from around 01:29:03, where
> everything worked fine (Got 4 replies, expecting: 4), to a little bit
> after.
>
> To me, it looks like:
>
> - dlm asked for fencing directly at 01:29:29; the node really was fenced,
>   since it has garbage in its /var/log/syslog at exactly 01:29:29 (its
>   uptime confirms the reboot), but dlm did not get a good response
>
> - pacemaker fenced nebula2 at 01:29:30 because it was not part of the
>   cluster anymore (since 01:29:26: [TOTEM ] ... Members left: 1084811138).
>   This fencing works.
>
> Do you have any idea?
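For anyone hitting the same hang: dlm_controld keeps its lockspaces in "wait for fencing" until it records a successful fence result for the dead node, and the manual fence_ack above is what tells it to stop waiting. A minimal sketch of the inspect-then-ack sequence, assuming dlm 4.0.x's dlm_tool and the node ids listed above:

    # List the lockspaces dlm_controld manages (clvmd, datastores, ...)
    dlm_tool ls

    # Dump dlm_controld's debug buffer to see the pending fence request
    dlm_tool dump | grep fence

    # Only after verifying out-of-band that the node really was
    # power-cycled, acknowledge its fencing so recovery can proceed
    dlm_tool fence_ack 1084811138

Note that fence_ack is a last resort: acknowledging a node that is in fact still running defeats the point of fencing and risks corrupting shared storage.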
There were two important fixes in this area since 1.1.10:

+ David Vossel (1 year, 1 month ago) 054fedf:
  Fix: stonith_api_time_helper now returns when the most recent fencing operation completed

+ Andrew Beekhof (1 year, 1 month ago) d9921e5:
  Fix: Fencing: Pass the correct options when looking up the history by node name

Whether 'pacemaker 1.1.10+git20130802-1ubuntu2.3' includes them is anybody's guess.
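Those fixes matter here because dlm_stonith (the fence_all agent in the logs above) decides success by asking stonith-ng when the victim was last fenced and comparing that timestamp with the nodedown time, so a history lookup that returns the wrong time, or nothing, makes it report failure even though the reboot itself worked; that matches the "fence result 1084811138 ... result 1" followed by "wait for fencing" above. A quick way to see what the history lookup returns on the affected build, a minimal sketch assuming pacemaker's stonith_admin and the node names from this cluster:

    # The last fencing operation stonith-ng recorded for nebula2;
    # dlm_stonith needs a completion time at or after the nodedown time
    stonith_admin --history nebula2

    # dlm's side of the comparison: the nodedown walltime (1427844569 here)
    dlm_tool dump | grep 'fence request'

If the history query comes back empty or stale while stonith-ng clearly rebooted the node, the packaged pacemaker is most likely missing the two commits above.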
> Regards.
> --
> Daniel Dehennin
> Retrieve my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
> Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF
>
> Apr 1 01:29:03 nebula4 lvm[6759]: Waiting for next pre command
> Apr 1 01:29:03 nebula4 lvm[6759]: read on PIPE 12: 4 bytes: status: 0
> Apr 1 01:29:03 nebula4 lvm[6759]: background routine status was 0, sock_client=0x218eab0
> Apr 1 01:29:03 nebula4 lvm[6759]: Send local reply
> Apr 1 01:29:03 nebula4 lvm[6759]: Read on local socket 5, len = 31
> Apr 1 01:29:03 nebula4 lvm[6759]: check_all_clvmds_running
> Apr 1 01:29:03 nebula4 lvm[6759]: down_callback. node 1084811137, state = 3
> Apr 1 01:29:03 nebula4 lvm[6759]: down_callback. node 1084811139, state = 3
> Apr 1 01:29:03 nebula4 lvm[6759]: down_callback. node 1084811138, state = 3
> Apr 1 01:29:03 nebula4 lvm[6759]: down_callback. node 1084811140, state = 3
> Apr 1 01:29:03 nebula4 lvm[6759]: Got pre command condition...
> Apr 1 01:29:03 nebula4 lvm[6759]: Writing status 0 down pipe 13
> Apr 1 01:29:03 nebula4 lvm[6759]: Waiting to do post command - state = 0
> Apr 1 01:29:03 nebula4 lvm[6759]: read on PIPE 12: 4 bytes: status: 0
> Apr 1 01:29:03 nebula4 lvm[6759]: background routine status was 0, sock_client=0x218eab0
> Apr 1 01:29:03 nebula4 lvm[6759]: distribute command: XID = 43973, flags=0x0 ()
> Apr 1 01:29:03 nebula4 lvm[6759]: num_nodes = 4
> Apr 1 01:29:03 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218f100. client=0x218eab0, msg=0x218ebc0, len=31, csid=(nil), xid=43973
> Apr 1 01:29:03 nebula4 lvm[6759]: Sending message to all cluster nodes
> Apr 1 01:29:03 nebula4 lvm[6759]: process_work_item: local
> Apr 1 01:29:03 nebula4 lvm[6759]: process_local_command: SYNC_NAMES (0x2d) msg=0x218ed00, msglen =31, client=0x218eab0
> Apr 1 01:29:03 nebula4 lvm[6759]: Syncing device names
> Apr 1 01:29:03 nebula4 lvm[6759]: Reply from node 40a8e784: 0 bytes
> Apr 1 01:29:03 nebula4 lvm[6759]: Got 1 replies, expecting: 4
> Apr 1 01:29:03 nebula4 lvm[6759]: LVM thread waiting for work
> Apr 1 01:29:03 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811140 for 0. len 31
> Apr 1 01:29:03 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811138 for 1084811140. len 18
> Apr 1 01:29:03 nebula4 lvm[6759]: Reply from node 40a8e782: 0 bytes
> Apr 1 01:29:03 nebula4 lvm[6759]: Got 2 replies, expecting: 4
> Apr 1 01:29:03 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811139 for 1084811140. len 18
> Apr 1 01:29:03 nebula4 lvm[6759]: Reply from node 40a8e783: 0 bytes
> Apr 1 01:29:03 nebula4 lvm[6759]: Got 3 replies, expecting: 4
> Apr 1 01:29:03 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811137 for 1084811140. len 18
> Apr 1 01:29:03 nebula4 lvm[6759]: Reply from node 40a8e781: 0 bytes
> Apr 1 01:29:03 nebula4 lvm[6759]: Got 4 replies, expecting: 4
> Apr 1 01:29:03 nebula4 lvm[6759]: Got post command condition...
> Apr 1 01:29:03 nebula4 lvm[6759]: Waiting for next pre command
> Apr 1 01:29:03 nebula4 lvm[6759]: read on PIPE 12: 4 bytes: status: 0
> Apr 1 01:29:03 nebula4 lvm[6759]: background routine status was 0, sock_client=0x218eab0
> Apr 1 01:29:03 nebula4 lvm[6759]: Send local reply
> Apr 1 01:29:03 nebula4 lvm[6759]: Read on local socket 5, len = 30
> Apr 1 01:29:03 nebula4 lvm[6759]: Got pre command condition...
> Apr 1 01:29:03 nebula4 lvm[6759]: doing PRE command LOCK_VG 'V_vg-one-2' at 6 (client=0x218eab0)
> Apr 1 01:29:03 nebula4 lvm[6759]: unlock_resource: V_vg-one-2 lockid: 1
> Apr 1 01:29:03 nebula4 lvm[6759]: Writing status 0 down pipe 13
> Apr 1 01:29:03 nebula4 lvm[6759]: Waiting to do post command - state = 0
> Apr 1 01:29:03 nebula4 lvm[6759]: read on PIPE 12: 4 bytes: status: 0
> Apr 1 01:29:03 nebula4 lvm[6759]: background routine status was 0, sock_client=0x218eab0
> Apr 1 01:29:03 nebula4 lvm[6759]: distribute command: XID = 43974, flags=0x1 (LOCAL)
> Apr 1 01:29:03 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218ed00. client=0x218eab0, msg=0x218ebc0, len=30, csid=(nil), xid=43974
> Apr 1 01:29:03 nebula4 lvm[6759]: process_work_item: local
> Apr 1 01:29:03 nebula4 lvm[6759]: process_local_command: LOCK_VG (0x33) msg=0x218ed40, msglen =30, client=0x218eab0
> Apr 1 01:29:03 nebula4 lvm[6759]: do_lock_vg: resource 'V_vg-one-2', cmd = 0x6 LCK_VG (UNLOCK|VG), flags = 0x0 ( ), critical_section = 0
> Apr 1 01:29:03 nebula4 lvm[6759]: Invalidating cached metadata for VG vg-one-2
> Apr 1 01:29:03 nebula4 lvm[6759]: Reply from node 40a8e784: 0 bytes
> Apr 1 01:29:03 nebula4 lvm[6759]: Got 1 replies, expecting: 1
> Apr 1 01:29:03 nebula4 lvm[6759]: LVM thread waiting for work
> Apr 1 01:29:03 nebula4 lvm[6759]: Got post command condition...
> Apr 1 01:29:03 nebula4 lvm[6759]: Waiting for next pre command
> Apr 1 01:29:03 nebula4 lvm[6759]: read on PIPE 12: 4 bytes: status: 0
> Apr 1 01:29:03 nebula4 lvm[6759]: background routine status was 0, sock_client=0x218eab0
> Apr 1 01:29:03 nebula4 lvm[6759]: Send local reply
> Apr 1 01:29:03 nebula4 lvm[6759]: Read on local socket 5, len = 0
> Apr 1 01:29:03 nebula4 lvm[6759]: EOF on local socket: inprogress=0
> Apr 1 01:29:03 nebula4 lvm[6759]: Waiting for child thread
> Apr 1 01:29:03 nebula4 lvm[6759]: Got pre command condition...
> Apr 1 01:29:03 nebula4 lvm[6759]: Subthread finished
> Apr 1 01:29:03 nebula4 lvm[6759]: Joined child thread
> Apr 1 01:29:03 nebula4 lvm[6759]: ret == 0, errno = 0. removing client
> Apr 1 01:29:03 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218ebc0. client=0x218eab0, msg=(nil), len=0, csid=(nil), xid=43974
> Apr 1 01:29:03 nebula4 lvm[6759]: process_work_item: free fd -1
> Apr 1 01:29:03 nebula4 lvm[6759]: LVM thread waiting for work
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811138 for 0. len 31
> Apr 1 01:29:16 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218eea0. client=0x6a1d60, msg=0x7fff260dfcac, len=31, csid=0x7fff260de67c, xid=0
> Apr 1 01:29:16 nebula4 lvm[6759]: process_work_item: remote
> Apr 1 01:29:16 nebula4 lvm[6759]: process_remote_command SYNC_NAMES (0x2d) for clientid 0x5000000 XID 39602 on node 40a8e782
> Apr 1 01:29:16 nebula4 lvm[6759]: Syncing device names
> Apr 1 01:29:16 nebula4 lvm[6759]: LVM thread waiting for work
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811137 for 1084811138. len 18
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811139 for 1084811138. len 18
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811140 for 1084811138. len 18
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811137 for 0. len 31
> Apr 1 01:29:16 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218eab0. client=0x6a1d60, msg=0x7fff260dfcac, len=31, csid=0x7fff260de67c, xid=0
> Apr 1 01:29:16 nebula4 lvm[6759]: process_work_item: remote
> Apr 1 01:29:16 nebula4 lvm[6759]: process_remote_command SYNC_NAMES (0x2d) for clientid 0x5000000 XID 44354 on node 40a8e781
> Apr 1 01:29:16 nebula4 lvm[6759]: Syncing device names
> Apr 1 01:29:16 nebula4 lvm[6759]: LVM thread waiting for work
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811140 for 1084811137. len 18
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811138 for 1084811137. len 18
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811139 for 1084811137. len 18
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811138 for 0. len 31
> Apr 1 01:29:16 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218eea0. client=0x6a1d60, msg=0x7fff260dfcac, len=31, csid=0x7fff260de67c, xid=0
> Apr 1 01:29:16 nebula4 lvm[6759]: process_work_item: remote
> Apr 1 01:29:16 nebula4 lvm[6759]: process_remote_command SYNC_NAMES (0x2d) for clientid 0x5000000 XID 39605 on node 40a8e782
> Apr 1 01:29:16 nebula4 lvm[6759]: Syncing device names
> Apr 1 01:29:16 nebula4 lvm[6759]: LVM thread waiting for work
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811137 for 1084811138. len 18
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811139 for 1084811138. len 18
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811140 for 1084811138. len 18
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811137 for 0. len 31
> Apr 1 01:29:16 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218eab0. client=0x6a1d60, msg=0x7fff260dfcac, len=31, csid=0x7fff260de67c, xid=0
> Apr 1 01:29:16 nebula4 lvm[6759]: process_work_item: remote
> Apr 1 01:29:16 nebula4 lvm[6759]: process_remote_command SYNC_NAMES (0x2d) for clientid 0x5000000 XID 44357 on node 40a8e781
> Apr 1 01:29:16 nebula4 lvm[6759]: Syncing device names
> Apr 1 01:29:16 nebula4 lvm[6759]: LVM thread waiting for work
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811140 for 1084811137. len 18
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811138 for 1084811137. len 18
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811139 for 1084811137. len 18
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811138 for 0. len 31
> Apr 1 01:29:16 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218eea0. client=0x6a1d60, msg=0x7fff260dfcac, len=31, csid=0x7fff260de67c, xid=0
> Apr 1 01:29:16 nebula4 lvm[6759]: process_work_item: remote
> Apr 1 01:29:16 nebula4 lvm[6759]: process_remote_command SYNC_NAMES (0x2d) for clientid 0x5000000 XID 39608 on node 40a8e782
> Apr 1 01:29:16 nebula4 lvm[6759]: Syncing device names
> Apr 1 01:29:16 nebula4 lvm[6759]: LVM thread waiting for work
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811137 for 1084811138. len 18
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811137 for 0. len 31
> Apr 1 01:29:16 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218eab0. client=0x6a1d60, msg=0x7fff260dfcac, len=31, csid=0x7fff260de67c, xid=0
> Apr 1 01:29:16 nebula4 lvm[6759]: process_work_item: remote
> Apr 1 01:29:16 nebula4 lvm[6759]: process_remote_command SYNC_NAMES (0x2d) for clientid 0x5000000 XID 44360 on node 40a8e781
> Apr 1 01:29:16 nebula4 lvm[6759]: Syncing device names
> Apr 1 01:29:16 nebula4 lvm[6759]: LVM thread waiting for work
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811139 for 1084811138. len 18
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811140 for 1084811138. len 18
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811140 for 1084811137. len 18
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811138 for 1084811137. len 18
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811139 for 1084811137. len 18
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811137 for 0. len 31
> Apr 1 01:29:16 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218eea0. client=0x6a1d60, msg=0x7fff260dfcac, len=31, csid=0x7fff260de67c, xid=0
> Apr 1 01:29:16 nebula4 lvm[6759]: process_work_item: remote
> Apr 1 01:29:16 nebula4 lvm[6759]: process_remote_command SYNC_NAMES (0x2d) for clientid 0x5000000 XID 44363 on node 40a8e781
> Apr 1 01:29:16 nebula4 lvm[6759]: Syncing device names
> Apr 1 01:29:16 nebula4 lvm[6759]: LVM thread waiting for work
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811140 for 1084811137. len 18
> Apr 1 01:29:23 nebula4 lvm[6759]: Got new connection on fd 5
> Apr 1 01:29:23 nebula4 lvm[6759]: Read on local socket 5, len = 30
> Apr 1 01:29:23 nebula4 lvm[6759]: creating pipe, [12, 13]
> Apr 1 01:29:23 nebula4 lvm[6759]: Creating pre&post thread
> Apr 1 01:29:23 nebula4 lvm[6759]: Created pre&post thread, state = 0
> Apr 1 01:29:23 nebula4 lvm[6759]: in sub thread: client = 0x218eab0
> Apr 1 01:29:23 nebula4 lvm[6759]: doing PRE command LOCK_VG 'V_vg-one-0' at 1 (client=0x218eab0)
> Apr 1 01:29:23 nebula4 lvm[6759]: lock_resource 'V_vg-one-0', flags=0, mode=3
> Apr 1 01:29:23 nebula4 lvm[6759]: lock_resource returning 0, lock_id=1
> Apr 1 01:29:23 nebula4 lvm[6759]: Writing status 0 down pipe 13
> Apr 1 01:29:23 nebula4 lvm[6759]: Waiting to do post command - state = 0
> Apr 1 01:29:23 nebula4 lvm[6759]: read on PIPE 12: 4 bytes: status: 0
> Apr 1 01:29:23 nebula4 lvm[6759]: background routine status was 0, sock_client=0x218eab0
> Apr 1 01:29:23 nebula4 lvm[6759]: distribute command: XID = 43975, flags=0x1 (LOCAL)
> Apr 1 01:29:23 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218ed00. client=0x218eab0, msg=0x218ebc0, len=30, csid=(nil), xid=43975
> Apr 1 01:29:23 nebula4 lvm[6759]: process_work_item: local
> Apr 1 01:29:23 nebula4 lvm[6759]: process_local_command: LOCK_VG (0x33) msg=0x218ed40, msglen =30, client=0x218eab0
> Apr 1 01:29:23 nebula4 lvm[6759]: do_lock_vg: resource 'V_vg-one-0', cmd = 0x1 LCK_VG (READ|VG), flags = 0x0 ( ), critical_section = 0
> Apr 1 01:29:23 nebula4 lvm[6759]: Invalidating cached metadata for VG vg-one-0
> Apr 1 01:29:23 nebula4 lvm[6759]: Reply from node 40a8e784: 0 bytes
> Apr 1 01:29:23 nebula4 lvm[6759]: Got 1 replies, expecting: 1
> Apr 1 01:29:23 nebula4 lvm[6759]: LVM thread waiting for work
> Apr 1 01:29:23 nebula4 lvm[6759]: Got post command condition...
> Apr 1 01:29:23 nebula4 lvm[6759]: Waiting for next pre command
> Apr 1 01:29:23 nebula4 lvm[6759]: read on PIPE 12: 4 bytes: status: 0
> Apr 1 01:29:23 nebula4 lvm[6759]: background routine status was 0, sock_client=0x218eab0
> Apr 1 01:29:23 nebula4 lvm[6759]: Send local reply
> Apr 1 01:29:23 nebula4 lvm[6759]: Read on local socket 5, len = 31
> Apr 1 01:29:23 nebula4 lvm[6759]: check_all_clvmds_running
> Apr 1 01:29:23 nebula4 lvm[6759]: down_callback. node 1084811137, state = 3
> Apr 1 01:29:23 nebula4 lvm[6759]: down_callback. node 1084811139, state = 3
> Apr 1 01:29:23 nebula4 lvm[6759]: down_callback. node 1084811138, state = 3
> Apr 1 01:29:23 nebula4 lvm[6759]: down_callback. node 1084811140, state = 3
> Apr 1 01:29:23 nebula4 lvm[6759]: Got pre command condition...
> Apr 1 01:29:23 nebula4 lvm[6759]: Writing status 0 down pipe 13
> Apr 1 01:29:23 nebula4 lvm[6759]: Waiting to do post command - state = 0
> Apr 1 01:29:23 nebula4 lvm[6759]: read on PIPE 12: 4 bytes: status: 0
> Apr 1 01:29:23 nebula4 lvm[6759]: background routine status was 0, sock_client=0x218eab0
> Apr 1 01:29:23 nebula4 lvm[6759]: distribute command: XID = 43976, flags=0x0 ()
> Apr 1 01:29:23 nebula4 lvm[6759]: num_nodes = 4
> Apr 1 01:29:23 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218f100. client=0x218eab0, msg=0x218ebc0, len=31, csid=(nil), xid=43976
> Apr 1 01:29:23 nebula4 lvm[6759]: Sending message to all cluster nodes
> Apr 1 01:29:23 nebula4 lvm[6759]: process_work_item: local
> Apr 1 01:29:23 nebula4 lvm[6759]: process_local_command: SYNC_NAMES (0x2d) msg=0x218ed00, msglen =31, client=0x218eab0
> Apr 1 01:29:23 nebula4 lvm[6759]: Syncing device names
> Apr 1 01:29:23 nebula4 lvm[6759]: Reply from node 40a8e784: 0 bytes
> Apr 1 01:29:23 nebula4 lvm[6759]: Got 1 replies, expecting: 4
> Apr 1 01:29:23 nebula4 lvm[6759]: LVM thread waiting for work
> Apr 1 01:29:26 nebula4 corosync[6411]: [TOTEM ] A processor failed, forming new configuration.
> Apr 1 01:29:29 nebula4 corosync[6411]: [TOTEM ] A new membership (192.168.231.129:1204) was formed. Members left: 1084811138
> Apr 1 01:29:29 nebula4 lvm[6759]: confchg callback. 0 joined, 1 left, 3 members
> Apr 1 01:29:29 nebula4 crmd[6490]: warning: match_down_event: No match for shutdown action on 1084811138
> Apr 1 01:29:29 nebula4 crmd[6490]: notice: peer_update_callback: Stonith/shutdown of nebula2 not matched
> Apr 1 01:29:29 nebula4 crmd[6490]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Apr 1 01:29:29 nebula4 corosync[6411]: [QUORUM] Members[3]: 1084811137 1084811139 1084811140
> Apr 1 01:29:29 nebula4 corosync[6411]: [MAIN ] Completed service synchronization, ready to provide service.
> Apr 1 01:29:29 nebula4 crmd[6490]: notice: crm_update_peer_state: pcmk_quorum_notification: Node nebula2[1084811138] - state is now lost (was member)
> Apr 1 01:29:29 nebula4 crmd[6490]: warning: match_down_event: No match for shutdown action on 1084811138
> Apr 1 01:29:29 nebula4 pacemakerd[6483]: notice: crm_update_peer_state: pcmk_quorum_notification: Node nebula2[1084811138] - state is now lost (was member)
> Apr 1 01:29:29 nebula4 crmd[6490]: notice: peer_update_callback: Stonith/shutdown of nebula2 not matched
> Apr 1 01:29:29 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811140 for 0. len 31
> Apr 1 01:29:29 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811139 for 1084811137. len 18
> Apr 1 01:29:29 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811139 for 0. len 31
> Apr 1 01:29:29 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218ed00. client=0x6a1d60, msg=0x7fff260dfcac, len=31, csid=0x7fff260de67c, xid=0
> Apr 1 01:29:29 nebula4 lvm[6759]: process_work_item: remote
> Apr 1 01:29:29 nebula4 lvm[6759]: process_remote_command SYNC_NAMES (0x2d) for clientid 0x5000000 XID 43802 on node 40a8e783
> Apr 1 01:29:29 nebula4 lvm[6759]: Syncing device names
> Apr 1 01:29:29 nebula4 lvm[6759]: LVM thread waiting for work
> Apr 1 01:29:29 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811139 for 1084811140. len 18
> Apr 1 01:29:29 nebula4 lvm[6759]: Reply from node 40a8e783: 0 bytes
> Apr 1 01:29:29 nebula4 lvm[6759]: Got 2 replies, expecting: 4
> Apr 1 01:29:29 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811137 for 1084811140. len 18
> Apr 1 01:29:29 nebula4 lvm[6759]: Reply from node 40a8e781: 0 bytes
> Apr 1 01:29:29 nebula4 lvm[6759]: Got 3 replies, expecting: 4
> Apr 1 01:29:29 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811137 for 1084811139. len 18
> Apr 1 01:29:29 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811140 for 1084811139. len 18
> Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence status 1084811138 receive 1 from 1084811137 walltime 1427844569 local 50759
> Apr 1 01:29:29 nebula4 kernel: [50799.162381] dlm: closing connection to node 1084811138
> Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence status 1084811138 receive 1 from 1084811139 walltime 1427844569 local 50759
> Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence request 1084811138 pid 44527 nodedown time 1427844569 fence_all dlm_stonith
> Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence result 1084811138 pid 44527 result 1 exit status
> Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence status 1084811138 receive 1 from 1084811140 walltime 1427844569 local 50759
> Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence request 1084811138 no actor
> Apr 1 01:29:30 nebula4 pengine[6489]: warning: pe_fence_node: Node nebula2 will be fenced because the node is no longer part of the cluster
> Apr 1 01:29:30 nebula4 pengine[6489]: warning: determine_online_status: Node nebula2 is unclean
> Apr 1 01:29:30 nebula4 pengine[6489]: warning: unpack_rsc_op: Processing failed op start for stonith-nebula4-IPMILAN on nebula3: unknown error (1)
> Apr 1 01:29:30 nebula4 pengine[6489]: warning: unpack_rsc_op: Processing failed op start for stonith-nebula4-IPMILAN on nebula1: unknown error (1)
> Apr 1 01:29:30 nebula4 pengine[6489]: warning: unpack_rsc_op: Processing failed op start for stonith-nebula4-IPMILAN on nebula2: unknown error (1)
> Apr 1 01:29:30 nebula4 pengine[6489]: warning: common_apply_stickiness: Forcing stonith-nebula4-IPMILAN away from nebula1 after 1000000 failures (max=1000000)
> Apr 1 01:29:30 nebula4 pengine[6489]: warning: common_apply_stickiness: Forcing stonith-nebula4-IPMILAN away from nebula2 after 1000000 failures (max=1000000)
> Apr 1 01:29:30 nebula4 pengine[6489]: warning: common_apply_stickiness: Forcing stonith-nebula4-IPMILAN away from nebula3 after 1000000 failures (max=1000000)
> Apr 1 01:29:30 nebula4 pengine[6489]: warning: custom_action: Action p_dlm:3_stop_0 on nebula2 is unrunnable (offline)
> Apr 1 01:29:30 nebula4 pengine[6489]: warning: custom_action: Action p_dlm:3_stop_0 on nebula2 is unrunnable (offline)
> Apr 1 01:29:30 nebula4 pengine[6489]: warning: custom_action: Action p_clvm:3_stop_0 on nebula2 is unrunnable (offline)
> Apr 1 01:29:30 nebula4 pengine[6489]: warning: custom_action: Action p_clvm:3_stop_0 on nebula2 is unrunnable (offline)
> Apr 1 01:29:30 nebula4 pengine[6489]: warning: custom_action: Action p_vg_one:3_stop_0 on nebula2 is unrunnable (offline)
> Apr 1 01:29:30 nebula4 pengine[6489]: warning: custom_action: Action p_vg_one:3_stop_0 on nebula2 is unrunnable (offline)
> Apr 1 01:29:30 nebula4 pengine[6489]: warning: custom_action: Action p_fs_one-datastores:3_stop_0 on nebula2 is unrunnable (offline)
> Apr 1 01:29:30 nebula4 pengine[6489]: warning: custom_action: Action p_fs_one-datastores:3_stop_0 on nebula2 is unrunnable (offline)
> Apr 1 01:29:30 nebula4 pengine[6489]: warning: custom_action: Action stonith-nebula1-IPMILAN_stop_0 on nebula2 is unrunnable (offline)
> Apr 1 01:29:30 nebula4 pengine[6489]: warning: stage6: Scheduling Node nebula2 for STONITH
> Apr 1 01:29:30 nebula4 pengine[6489]: notice: LogActions: Stop p_dlm:3#011(nebula2)
> Apr 1 01:29:30 nebula4 pengine[6489]: notice: LogActions: Stop p_clvm:3#011(nebula2)
> Apr 1 01:29:30 nebula4 pengine[6489]: notice: LogActions: Stop p_vg_one:3#011(nebula2)
> Apr 1 01:29:30 nebula4 pengine[6489]: notice: LogActions: Stop p_fs_one-datastores:3#011(nebula2)
> Apr 1 01:29:30 nebula4 pengine[6489]: notice: LogActions: Move stonith-nebula1-IPMILAN#011(Started nebula2 -> nebula3)
> Apr 1 01:29:30 nebula4 pengine[6489]: warning: process_pe_message: Calculated Transition 101: /var/lib/pacemaker/pengine/pe-warn-22.bz2
> Apr 1 01:29:30 nebula4 crmd[6490]: notice: te_fence_node: Executing reboot fencing operation (98) on nebula2 (timeout=30000)
> Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: handle_request: Client crmd.6490.2707e557 wants to fence (reboot) 'nebula2' with device '(any)'
> Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for nebula2: 39eaf3a2-d7e0-417d-8a01-d2f373973d6b (0)
> Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: can_fence_host_with_device: stonith-nebula1-IPMILAN can not fence nebula2: static-list
> Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: can_fence_host_with_device: stonith-nebula2-IPMILAN can fence nebula2: static-list
> Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: can_fence_host_with_device: stonith-one-frontend can not fence nebula2: static-list
> Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: can_fence_host_with_device: stonith-nebula3-IPMILAN can not fence nebula2: static-list
> Apr 1 01:29:32 nebula4 stonith-ng[6486]: notice: remote_op_done: Operation reboot of nebula2 by nebula3 for crmd.6490@nebula4.39eaf3a2: OK
> Apr 1 01:29:32 nebula4 crmd[6490]: notice: tengine_stonith_callback: Stonith operation 2/98:101:0:28913388-04df-49cb-9927-362b21a74014: OK (0)
> Apr 1 01:29:32 nebula4 crmd[6490]: notice: tengine_stonith_notify: Peer nebula2 was terminated (reboot) by nebula3 for nebula4: OK (ref=39eaf3a2-d7e0-417d-8a01-d2f373973d6b) by client crmd.6490
> Apr 1 01:29:32 nebula4 crmd[6490]: notice: te_rsc_command: Initiating action 91: start stonith-nebula1-IPMILAN_start_0 on nebula3
> Apr 1 01:29:33 nebula4 crmd[6490]: notice: run_graph: Transition 101 (Complete=13, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-22.bz2): Stopped
> Apr 1 01:29:33 nebula4 pengine[6489]: warning: unpack_rsc_op: Processing failed op start for stonith-nebula4-IPMILAN on nebula3: unknown error (1)
> Apr 1 01:29:33 nebula4 pengine[6489]: warning: unpack_rsc_op: Processing failed op start for stonith-nebula4-IPMILAN on nebula1: unknown error (1)
> Apr 1 01:29:33 nebula4 pengine[6489]: warning: common_apply_stickiness: Forcing stonith-nebula4-IPMILAN away from nebula1 after 1000000 failures (max=1000000)
> Apr 1 01:29:33 nebula4 pengine[6489]: warning: common_apply_stickiness: Forcing stonith-nebula4-IPMILAN away from nebula3 after 1000000 failures (max=1000000)
> Apr 1 01:29:33 nebula4 pengine[6489]: notice: process_pe_message: Calculated Transition 102: /var/lib/pacemaker/pengine/pe-input-129.bz2
> Apr 1 01:29:33 nebula4 crmd[6490]: notice: te_rsc_command: Initiating action 88: monitor stonith-nebula1-IPMILAN_monitor_3600000 on nebula3
> Apr 1 01:29:34 nebula4 crmd[6490]: notice: run_graph: Transition 102 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-129.bz2): Complete
> Apr 1 01:29:34 nebula4 crmd[6490]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> Apr 1 01:30:01 nebula4 CRON[44640]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then munin-run apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then munin-run apt update 7200 12 >/dev/null; fi)
> Apr 1 01:30:25 nebula4 dlm_controld[6737]: 50815 datastores wait for fencing
> Apr 1 01:30:25 nebula4 dlm_controld[6737]: 50815 clvmd wait for fencing
> Apr 1 01:30:29 nebula4 lvm[6759]: Request timed-out (send: 1427844563, now: 1427844629)
> Apr 1 01:30:29 nebula4 lvm[6759]: Request timed-out. padding
> Apr 1 01:30:29 nebula4 lvm[6759]: down_callback. node 1084811137, state = 3
> Apr 1 01:30:29 nebula4 lvm[6759]: Checking for a reply from 40a8e781
> Apr 1 01:30:29 nebula4 lvm[6759]: down_callback. node 1084811139, state = 3
> Apr 1 01:30:29 nebula4 lvm[6759]: Checking for a reply from 40a8e783
> Apr 1 01:30:29 nebula4 lvm[6759]: down_callback. node 1084811138, state = 1
> Apr 1 01:30:29 nebula4 lvm[6759]: down_callback. node 1084811140, state = 3
> Apr 1 01:30:29 nebula4 lvm[6759]: Checking for a reply from 40a8e784
> Apr 1 01:30:29 nebula4 lvm[6759]: Got post command condition...
> Apr 1 01:30:29 nebula4 lvm[6759]: Waiting for next pre command
> Apr 1 01:30:29 nebula4 lvm[6759]: read on PIPE 12: 4 bytes: status: 0
> Apr 1 01:30:29 nebula4 lvm[6759]: background routine status was 0, sock_client=0x218eab0
> Apr 1 01:30:29 nebula4 lvm[6759]: Send local reply
> Apr 1 01:30:29 nebula4 lvm[6759]: Read on local socket 5, len = 30
> Apr 1 01:30:29 nebula4 lvm[6759]: Got pre command condition...
> Apr 1 01:30:29 nebula4 lvm[6759]: doing PRE command LOCK_VG 'V_vg-one-0' at 6 (client=0x218eab0)
> Apr 1 01:30:29 nebula4 lvm[6759]: unlock_resource: V_vg-one-0 lockid: 1
> Apr 1 01:40:01 nebula4 CRON[47640]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then munin-run apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then munin-run apt update 7200 12 >/dev/null; fi)
> Apr 1 01:44:34 nebula4 crmd[6490]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
> Apr 1 01:44:34 nebula4 pengine[6489]: warning: unpack_rsc_op: Processing failed op start for stonith-nebula4-IPMILAN on nebula3: unknown error (1)
> Apr 1 01:44:34 nebula4 pengine[6489]: warning: unpack_rsc_op: Processing failed op start for stonith-nebula4-IPMILAN on nebula1: unknown error (1)
> Apr 1 01:44:34 nebula4 pengine[6489]: warning: common_apply_stickiness: Forcing stonith-nebula4-IPMILAN away from nebula1 after 1000000 failures (max=1000000)
> Apr 1 01:44:34 nebula4 pengine[6489]: warning: common_apply_stickiness: Forcing stonith-nebula4-IPMILAN away from nebula3 after 1000000 failures (max=1000000)
> Apr 1 01:44:34 nebula4 pengine[6489]: notice: process_pe_message: Calculated Transition 103: /var/lib/pacemaker/pengine/pe-input-130.bz2
> Apr 1 01:44:34 nebula4 crmd[6490]: notice: run_graph: Transition 103 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-130.bz2): Complete
> Apr 1 01:44:34 nebula4 crmd[6490]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> Apr 1 01:45:01 nebula4 CRON[49089]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then munin-run apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then munin-run apt update 7200 12 >/dev/null; fi)
> Apr 1 01:46:01 nebula4 CRON[570]: (root) CMD (if test -x /usr/sbin/apticron; then /usr/sbin/apticron --cron; else true; fi)
> Apr 1 01:49:20 nebula4 lvm[6759]: Got new connection on fd 17
> Apr 1 01:49:20 nebula4 lvm[6759]: Read on local socket 17, len = 30
> Apr 1 01:49:20 nebula4 lvm[6759]: creating pipe, [18, 19]
> Apr 1 01:49:20 nebula4 lvm[6759]: Creating pre&post thread
> Apr 1 01:49:20 nebula4 lvm[6759]: Created pre&post thread, state = 0
> Apr 1 01:49:20 nebula4 lvm[6759]: in sub thread: client = 0x218f1f0
> Apr 1 01:49:20 nebula4 lvm[6759]: doing PRE command LOCK_VG 'V_vg-one-0' at 1 (client=0x218f1f0)
> Apr 1 01:49:20 nebula4 lvm[6759]: lock_resource 'V_vg-one-0', flags=0, mode=3
> _______________________________________________
> Users mailing list: Users@xxxxxxxxxxxxxxx
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster