[ClusterLabs] dlm_controld and fencing issue

Hello,

On a 4-node OpenNebula cluster running Ubuntu Trusty 14.04.2, with:

- corosync 2.3.3-1ubuntu1
- pacemaker 1.1.10+git20130802-1ubuntu2.3
- dlm 4.0.1-0ubuntu1

Here is the node list with their IDs, to help follow the logs:

- 1084811137 nebula1
- 1084811138 nebula2
- 1084811139 nebula3
- 1084811140 nebula4 (the actual DC)
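
These are corosync nodeids; something like the following should show the
same mapping on any of the nodes (assuming corosync-quorumtool is available):

    # list the current membership with nodeids and names
    corosync-quorumtool -l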

I have an issue where fencing works but dlm keeps waiting for fencing; I had
to manually run “dlm_tool fence_ack 1084811138” this morning. Here are the
logs:

Apr  1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence status 1084811138 receive 1 from 1084811137 walltime 1427844569 local 50759
Apr  1 01:29:29 nebula4 kernel: [50799.162381] dlm: closing connection to node 1084811138
Apr  1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence status 1084811138 receive 1 from 1084811139 walltime 1427844569 local 50759
Apr  1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence request 1084811138 pid 44527 nodedown time 1427844569 fence_all dlm_stonith
Apr  1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence result 1084811138 pid 44527 result 1 exit status
Apr  1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence status 1084811138 receive 1 from 1084811140 walltime 1427844569 local 50759
Apr  1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence request 1084811138 no actor
[...]
Apr  1 01:30:25 nebula4 dlm_controld[6737]: 50815 datastores wait for fencing
Apr  1 01:30:25 nebula4 dlm_controld[6737]: 50815 clvmd wait for fencing
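
For reference, this is roughly how I looked at the stuck state and cleared it
by hand (a sketch; the node id is the one from the logs above):

    # lockspaces known to dlm_controld (datastores and clvmd here)
    dlm_tool ls

    # daemon status, including its view of the failed node
    dlm_tool status

    # tell dlm_controld that the node really has been fenced
    dlm_tool fence_ack 1084811138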


The stonith actually worked:

Apr  1 01:29:30 nebula4 stonith-ng[6486]:   notice: handle_request: Client crmd.6490.2707e557 wants to fence (reboot) 'nebula2' with device '(any)'
Apr  1 01:29:30 nebula4 stonith-ng[6486]:   notice: initiate_remote_stonith_op: Initiating remote operation reboot for nebula2: 39eaf3a2-d7e0-417d-8a01-d2f373973d6b (0)
Apr  1 01:29:30 nebula4 stonith-ng[6486]:   notice: can_fence_host_with_device: stonith-nebula1-IPMILAN can not fence nebula2: static-list
Apr  1 01:29:30 nebula4 stonith-ng[6486]:   notice: can_fence_host_with_device: stonith-nebula2-IPMILAN can fence nebula2: static-list
Apr  1 01:29:30 nebula4 stonith-ng[6486]:   notice: can_fence_host_with_device: stonith-one-frontend can not fence nebula2: static-list
Apr  1 01:29:30 nebula4 stonith-ng[6486]:   notice: can_fence_host_with_device: stonith-nebula3-IPMILAN can not fence nebula2: static-list
Apr  1 01:29:32 nebula4 stonith-ng[6486]:   notice: remote_op_done: Operation reboot of nebula2 by nebula3 for crmd.6490@nebula4.39eaf3a2: OK
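
The same can be cross-checked from the Pacemaker side (a sketch, using the
stonith_admin and crm_mon tools shipped with pacemaker 1.1.10):

    # last successful fencing operation recorded for nebula2
    stonith_admin --history nebula2

    # one-shot view of the cluster status
    crm_mon -1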

I attach the logs of the DC nebula4 from around 01:29:03, where everything
worked fine (Got 4 replies, expecting: 4), to a little bit after.

To me, it looks like:

- dlm asks for fencing directly at 01:29:29; the node was indeed fenced, as
  shown by the garbage in its /var/log/syslog at exactly 01:29:29 and by its
  uptime, but dlm did not get a good response

- pacemaker fences nebula2 at 01:29:30 because it is no longer part of the
  cluster (since 01:29:26: [TOTEM ] ... Members left: 1084811138).
  This fencing works.
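
If that reading is correct, the race should be visible after the fact (a
sketch; the grep pattern is only illustrative):

    # dlm_controld's internal debug buffer, fencing-related lines only
    dlm_tool dump | grep -i fence

    # manually tell stonithd a node is safely down, when fencing happened
    # out of band (use with care: only when the node is known to be off)
    stonith_admin --confirm nebula2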

Do you have any idea?

Regards.
-- 
Daniel Dehennin
Retrieve my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF

Apr  1 01:29:03 nebula4 lvm[6759]: Waiting for next pre command
Apr  1 01:29:03 nebula4 lvm[6759]: read on PIPE 12: 4 bytes: status: 0
Apr  1 01:29:03 nebula4 lvm[6759]: background routine status was 0, sock_client=0x218eab0
Apr  1 01:29:03 nebula4 lvm[6759]: Send local reply
Apr  1 01:29:03 nebula4 lvm[6759]: Read on local socket 5, len = 31
Apr  1 01:29:03 nebula4 lvm[6759]: check_all_clvmds_running
Apr  1 01:29:03 nebula4 lvm[6759]: down_callback. node 1084811137, state = 3
Apr  1 01:29:03 nebula4 lvm[6759]: down_callback. node 1084811139, state = 3
Apr  1 01:29:03 nebula4 lvm[6759]: down_callback. node 1084811138, state = 3
Apr  1 01:29:03 nebula4 lvm[6759]: down_callback. node 1084811140, state = 3
Apr  1 01:29:03 nebula4 lvm[6759]: Got pre command condition...
Apr  1 01:29:03 nebula4 lvm[6759]: Writing status 0 down pipe 13
Apr  1 01:29:03 nebula4 lvm[6759]: Waiting to do post command - state = 0
Apr  1 01:29:03 nebula4 lvm[6759]: read on PIPE 12: 4 bytes: status: 0
Apr  1 01:29:03 nebula4 lvm[6759]: background routine status was 0, sock_client=0x218eab0
Apr  1 01:29:03 nebula4 lvm[6759]: distribute command: XID = 43973, flags=0x0 ()
Apr  1 01:29:03 nebula4 lvm[6759]: num_nodes = 4
Apr  1 01:29:03 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218f100. client=0x218eab0, msg=0x218ebc0, len=31, csid=(nil), xid=43973
Apr  1 01:29:03 nebula4 lvm[6759]: Sending message to all cluster nodes
Apr  1 01:29:03 nebula4 lvm[6759]: process_work_item: local
Apr  1 01:29:03 nebula4 lvm[6759]: process_local_command: SYNC_NAMES (0x2d) msg=0x218ed00, msglen =31, client=0x218eab0
Apr  1 01:29:03 nebula4 lvm[6759]: Syncing device names
Apr  1 01:29:03 nebula4 lvm[6759]: Reply from node 40a8e784: 0 bytes
Apr  1 01:29:03 nebula4 lvm[6759]: Got 1 replies, expecting: 4
Apr  1 01:29:03 nebula4 lvm[6759]: LVM thread waiting for work
Apr  1 01:29:03 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811140 for 0. len 31
Apr  1 01:29:03 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811138 for 1084811140. len 18
Apr  1 01:29:03 nebula4 lvm[6759]: Reply from node 40a8e782: 0 bytes
Apr  1 01:29:03 nebula4 lvm[6759]: Got 2 replies, expecting: 4
Apr  1 01:29:03 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811139 for 1084811140. len 18
Apr  1 01:29:03 nebula4 lvm[6759]: Reply from node 40a8e783: 0 bytes
Apr  1 01:29:03 nebula4 lvm[6759]: Got 3 replies, expecting: 4
Apr  1 01:29:03 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811137 for 1084811140. len 18
Apr  1 01:29:03 nebula4 lvm[6759]: Reply from node 40a8e781: 0 bytes
Apr  1 01:29:03 nebula4 lvm[6759]: Got 4 replies, expecting: 4
Apr  1 01:29:03 nebula4 lvm[6759]: Got post command condition...
Apr  1 01:29:03 nebula4 lvm[6759]: Waiting for next pre command
Apr  1 01:29:03 nebula4 lvm[6759]: read on PIPE 12: 4 bytes: status: 0
Apr  1 01:29:03 nebula4 lvm[6759]: background routine status was 0, sock_client=0x218eab0
Apr  1 01:29:03 nebula4 lvm[6759]: Send local reply
Apr  1 01:29:03 nebula4 lvm[6759]: Read on local socket 5, len = 30
Apr  1 01:29:03 nebula4 lvm[6759]: Got pre command condition...
Apr  1 01:29:03 nebula4 lvm[6759]: doing PRE command LOCK_VG 'V_vg-one-2' at 6 (client=0x218eab0)
Apr  1 01:29:03 nebula4 lvm[6759]: unlock_resource: V_vg-one-2 lockid: 1
Apr  1 01:29:03 nebula4 lvm[6759]: Writing status 0 down pipe 13
Apr  1 01:29:03 nebula4 lvm[6759]: Waiting to do post command - state = 0
Apr  1 01:29:03 nebula4 lvm[6759]: read on PIPE 12: 4 bytes: status: 0
Apr  1 01:29:03 nebula4 lvm[6759]: background routine status was 0, sock_client=0x218eab0
Apr  1 01:29:03 nebula4 lvm[6759]: distribute command: XID = 43974, flags=0x1 (LOCAL)
Apr  1 01:29:03 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218ed00. client=0x218eab0, msg=0x218ebc0, len=30, csid=(nil), xid=43974
Apr  1 01:29:03 nebula4 lvm[6759]: process_work_item: local
Apr  1 01:29:03 nebula4 lvm[6759]: process_local_command: LOCK_VG (0x33) msg=0x218ed40, msglen =30, client=0x218eab0
Apr  1 01:29:03 nebula4 lvm[6759]: do_lock_vg: resource 'V_vg-one-2', cmd = 0x6 LCK_VG (UNLOCK|VG), flags = 0x0 ( ), critical_section = 0
Apr  1 01:29:03 nebula4 lvm[6759]: Invalidating cached metadata for VG vg-one-2
Apr  1 01:29:03 nebula4 lvm[6759]: Reply from node 40a8e784: 0 bytes
Apr  1 01:29:03 nebula4 lvm[6759]: Got 1 replies, expecting: 1
Apr  1 01:29:03 nebula4 lvm[6759]: LVM thread waiting for work
Apr  1 01:29:03 nebula4 lvm[6759]: Got post command condition...
Apr  1 01:29:03 nebula4 lvm[6759]: Waiting for next pre command
Apr  1 01:29:03 nebula4 lvm[6759]: read on PIPE 12: 4 bytes: status: 0
Apr  1 01:29:03 nebula4 lvm[6759]: background routine status was 0, sock_client=0x218eab0
Apr  1 01:29:03 nebula4 lvm[6759]: Send local reply
Apr  1 01:29:03 nebula4 lvm[6759]: Read on local socket 5, len = 0
Apr  1 01:29:03 nebula4 lvm[6759]: EOF on local socket: inprogress=0
Apr  1 01:29:03 nebula4 lvm[6759]: Waiting for child thread
Apr  1 01:29:03 nebula4 lvm[6759]: Got pre command condition...
Apr  1 01:29:03 nebula4 lvm[6759]: Subthread finished
Apr  1 01:29:03 nebula4 lvm[6759]: Joined child thread
Apr  1 01:29:03 nebula4 lvm[6759]: ret == 0, errno = 0. removing client
Apr  1 01:29:03 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218ebc0. client=0x218eab0, msg=(nil), len=0, csid=(nil), xid=43974
Apr  1 01:29:03 nebula4 lvm[6759]: process_work_item: free fd -1
Apr  1 01:29:03 nebula4 lvm[6759]: LVM thread waiting for work
Apr  1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811138 for 0. len 31
Apr  1 01:29:16 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218eea0. client=0x6a1d60, msg=0x7fff260dfcac, len=31, csid=0x7fff260de67c, xid=0
Apr  1 01:29:16 nebula4 lvm[6759]: process_work_item: remote
Apr  1 01:29:16 nebula4 lvm[6759]: process_remote_command SYNC_NAMES (0x2d) for clientid 0x5000000 XID 39602 on node 40a8e782
Apr  1 01:29:16 nebula4 lvm[6759]: Syncing device names
Apr  1 01:29:16 nebula4 lvm[6759]: LVM thread waiting for work
Apr  1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811137 for 1084811138. len 18
Apr  1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811139 for 1084811138. len 18
Apr  1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811140 for 1084811138. len 18
Apr  1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811137 for 0. len 31
Apr  1 01:29:16 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218eab0. client=0x6a1d60, msg=0x7fff260dfcac, len=31, csid=0x7fff260de67c, xid=0
Apr  1 01:29:16 nebula4 lvm[6759]: process_work_item: remote
Apr  1 01:29:16 nebula4 lvm[6759]: process_remote_command SYNC_NAMES (0x2d) for clientid 0x5000000 XID 44354 on node 40a8e781
Apr  1 01:29:16 nebula4 lvm[6759]: Syncing device names
Apr  1 01:29:16 nebula4 lvm[6759]: LVM thread waiting for work
Apr  1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811140 for 1084811137. len 18
Apr  1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811138 for 1084811137. len 18
Apr  1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811139 for 1084811137. len 18
Apr  1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811138 for 0. len 31
Apr  1 01:29:16 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218eea0. client=0x6a1d60, msg=0x7fff260dfcac, len=31, csid=0x7fff260de67c, xid=0
Apr  1 01:29:16 nebula4 lvm[6759]: process_work_item: remote
Apr  1 01:29:16 nebula4 lvm[6759]: process_remote_command SYNC_NAMES (0x2d) for clientid 0x5000000 XID 39605 on node 40a8e782
Apr  1 01:29:16 nebula4 lvm[6759]: Syncing device names
Apr  1 01:29:16 nebula4 lvm[6759]: LVM thread waiting for work
Apr  1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811137 for 1084811138. len 18
Apr  1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811139 for 1084811138. len 18
Apr  1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811140 for 1084811138. len 18
Apr  1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811137 for 0. len 31
Apr  1 01:29:16 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218eab0. client=0x6a1d60, msg=0x7fff260dfcac, len=31, csid=0x7fff260de67c, xid=0
Apr  1 01:29:16 nebula4 lvm[6759]: process_work_item: remote
Apr  1 01:29:16 nebula4 lvm[6759]: process_remote_command SYNC_NAMES (0x2d) for clientid 0x5000000 XID 44357 on node 40a8e781
Apr  1 01:29:16 nebula4 lvm[6759]: Syncing device names
Apr  1 01:29:16 nebula4 lvm[6759]: LVM thread waiting for work
Apr  1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811140 for 1084811137. len 18
Apr  1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811138 for 1084811137. len 18
Apr  1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811139 for 1084811137. len 18
Apr  1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811138 for 0. len 31
Apr  1 01:29:16 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218eea0. client=0x6a1d60, msg=0x7fff260dfcac, len=31, csid=0x7fff260de67c, xid=0
Apr  1 01:29:16 nebula4 lvm[6759]: process_work_item: remote
Apr  1 01:29:16 nebula4 lvm[6759]: process_remote_command SYNC_NAMES (0x2d) for clientid 0x5000000 XID 39608 on node 40a8e782
Apr  1 01:29:16 nebula4 lvm[6759]: Syncing device names
Apr  1 01:29:16 nebula4 lvm[6759]: LVM thread waiting for work
Apr  1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811137 for 1084811138. len 18
Apr  1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811137 for 0. len 31
Apr  1 01:29:16 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218eab0. client=0x6a1d60, msg=0x7fff260dfcac, len=31, csid=0x7fff260de67c, xid=0
Apr  1 01:29:16 nebula4 lvm[6759]: process_work_item: remote
Apr  1 01:29:16 nebula4 lvm[6759]: process_remote_command SYNC_NAMES (0x2d) for clientid 0x5000000 XID 44360 on node 40a8e781
Apr  1 01:29:16 nebula4 lvm[6759]: Syncing device names
Apr  1 01:29:16 nebula4 lvm[6759]: LVM thread waiting for work
Apr  1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811139 for 1084811138. len 18
Apr  1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811140 for 1084811138. len 18
Apr  1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811140 for 1084811137. len 18
Apr  1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811138 for 1084811137. len 18
Apr  1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811139 for 1084811137. len 18
Apr  1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811137 for 0. len 31
Apr  1 01:29:16 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218eea0. client=0x6a1d60, msg=0x7fff260dfcac, len=31, csid=0x7fff260de67c, xid=0
Apr  1 01:29:16 nebula4 lvm[6759]: process_work_item: remote
Apr  1 01:29:16 nebula4 lvm[6759]: process_remote_command SYNC_NAMES (0x2d) for clientid 0x5000000 XID 44363 on node 40a8e781
Apr  1 01:29:16 nebula4 lvm[6759]: Syncing device names
Apr  1 01:29:16 nebula4 lvm[6759]: LVM thread waiting for work
Apr  1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811140 for 1084811137. len 18
Apr  1 01:29:23 nebula4 lvm[6759]: Got new connection on fd 5
Apr  1 01:29:23 nebula4 lvm[6759]: Read on local socket 5, len = 30
Apr  1 01:29:23 nebula4 lvm[6759]: creating pipe, [12, 13]
Apr  1 01:29:23 nebula4 lvm[6759]: Creating pre&post thread
Apr  1 01:29:23 nebula4 lvm[6759]: Created pre&post thread, state = 0
Apr  1 01:29:23 nebula4 lvm[6759]: in sub thread: client = 0x218eab0
Apr  1 01:29:23 nebula4 lvm[6759]: doing PRE command LOCK_VG 'V_vg-one-0' at 1 (client=0x218eab0)
Apr  1 01:29:23 nebula4 lvm[6759]: lock_resource 'V_vg-one-0', flags=0, mode=3
Apr  1 01:29:23 nebula4 lvm[6759]: lock_resource returning 0, lock_id=1
Apr  1 01:29:23 nebula4 lvm[6759]: Writing status 0 down pipe 13
Apr  1 01:29:23 nebula4 lvm[6759]: Waiting to do post command - state = 0
Apr  1 01:29:23 nebula4 lvm[6759]: read on PIPE 12: 4 bytes: status: 0
Apr  1 01:29:23 nebula4 lvm[6759]: background routine status was 0, sock_client=0x218eab0
Apr  1 01:29:23 nebula4 lvm[6759]: distribute command: XID = 43975, flags=0x1 (LOCAL)
Apr  1 01:29:23 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218ed00. client=0x218eab0, msg=0x218ebc0, len=30, csid=(nil), xid=43975
Apr  1 01:29:23 nebula4 lvm[6759]: process_work_item: local
Apr  1 01:29:23 nebula4 lvm[6759]: process_local_command: LOCK_VG (0x33) msg=0x218ed40, msglen =30, client=0x218eab0
Apr  1 01:29:23 nebula4 lvm[6759]: do_lock_vg: resource 'V_vg-one-0', cmd = 0x1 LCK_VG (READ|VG), flags = 0x0 ( ), critical_section = 0
Apr  1 01:29:23 nebula4 lvm[6759]: Invalidating cached metadata for VG vg-one-0
Apr  1 01:29:23 nebula4 lvm[6759]: Reply from node 40a8e784: 0 bytes
Apr  1 01:29:23 nebula4 lvm[6759]: Got 1 replies, expecting: 1
Apr  1 01:29:23 nebula4 lvm[6759]: LVM thread waiting for work
Apr  1 01:29:23 nebula4 lvm[6759]: Got post command condition...
Apr  1 01:29:23 nebula4 lvm[6759]: Waiting for next pre command
Apr  1 01:29:23 nebula4 lvm[6759]: read on PIPE 12: 4 bytes: status: 0
Apr  1 01:29:23 nebula4 lvm[6759]: background routine status was 0, sock_client=0x218eab0
Apr  1 01:29:23 nebula4 lvm[6759]: Send local reply
Apr  1 01:29:23 nebula4 lvm[6759]: Read on local socket 5, len = 31
Apr  1 01:29:23 nebula4 lvm[6759]: check_all_clvmds_running
Apr  1 01:29:23 nebula4 lvm[6759]: down_callback. node 1084811137, state = 3
Apr  1 01:29:23 nebula4 lvm[6759]: down_callback. node 1084811139, state = 3
Apr  1 01:29:23 nebula4 lvm[6759]: down_callback. node 1084811138, state = 3
Apr  1 01:29:23 nebula4 lvm[6759]: down_callback. node 1084811140, state = 3
Apr  1 01:29:23 nebula4 lvm[6759]: Got pre command condition...
Apr  1 01:29:23 nebula4 lvm[6759]: Writing status 0 down pipe 13
Apr  1 01:29:23 nebula4 lvm[6759]: Waiting to do post command - state = 0
Apr  1 01:29:23 nebula4 lvm[6759]: read on PIPE 12: 4 bytes: status: 0
Apr  1 01:29:23 nebula4 lvm[6759]: background routine status was 0, sock_client=0x218eab0
Apr  1 01:29:23 nebula4 lvm[6759]: distribute command: XID = 43976, flags=0x0 ()
Apr  1 01:29:23 nebula4 lvm[6759]: num_nodes = 4
Apr  1 01:29:23 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218f100. client=0x218eab0, msg=0x218ebc0, len=31, csid=(nil), xid=43976
Apr  1 01:29:23 nebula4 lvm[6759]: Sending message to all cluster nodes
Apr  1 01:29:23 nebula4 lvm[6759]: process_work_item: local
Apr  1 01:29:23 nebula4 lvm[6759]: process_local_command: SYNC_NAMES (0x2d) msg=0x218ed00, msglen =31, client=0x218eab0
Apr  1 01:29:23 nebula4 lvm[6759]: Syncing device names
Apr  1 01:29:23 nebula4 lvm[6759]: Reply from node 40a8e784: 0 bytes
Apr  1 01:29:23 nebula4 lvm[6759]: Got 1 replies, expecting: 4
Apr  1 01:29:23 nebula4 lvm[6759]: LVM thread waiting for work
Apr  1 01:29:26 nebula4 corosync[6411]:   [TOTEM ] A processor failed, forming new configuration.
Apr  1 01:29:29 nebula4 corosync[6411]:   [TOTEM ] A new membership (192.168.231.129:1204) was formed. Members left: 1084811138
Apr  1 01:29:29 nebula4 lvm[6759]: confchg callback. 0 joined, 1 left, 3 members
Apr  1 01:29:29 nebula4 crmd[6490]:  warning: match_down_event: No match for shutdown action on 1084811138
Apr  1 01:29:29 nebula4 crmd[6490]:   notice: peer_update_callback: Stonith/shutdown of nebula2 not matched
Apr  1 01:29:29 nebula4 crmd[6490]:   notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Apr  1 01:29:29 nebula4 corosync[6411]:   [QUORUM] Members[3]: 1084811137 1084811139 1084811140
Apr  1 01:29:29 nebula4 corosync[6411]:   [MAIN  ] Completed service synchronization, ready to provide service.
Apr  1 01:29:29 nebula4 crmd[6490]:   notice: crm_update_peer_state: pcmk_quorum_notification: Node nebula2[1084811138] - state is now lost (was member)
Apr  1 01:29:29 nebula4 crmd[6490]:  warning: match_down_event: No match for shutdown action on 1084811138
Apr  1 01:29:29 nebula4 pacemakerd[6483]:   notice: crm_update_peer_state: pcmk_quorum_notification: Node nebula2[1084811138] - state is now lost (was member)
Apr  1 01:29:29 nebula4 crmd[6490]:   notice: peer_update_callback: Stonith/shutdown of nebula2 not matched
Apr  1 01:29:29 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811140 for 0. len 31
Apr  1 01:29:29 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811139 for 1084811137. len 18
Apr  1 01:29:29 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811139 for 0. len 31
Apr  1 01:29:29 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218ed00. client=0x6a1d60, msg=0x7fff260dfcac, len=31, csid=0x7fff260de67c, xid=0
Apr  1 01:29:29 nebula4 lvm[6759]: process_work_item: remote
Apr  1 01:29:29 nebula4 lvm[6759]: process_remote_command SYNC_NAMES (0x2d) for clientid 0x5000000 XID 43802 on node 40a8e783
Apr  1 01:29:29 nebula4 lvm[6759]: Syncing device names
Apr  1 01:29:29 nebula4 lvm[6759]: LVM thread waiting for work
Apr  1 01:29:29 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811139 for 1084811140. len 18
Apr  1 01:29:29 nebula4 lvm[6759]: Reply from node 40a8e783: 0 bytes
Apr  1 01:29:29 nebula4 lvm[6759]: Got 2 replies, expecting: 4
Apr  1 01:29:29 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811137 for 1084811140. len 18
Apr  1 01:29:29 nebula4 lvm[6759]: Reply from node 40a8e781: 0 bytes
Apr  1 01:29:29 nebula4 lvm[6759]: Got 3 replies, expecting: 4
Apr  1 01:29:29 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811137 for 1084811139. len 18
Apr  1 01:29:29 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811140 for 1084811139. len 18
Apr  1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence status 1084811138 receive 1 from 1084811137 walltime 1427844569 local 50759
Apr  1 01:29:29 nebula4 kernel: [50799.162381] dlm: closing connection to node 1084811138
Apr  1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence status 1084811138 receive 1 from 1084811139 walltime 1427844569 local 50759
Apr  1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence request 1084811138 pid 44527 nodedown time 1427844569 fence_all dlm_stonith
Apr  1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence result 1084811138 pid 44527 result 1 exit status
Apr  1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence status 1084811138 receive 1 from 1084811140 walltime 1427844569 local 50759
Apr  1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence request 1084811138 no actor
Apr  1 01:29:30 nebula4 pengine[6489]:  warning: pe_fence_node: Node nebula2 will be fenced because the node is no longer part of the cluster
Apr  1 01:29:30 nebula4 pengine[6489]:  warning: determine_online_status: Node nebula2 is unclean
Apr  1 01:29:30 nebula4 pengine[6489]:  warning: unpack_rsc_op: Processing failed op start for stonith-nebula4-IPMILAN on nebula3: unknown error (1)
Apr  1 01:29:30 nebula4 pengine[6489]:  warning: unpack_rsc_op: Processing failed op start for stonith-nebula4-IPMILAN on nebula1: unknown error (1)
Apr  1 01:29:30 nebula4 pengine[6489]:  warning: unpack_rsc_op: Processing failed op start for stonith-nebula4-IPMILAN on nebula2: unknown error (1)
Apr  1 01:29:30 nebula4 pengine[6489]:  warning: common_apply_stickiness: Forcing stonith-nebula4-IPMILAN away from nebula1 after 1000000 failures (max=1000000)
Apr  1 01:29:30 nebula4 pengine[6489]:  warning: common_apply_stickiness: Forcing stonith-nebula4-IPMILAN away from nebula2 after 1000000 failures (max=1000000)
Apr  1 01:29:30 nebula4 pengine[6489]:  warning: common_apply_stickiness: Forcing stonith-nebula4-IPMILAN away from nebula3 after 1000000 failures (max=1000000)
Apr  1 01:29:30 nebula4 pengine[6489]:  warning: custom_action: Action p_dlm:3_stop_0 on nebula2 is unrunnable (offline)
Apr  1 01:29:30 nebula4 pengine[6489]:  warning: custom_action: Action p_dlm:3_stop_0 on nebula2 is unrunnable (offline)
Apr  1 01:29:30 nebula4 pengine[6489]:  warning: custom_action: Action p_clvm:3_stop_0 on nebula2 is unrunnable (offline)
Apr  1 01:29:30 nebula4 pengine[6489]:  warning: custom_action: Action p_clvm:3_stop_0 on nebula2 is unrunnable (offline)
Apr  1 01:29:30 nebula4 pengine[6489]:  warning: custom_action: Action p_vg_one:3_stop_0 on nebula2 is unrunnable (offline)
Apr  1 01:29:30 nebula4 pengine[6489]:  warning: custom_action: Action p_vg_one:3_stop_0 on nebula2 is unrunnable (offline)
Apr  1 01:29:30 nebula4 pengine[6489]:  warning: custom_action: Action p_fs_one-datastores:3_stop_0 on nebula2 is unrunnable (offline)
Apr  1 01:29:30 nebula4 pengine[6489]:  warning: custom_action: Action p_fs_one-datastores:3_stop_0 on nebula2 is unrunnable (offline)
Apr  1 01:29:30 nebula4 pengine[6489]:  warning: custom_action: Action stonith-nebula1-IPMILAN_stop_0 on nebula2 is unrunnable (offline)
Apr  1 01:29:30 nebula4 pengine[6489]:  warning: stage6: Scheduling Node nebula2 for STONITH
Apr  1 01:29:30 nebula4 pengine[6489]:   notice: LogActions: Stop    p_dlm:3#011(nebula2)
Apr  1 01:29:30 nebula4 pengine[6489]:   notice: LogActions: Stop    p_clvm:3#011(nebula2)
Apr  1 01:29:30 nebula4 pengine[6489]:   notice: LogActions: Stop    p_vg_one:3#011(nebula2)
Apr  1 01:29:30 nebula4 pengine[6489]:   notice: LogActions: Stop    p_fs_one-datastores:3#011(nebula2)
Apr  1 01:29:30 nebula4 pengine[6489]:   notice: LogActions: Move    stonith-nebula1-IPMILAN#011(Started nebula2 -> nebula3)
Apr  1 01:29:30 nebula4 pengine[6489]:  warning: process_pe_message: Calculated Transition 101: /var/lib/pacemaker/pengine/pe-warn-22.bz2
Apr  1 01:29:30 nebula4 crmd[6490]:   notice: te_fence_node: Executing reboot fencing operation (98) on nebula2 (timeout=30000)
Apr  1 01:29:30 nebula4 stonith-ng[6486]:   notice: handle_request: Client crmd.6490.2707e557 wants to fence (reboot) 'nebula2' with device '(any)'
Apr  1 01:29:30 nebula4 stonith-ng[6486]:   notice: initiate_remote_stonith_op: Initiating remote operation reboot for nebula2: 39eaf3a2-d7e0-417d-8a01-d2f373973d6b (0)
Apr  1 01:29:30 nebula4 stonith-ng[6486]:   notice: can_fence_host_with_device: stonith-nebula1-IPMILAN can not fence nebula2: static-list
Apr  1 01:29:30 nebula4 stonith-ng[6486]:   notice: can_fence_host_with_device: stonith-nebula2-IPMILAN can fence nebula2: static-list
Apr  1 01:29:30 nebula4 stonith-ng[6486]:   notice: can_fence_host_with_device: stonith-one-frontend can not fence nebula2: static-list
Apr  1 01:29:30 nebula4 stonith-ng[6486]:   notice: can_fence_host_with_device: stonith-nebula3-IPMILAN can not fence nebula2: static-list
Apr  1 01:29:32 nebula4 stonith-ng[6486]:   notice: remote_op_done: Operation reboot of nebula2 by nebula3 for crmd.6490@nebula4.39eaf3a2: OK
Apr  1 01:29:32 nebula4 crmd[6490]:   notice: tengine_stonith_callback: Stonith operation 2/98:101:0:28913388-04df-49cb-9927-362b21a74014: OK (0)
Apr  1 01:29:32 nebula4 crmd[6490]:   notice: tengine_stonith_notify: Peer nebula2 was terminated (reboot) by nebula3 for nebula4: OK (ref=39eaf3a2-d7e0-417d-8a01-d2f373973d6b) by client crmd.6490
Apr  1 01:29:32 nebula4 crmd[6490]:   notice: te_rsc_command: Initiating action 91: start stonith-nebula1-IPMILAN_start_0 on nebula3
Apr  1 01:29:33 nebula4 crmd[6490]:   notice: run_graph: Transition 101 (Complete=13, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-22.bz2): Stopped
Apr  1 01:29:33 nebula4 pengine[6489]:  warning: unpack_rsc_op: Processing failed op start for stonith-nebula4-IPMILAN on nebula3: unknown error (1)
Apr  1 01:29:33 nebula4 pengine[6489]:  warning: unpack_rsc_op: Processing failed op start for stonith-nebula4-IPMILAN on nebula1: unknown error (1)
Apr  1 01:29:33 nebula4 pengine[6489]:  warning: common_apply_stickiness: Forcing stonith-nebula4-IPMILAN away from nebula1 after 1000000 failures (max=1000000)
Apr  1 01:29:33 nebula4 pengine[6489]:  warning: common_apply_stickiness: Forcing stonith-nebula4-IPMILAN away from nebula3 after 1000000 failures (max=1000000)
Apr  1 01:29:33 nebula4 pengine[6489]:   notice: process_pe_message: Calculated Transition 102: /var/lib/pacemaker/pengine/pe-input-129.bz2
Apr  1 01:29:33 nebula4 crmd[6490]:   notice: te_rsc_command: Initiating action 88: monitor stonith-nebula1-IPMILAN_monitor_3600000 on nebula3
Apr  1 01:29:34 nebula4 crmd[6490]:   notice: run_graph: Transition 102 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-129.bz2): Complete
Apr  1 01:29:34 nebula4 crmd[6490]:   notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Apr  1 01:30:01 nebula4 CRON[44640]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then munin-run apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then munin-run apt update 7200 12 >/dev/null; fi)
Apr  1 01:30:25 nebula4 dlm_controld[6737]: 50815 datastores wait for fencing
Apr  1 01:30:25 nebula4 dlm_controld[6737]: 50815 clvmd wait for fencing
Apr  1 01:30:29 nebula4 lvm[6759]: Request timed-out (send: 1427844563, now: 1427844629)
Apr  1 01:30:29 nebula4 lvm[6759]: Request timed-out. padding
Apr  1 01:30:29 nebula4 lvm[6759]: down_callback. node 1084811137, state = 3
Apr  1 01:30:29 nebula4 lvm[6759]: Checking for a reply from 40a8e781
Apr  1 01:30:29 nebula4 lvm[6759]: down_callback. node 1084811139, state = 3
Apr  1 01:30:29 nebula4 lvm[6759]: Checking for a reply from 40a8e783
Apr  1 01:30:29 nebula4 lvm[6759]: down_callback. node 1084811138, state = 1
Apr  1 01:30:29 nebula4 lvm[6759]: down_callback. node 1084811140, state = 3
Apr  1 01:30:29 nebula4 lvm[6759]: Checking for a reply from 40a8e784
Apr  1 01:30:29 nebula4 lvm[6759]: Got post command condition...
Apr  1 01:30:29 nebula4 lvm[6759]: Waiting for next pre command
Apr  1 01:30:29 nebula4 lvm[6759]: read on PIPE 12: 4 bytes: status: 0
Apr  1 01:30:29 nebula4 lvm[6759]: background routine status was 0, sock_client=0x218eab0
Apr  1 01:30:29 nebula4 lvm[6759]: Send local reply
Apr  1 01:30:29 nebula4 lvm[6759]: Read on local socket 5, len = 30
Apr  1 01:30:29 nebula4 lvm[6759]: Got pre command condition...
Apr  1 01:30:29 nebula4 lvm[6759]: doing PRE command LOCK_VG 'V_vg-one-0' at 6 (client=0x218eab0)
Apr  1 01:30:29 nebula4 lvm[6759]: unlock_resource: V_vg-one-0 lockid: 1
Apr  1 01:40:01 nebula4 CRON[47640]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then munin-run apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then munin-run apt update 7200 12 >/dev/null; fi)
Apr  1 01:44:34 nebula4 crmd[6490]:   notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
Apr  1 01:44:34 nebula4 pengine[6489]:  warning: unpack_rsc_op: Processing failed op start for stonith-nebula4-IPMILAN on nebula3: unknown error (1)
Apr  1 01:44:34 nebula4 pengine[6489]:  warning: unpack_rsc_op: Processing failed op start for stonith-nebula4-IPMILAN on nebula1: unknown error (1)
Apr  1 01:44:34 nebula4 pengine[6489]:  warning: common_apply_stickiness: Forcing stonith-nebula4-IPMILAN away from nebula1 after 1000000 failures (max=1000000)
Apr  1 01:44:34 nebula4 pengine[6489]:  warning: common_apply_stickiness: Forcing stonith-nebula4-IPMILAN away from nebula3 after 1000000 failures (max=1000000)
Apr  1 01:44:34 nebula4 pengine[6489]:   notice: process_pe_message: Calculated Transition 103: /var/lib/pacemaker/pengine/pe-input-130.bz2
Apr  1 01:44:34 nebula4 crmd[6490]:   notice: run_graph: Transition 103 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-130.bz2): Complete
Apr  1 01:44:34 nebula4 crmd[6490]:   notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Apr  1 01:45:01 nebula4 CRON[49089]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then munin-run apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then munin-run apt update 7200 12 >/dev/null; fi)
Apr  1 01:46:01 nebula4 CRON[570]: (root) CMD (if test -x /usr/sbin/apticron; then /usr/sbin/apticron --cron; else true; fi)
Apr  1 01:49:20 nebula4 lvm[6759]: Got new connection on fd 17
Apr  1 01:49:20 nebula4 lvm[6759]: Read on local socket 17, len = 30
Apr  1 01:49:20 nebula4 lvm[6759]: creating pipe, [18, 19]
Apr  1 01:49:20 nebula4 lvm[6759]: Creating pre&post thread
Apr  1 01:49:20 nebula4 lvm[6759]: Created pre&post thread, state = 0
Apr  1 01:49:20 nebula4 lvm[6759]: in sub thread: client = 0x218f1f0
Apr  1 01:49:20 nebula4 lvm[6759]: doing PRE command LOCK_VG 'V_vg-one-0' at 1 (client=0x218f1f0)
Apr  1 01:49:20 nebula4 lvm[6759]: lock_resource 'V_vg-one-0', flags=0, mode=3

_______________________________________________
Users mailing list: Users@xxxxxxxxxxxxxxx
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org