> On 1 Apr 2015, at 11:47 pm, Daniel Dehennin <daniel.dehennin@xxxxxxxxxxxx> wrote:
>
> Hello,
>
> On a 4-node OpenNebula cluster, running Ubuntu Trusty 14.04.2, with:
>
> - corosync 2.3.3-1ubuntu1
> - pacemaker 1.1.10+git20130802-1ubuntu2.3
> - dlm 4.0.1-0ubuntu1
>
> Here is the node list with their IDs, to follow the logs:
>
> - 1084811137 nebula1
> - 1084811138 nebula2
> - 1084811139 nebula3
> - 1084811140 nebula4 (the current DC)
>
> I have an issue where fencing works but dlm always waits for
> fencing; I needed to manually run “dlm_tool fence_ack 1084811138” this
> morning. Here are the logs:
>
> Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence status 1084811138 receive 1 from 1084811137 walltime 1427844569 local 50759
> Apr 1 01:29:29 nebula4 kernel: [50799.162381] dlm: closing connection to node 1084811138
> Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence status 1084811138 receive 1 from 1084811139 walltime 1427844569 local 50759
> Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence request 1084811138 pid 44527 nodedown time 1427844569 fence_all dlm_stonith
> Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence result 1084811138 pid 44527 result 1 exit status
> Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence status 1084811138 receive 1 from 1084811140 walltime 1427844569 local 50759
> Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence request 1084811138 no actor
> [...]
> Apr 1 01:30:25 nebula4 dlm_controld[6737]: 50815 datastores wait for fencing
> Apr 1 01:30:25 nebula4 dlm_controld[6737]: 50815 clvmd wait for fencing
>
> The stonith actually worked:
>
> Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: handle_request: Client crmd.6490.2707e557 wants to fence (reboot) 'nebula2' with device '(any)'
> Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for nebula2: 39eaf3a2-d7e0-417d-8a01-d2f373973d6b (0)
> Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: can_fence_host_with_device: stonith-nebula1-IPMILAN can not fence nebula2: static-list
> Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: can_fence_host_with_device: stonith-nebula2-IPMILAN can fence nebula2: static-list
> Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: can_fence_host_with_device: stonith-one-frontend can not fence nebula2: static-list
> Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: can_fence_host_with_device: stonith-nebula3-IPMILAN can not fence nebula2: static-list
> Apr 1 01:29:32 nebula4 stonith-ng[6486]: notice: remote_op_done: Operation reboot of nebula2 by nebula3 for crmd.6490@nebula4.39eaf3a2: OK
>
> I attach the logs of the DC nebula4 from around 01:29:03, where
> everything worked fine (Got 4 replies, expecting: 4), to a little bit
> after.
>
> To me, it looks like:
>
> - dlm asked for fencing directly at 01:29:29; the node really was fenced,
>   since it has garbage in its /var/log/syslog at exactly 01:29:29 (its
>   uptime confirms the reboot), but dlm did not get a good response
>
> - pacemaker fenced nebula2 at 01:29:30 because it was not part of the
>   cluster anymore (since 01:29:26: [TOTEM ] ... Members left: 1084811138).
>   This fencing works.
>
> Do you have any idea?
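For anyone hitting the same hang: dlm_controld keeps its lockspaces in "wait for fencing" until it records a successful fence result for the dead node, and the manual fence_ack above is what tells it to stop waiting. A minimal sketch of the inspect-then-ack sequence, assuming dlm 4.0.x's dlm_tool and the node ids listed above:

    # List the lockspaces dlm_controld manages (clvmd, datastores, ...)
    dlm_tool ls

    # Dump dlm_controld's debug buffer to see the pending fence request
    dlm_tool dump | grep fence

    # Only after verifying out-of-band that the node really was
    # power-cycled, acknowledge its fencing so recovery can proceed
    dlm_tool fence_ack 1084811138

Note that fence_ack is a last resort: acknowledging a node that is in fact still running defeats the point of fencing and risks corrupting shared storage.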
There were two important fixes in this area since 1.1.10:

+ David Vossel (1 year, 1 month ago) 054fedf:
  Fix: stonith_api_time_helper now returns when the most recent fencing operation completed

+ Andrew Beekhof (1 year, 1 month ago) d9921e5:
  Fix: Fencing: Pass the correct options when looking up the history by node name

Whether 'pacemaker 1.1.10+git20130802-1ubuntu2.3' includes them is anybody's guess.
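Those fixes matter here because dlm_stonith (the fence_all agent in the logs above) decides success by asking stonith-ng when the victim was last fenced and comparing that timestamp with the nodedown time, so a history lookup that returns the wrong time, or nothing, makes it report failure even though the reboot itself worked; that matches the "fence result 1084811138 ... result 1" followed by "wait for fencing" above. A quick way to see what the history lookup returns on the affected build, a minimal sketch assuming pacemaker's stonith_admin and the node names from this cluster:

    # The last fencing operation stonith-ng recorded for nebula2;
    # dlm_stonith needs a completion time at or after the nodedown time
    stonith_admin --history nebula2

    # dlm's side of the comparison: the nodedown walltime (1427844569 here)
    dlm_tool dump | grep 'fence request'

If the history query comes back empty or stale while stonith-ng clearly rebooted the node, the packaged pacemaker is most likely missing the two commits above.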
> Regards.
> --
> Daniel Dehennin
> Retrieve my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
> Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF
>
> Apr 1 01:29:03 nebula4 lvm[6759]: Waiting for next pre command
> Apr 1 01:29:03 nebula4 lvm[6759]: read on PIPE 12: 4 bytes: status: 0
> Apr 1 01:29:03 nebula4 lvm[6759]: background routine status was 0, sock_client=0x218eab0
> Apr 1 01:29:03 nebula4 lvm[6759]: Send local reply
> Apr 1 01:29:03 nebula4 lvm[6759]: Read on local socket 5, len = 31
> Apr 1 01:29:03 nebula4 lvm[6759]: check_all_clvmds_running
> Apr 1 01:29:03 nebula4 lvm[6759]: down_callback. node 1084811137, state = 3
> Apr 1 01:29:03 nebula4 lvm[6759]: down_callback. node 1084811139, state = 3
> Apr 1 01:29:03 nebula4 lvm[6759]: down_callback. node 1084811138, state = 3
> Apr 1 01:29:03 nebula4 lvm[6759]: down_callback. node 1084811140, state = 3
> Apr 1 01:29:03 nebula4 lvm[6759]: Got pre command condition...
> Apr 1 01:29:03 nebula4 lvm[6759]: Writing status 0 down pipe 13
> Apr 1 01:29:03 nebula4 lvm[6759]: Waiting to do post command - state = 0
> Apr 1 01:29:03 nebula4 lvm[6759]: read on PIPE 12: 4 bytes: status: 0
> Apr 1 01:29:03 nebula4 lvm[6759]: background routine status was 0, sock_client=0x218eab0
> Apr 1 01:29:03 nebula4 lvm[6759]: distribute command: XID = 43973, flags=0x0 ()
> Apr 1 01:29:03 nebula4 lvm[6759]: num_nodes = 4
> Apr 1 01:29:03 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218f100. client=0x218eab0, msg=0x218ebc0, len=31, csid=(nil), xid=43973
> Apr 1 01:29:03 nebula4 lvm[6759]: Sending message to all cluster nodes
> Apr 1 01:29:03 nebula4 lvm[6759]: process_work_item: local
> Apr 1 01:29:03 nebula4 lvm[6759]: process_local_command: SYNC_NAMES (0x2d) msg=0x218ed00, msglen =31, client=0x218eab0
> Apr 1 01:29:03 nebula4 lvm[6759]: Syncing device names
> Apr 1 01:29:03 nebula4 lvm[6759]: Reply from node 40a8e784: 0 bytes
> Apr 1 01:29:03 nebula4 lvm[6759]: Got 1 replies, expecting: 4
> Apr 1 01:29:03 nebula4 lvm[6759]: LVM thread waiting for work
> Apr 1 01:29:03 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811140 for 0. len 31
> Apr 1 01:29:03 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811138 for 1084811140. len 18
> Apr 1 01:29:03 nebula4 lvm[6759]: Reply from node 40a8e782: 0 bytes
> Apr 1 01:29:03 nebula4 lvm[6759]: Got 2 replies, expecting: 4
> Apr 1 01:29:03 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811139 for 1084811140. len 18
> Apr 1 01:29:03 nebula4 lvm[6759]: Reply from node 40a8e783: 0 bytes
> Apr 1 01:29:03 nebula4 lvm[6759]: Got 3 replies, expecting: 4
> Apr 1 01:29:03 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811137 for 1084811140. len 18
> Apr 1 01:29:03 nebula4 lvm[6759]: Reply from node 40a8e781: 0 bytes
> Apr 1 01:29:03 nebula4 lvm[6759]: Got 4 replies, expecting: 4
> Apr 1 01:29:03 nebula4 lvm[6759]: Got post command condition...
> Apr 1 01:29:03 nebula4 lvm[6759]: Waiting for next pre command
> Apr 1 01:29:03 nebula4 lvm[6759]: read on PIPE 12: 4 bytes: status: 0
> Apr 1 01:29:03 nebula4 lvm[6759]: background routine status was 0, sock_client=0x218eab0
> Apr 1 01:29:03 nebula4 lvm[6759]: Send local reply
> Apr 1 01:29:03 nebula4 lvm[6759]: Read on local socket 5, len = 30
> Apr 1 01:29:03 nebula4 lvm[6759]: Got pre command condition...
> Apr 1 01:29:03 nebula4 lvm[6759]: doing PRE command LOCK_VG 'V_vg-one-2' at 6 (client=0x218eab0)
> Apr 1 01:29:03 nebula4 lvm[6759]: unlock_resource: V_vg-one-2 lockid: 1
> Apr 1 01:29:03 nebula4 lvm[6759]: Writing status 0 down pipe 13
> Apr 1 01:29:03 nebula4 lvm[6759]: Waiting to do post command - state = 0
> Apr 1 01:29:03 nebula4 lvm[6759]: read on PIPE 12: 4 bytes: status: 0
> Apr 1 01:29:03 nebula4 lvm[6759]: background routine status was 0, sock_client=0x218eab0
> Apr 1 01:29:03 nebula4 lvm[6759]: distribute command: XID = 43974, flags=0x1 (LOCAL)
> Apr 1 01:29:03 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218ed00. client=0x218eab0, msg=0x218ebc0, len=30, csid=(nil), xid=43974
> Apr 1 01:29:03 nebula4 lvm[6759]: process_work_item: local
> Apr 1 01:29:03 nebula4 lvm[6759]: process_local_command: LOCK_VG (0x33) msg=0x218ed40, msglen =30, client=0x218eab0
> Apr 1 01:29:03 nebula4 lvm[6759]: do_lock_vg: resource 'V_vg-one-2', cmd = 0x6 LCK_VG (UNLOCK|VG), flags = 0x0 ( ), critical_section = 0
> Apr 1 01:29:03 nebula4 lvm[6759]: Invalidating cached metadata for VG vg-one-2
> Apr 1 01:29:03 nebula4 lvm[6759]: Reply from node 40a8e784: 0 bytes
> Apr 1 01:29:03 nebula4 lvm[6759]: Got 1 replies, expecting: 1
> Apr 1 01:29:03 nebula4 lvm[6759]: LVM thread waiting for work
> Apr 1 01:29:03 nebula4 lvm[6759]: Got post command condition...
> Apr 1 01:29:03 nebula4 lvm[6759]: Waiting for next pre command
> Apr 1 01:29:03 nebula4 lvm[6759]: read on PIPE 12: 4 bytes: status: 0
> Apr 1 01:29:03 nebula4 lvm[6759]: background routine status was 0, sock_client=0x218eab0
> Apr 1 01:29:03 nebula4 lvm[6759]: Send local reply
> Apr 1 01:29:03 nebula4 lvm[6759]: Read on local socket 5, len = 0
> Apr 1 01:29:03 nebula4 lvm[6759]: EOF on local socket: inprogress=0
> Apr 1 01:29:03 nebula4 lvm[6759]: Waiting for child thread
> Apr 1 01:29:03 nebula4 lvm[6759]: Got pre command condition...
> Apr 1 01:29:03 nebula4 lvm[6759]: Subthread finished
> Apr 1 01:29:03 nebula4 lvm[6759]: Joined child thread
> Apr 1 01:29:03 nebula4 lvm[6759]: ret == 0, errno = 0. removing client
> Apr 1 01:29:03 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218ebc0. client=0x218eab0, msg=(nil), len=0, csid=(nil), xid=43974
> Apr 1 01:29:03 nebula4 lvm[6759]: process_work_item: free fd -1
> Apr 1 01:29:03 nebula4 lvm[6759]: LVM thread waiting for work
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811138 for 0. len 31
> Apr 1 01:29:16 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218eea0. client=0x6a1d60, msg=0x7fff260dfcac, len=31, csid=0x7fff260de67c, xid=0
> Apr 1 01:29:16 nebula4 lvm[6759]: process_work_item: remote
> Apr 1 01:29:16 nebula4 lvm[6759]: process_remote_command SYNC_NAMES (0x2d) for clientid 0x5000000 XID 39602 on node 40a8e782
> Apr 1 01:29:16 nebula4 lvm[6759]: Syncing device names
> Apr 1 01:29:16 nebula4 lvm[6759]: LVM thread waiting for work
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811137 for 1084811138. len 18
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811139 for 1084811138. len 18
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811140 for 1084811138. len 18
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811137 for 0. len 31
> Apr 1 01:29:16 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218eab0. client=0x6a1d60, msg=0x7fff260dfcac, len=31, csid=0x7fff260de67c, xid=0
> Apr 1 01:29:16 nebula4 lvm[6759]: process_work_item: remote
> Apr 1 01:29:16 nebula4 lvm[6759]: process_remote_command SYNC_NAMES (0x2d) for clientid 0x5000000 XID 44354 on node 40a8e781
> Apr 1 01:29:16 nebula4 lvm[6759]: Syncing device names
> Apr 1 01:29:16 nebula4 lvm[6759]: LVM thread waiting for work
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811140 for 1084811137. len 18
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811138 for 1084811137. len 18
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811139 for 1084811137. len 18
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811138 for 0. len 31
> Apr 1 01:29:16 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218eea0. client=0x6a1d60, msg=0x7fff260dfcac, len=31, csid=0x7fff260de67c, xid=0
> Apr 1 01:29:16 nebula4 lvm[6759]: process_work_item: remote
> Apr 1 01:29:16 nebula4 lvm[6759]: process_remote_command SYNC_NAMES (0x2d) for clientid 0x5000000 XID 39605 on node 40a8e782
> Apr 1 01:29:16 nebula4 lvm[6759]: Syncing device names
> Apr 1 01:29:16 nebula4 lvm[6759]: LVM thread waiting for work
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811137 for 1084811138. len 18
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811139 for 1084811138. len 18
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811140 for 1084811138. len 18
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811137 for 0. len 31
> Apr 1 01:29:16 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218eab0. client=0x6a1d60, msg=0x7fff260dfcac, len=31, csid=0x7fff260de67c, xid=0
> Apr 1 01:29:16 nebula4 lvm[6759]: process_work_item: remote
> Apr 1 01:29:16 nebula4 lvm[6759]: process_remote_command SYNC_NAMES (0x2d) for clientid 0x5000000 XID 44357 on node 40a8e781
> Apr 1 01:29:16 nebula4 lvm[6759]: Syncing device names
> Apr 1 01:29:16 nebula4 lvm[6759]: LVM thread waiting for work
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811140 for 1084811137. len 18
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811138 for 1084811137. len 18
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811139 for 1084811137. len 18
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811138 for 0. len 31
> Apr 1 01:29:16 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218eea0. client=0x6a1d60, msg=0x7fff260dfcac, len=31, csid=0x7fff260de67c, xid=0
> Apr 1 01:29:16 nebula4 lvm[6759]: process_work_item: remote
> Apr 1 01:29:16 nebula4 lvm[6759]: process_remote_command SYNC_NAMES (0x2d) for clientid 0x5000000 XID 39608 on node 40a8e782
> Apr 1 01:29:16 nebula4 lvm[6759]: Syncing device names
> Apr 1 01:29:16 nebula4 lvm[6759]: LVM thread waiting for work
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811137 for 1084811138. len 18
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811137 for 0. len 31
> Apr 1 01:29:16 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218eab0. client=0x6a1d60, msg=0x7fff260dfcac, len=31, csid=0x7fff260de67c, xid=0
> Apr 1 01:29:16 nebula4 lvm[6759]: process_work_item: remote
> Apr 1 01:29:16 nebula4 lvm[6759]: process_remote_command SYNC_NAMES (0x2d) for clientid 0x5000000 XID 44360 on node 40a8e781
> Apr 1 01:29:16 nebula4 lvm[6759]: Syncing device names
> Apr 1 01:29:16 nebula4 lvm[6759]: LVM thread waiting for work
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811139 for 1084811138. len 18
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811140 for 1084811138. len 18
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811140 for 1084811137. len 18
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811138 for 1084811137. len 18
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811139 for 1084811137. len 18
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811137 for 0. len 31
> Apr 1 01:29:16 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218eea0. client=0x6a1d60, msg=0x7fff260dfcac, len=31, csid=0x7fff260de67c, xid=0
> Apr 1 01:29:16 nebula4 lvm[6759]: process_work_item: remote
> Apr 1 01:29:16 nebula4 lvm[6759]: process_remote_command SYNC_NAMES (0x2d) for clientid 0x5000000 XID 44363 on node 40a8e781
> Apr 1 01:29:16 nebula4 lvm[6759]: Syncing device names
> Apr 1 01:29:16 nebula4 lvm[6759]: LVM thread waiting for work
> Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811140 for 1084811137. len 18
> Apr 1 01:29:23 nebula4 lvm[6759]: Got new connection on fd 5
> Apr 1 01:29:23 nebula4 lvm[6759]: Read on local socket 5, len = 30
> Apr 1 01:29:23 nebula4 lvm[6759]: creating pipe, [12, 13]
> Apr 1 01:29:23 nebula4 lvm[6759]: Creating pre&post thread
> Apr 1 01:29:23 nebula4 lvm[6759]: Created pre&post thread, state = 0
> Apr 1 01:29:23 nebula4 lvm[6759]: in sub thread: client = 0x218eab0
> Apr 1 01:29:23 nebula4 lvm[6759]: doing PRE command LOCK_VG 'V_vg-one-0' at 1 (client=0x218eab0)
> Apr 1 01:29:23 nebula4 lvm[6759]: lock_resource 'V_vg-one-0', flags=0, mode=3
> Apr 1 01:29:23 nebula4 lvm[6759]: lock_resource returning 0, lock_id=1
> Apr 1 01:29:23 nebula4 lvm[6759]: Writing status 0 down pipe 13
> Apr 1 01:29:23 nebula4 lvm[6759]: Waiting to do post command - state = 0
> Apr 1 01:29:23 nebula4 lvm[6759]: read on PIPE 12: 4 bytes: status: 0
> Apr 1 01:29:23 nebula4 lvm[6759]: background routine status was 0, sock_client=0x218eab0
> Apr 1 01:29:23 nebula4 lvm[6759]: distribute command: XID = 43975, flags=0x1 (LOCAL)
> Apr 1 01:29:23 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218ed00. client=0x218eab0, msg=0x218ebc0, len=30, csid=(nil), xid=43975
> Apr 1 01:29:23 nebula4 lvm[6759]: process_work_item: local
> Apr 1 01:29:23 nebula4 lvm[6759]: process_local_command: LOCK_VG (0x33) msg=0x218ed40, msglen =30, client=0x218eab0
> Apr 1 01:29:23 nebula4 lvm[6759]: do_lock_vg: resource 'V_vg-one-0', cmd = 0x1 LCK_VG (READ|VG), flags = 0x0 ( ), critical_section = 0
> Apr 1 01:29:23 nebula4 lvm[6759]: Invalidating cached metadata for VG vg-one-0
> Apr 1 01:29:23 nebula4 lvm[6759]: Reply from node 40a8e784: 0 bytes
> Apr 1 01:29:23 nebula4 lvm[6759]: Got 1 replies, expecting: 1
> Apr 1 01:29:23 nebula4 lvm[6759]: LVM thread waiting for work
> Apr 1 01:29:23 nebula4 lvm[6759]: Got post command condition...
> Apr 1 01:29:23 nebula4 lvm[6759]: Waiting for next pre command
> Apr 1 01:29:23 nebula4 lvm[6759]: read on PIPE 12: 4 bytes: status: 0
> Apr 1 01:29:23 nebula4 lvm[6759]: background routine status was 0, sock_client=0x218eab0
> Apr 1 01:29:23 nebula4 lvm[6759]: Send local reply
> Apr 1 01:29:23 nebula4 lvm[6759]: Read on local socket 5, len = 31
> Apr 1 01:29:23 nebula4 lvm[6759]: check_all_clvmds_running
> Apr 1 01:29:23 nebula4 lvm[6759]: down_callback. node 1084811137, state = 3
> Apr 1 01:29:23 nebula4 lvm[6759]: down_callback. node 1084811139, state = 3
> Apr 1 01:29:23 nebula4 lvm[6759]: down_callback. node 1084811138, state = 3
> Apr 1 01:29:23 nebula4 lvm[6759]: down_callback. node 1084811140, state = 3
> Apr 1 01:29:23 nebula4 lvm[6759]: Got pre command condition...
> Apr 1 01:29:23 nebula4 lvm[6759]: Writing status 0 down pipe 13
> Apr 1 01:29:23 nebula4 lvm[6759]: Waiting to do post command - state = 0
> Apr 1 01:29:23 nebula4 lvm[6759]: read on PIPE 12: 4 bytes: status: 0
> Apr 1 01:29:23 nebula4 lvm[6759]: background routine status was 0, sock_client=0x218eab0
> Apr 1 01:29:23 nebula4 lvm[6759]: distribute command: XID = 43976, flags=0x0 ()
> Apr 1 01:29:23 nebula4 lvm[6759]: num_nodes = 4
> Apr 1 01:29:23 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218f100. client=0x218eab0, msg=0x218ebc0, len=31, csid=(nil), xid=43976
> Apr 1 01:29:23 nebula4 lvm[6759]: Sending message to all cluster nodes
> Apr 1 01:29:23 nebula4 lvm[6759]: process_work_item: local
> Apr 1 01:29:23 nebula4 lvm[6759]: process_local_command: SYNC_NAMES (0x2d) msg=0x218ed00, msglen =31, client=0x218eab0
> Apr 1 01:29:23 nebula4 lvm[6759]: Syncing device names
> Apr 1 01:29:23 nebula4 lvm[6759]: Reply from node 40a8e784: 0 bytes
> Apr 1 01:29:23 nebula4 lvm[6759]: Got 1 replies, expecting: 4
> Apr 1 01:29:23 nebula4 lvm[6759]: LVM thread waiting for work
> Apr 1 01:29:26 nebula4 corosync[6411]: [TOTEM ] A processor failed, forming new configuration.
> Apr 1 01:29:29 nebula4 corosync[6411]: [TOTEM ] A new membership (192.168.231.129:1204) was formed. Members left: 1084811138
> Apr 1 01:29:29 nebula4 lvm[6759]: confchg callback. 0 joined, 1 left, 3 members
> Apr 1 01:29:29 nebula4 crmd[6490]: warning: match_down_event: No match for shutdown action on 1084811138
> Apr 1 01:29:29 nebula4 crmd[6490]: notice: peer_update_callback: Stonith/shutdown of nebula2 not matched
> Apr 1 01:29:29 nebula4 crmd[6490]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Apr 1 01:29:29 nebula4 corosync[6411]: [QUORUM] Members[3]: 1084811137 1084811139 1084811140
> Apr 1 01:29:29 nebula4 corosync[6411]: [MAIN ] Completed service synchronization, ready to provide service.
> Apr 1 01:29:29 nebula4 crmd[6490]: notice: crm_update_peer_state: pcmk_quorum_notification: Node nebula2[1084811138] - state is now lost (was member)
> Apr 1 01:29:29 nebula4 crmd[6490]: warning: match_down_event: No match for shutdown action on 1084811138
> Apr 1 01:29:29 nebula4 pacemakerd[6483]: notice: crm_update_peer_state: pcmk_quorum_notification: Node nebula2[1084811138] - state is now lost (was member)
> Apr 1 01:29:29 nebula4 crmd[6490]: notice: peer_update_callback: Stonith/shutdown of nebula2 not matched
> Apr 1 01:29:29 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811140 for 0. len 31
> Apr 1 01:29:29 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811139 for 1084811137. len 18
> Apr 1 01:29:29 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811139 for 0. len 31
> Apr 1 01:29:29 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218ed00. client=0x6a1d60, msg=0x7fff260dfcac, len=31, csid=0x7fff260de67c, xid=0
> Apr 1 01:29:29 nebula4 lvm[6759]: process_work_item: remote
> Apr 1 01:29:29 nebula4 lvm[6759]: process_remote_command SYNC_NAMES (0x2d) for clientid 0x5000000 XID 43802 on node 40a8e783
> Apr 1 01:29:29 nebula4 lvm[6759]: Syncing device names
> Apr 1 01:29:29 nebula4 lvm[6759]: LVM thread waiting for work
> Apr 1 01:29:29 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811139 for 1084811140. len 18
> Apr 1 01:29:29 nebula4 lvm[6759]: Reply from node 40a8e783: 0 bytes
> Apr 1 01:29:29 nebula4 lvm[6759]: Got 2 replies, expecting: 4
> Apr 1 01:29:29 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811137 for 1084811140. len 18
> Apr 1 01:29:29 nebula4 lvm[6759]: Reply from node 40a8e781: 0 bytes
> Apr 1 01:29:29 nebula4 lvm[6759]: Got 3 replies, expecting: 4
> Apr 1 01:29:29 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811137 for 1084811139. len 18
> Apr 1 01:29:29 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811140 for 1084811139. len 18
> Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence status 1084811138 receive 1 from 1084811137 walltime 1427844569 local 50759
> Apr 1 01:29:29 nebula4 kernel: [50799.162381] dlm: closing connection to node 1084811138
> Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence status 1084811138 receive 1 from 1084811139 walltime 1427844569 local 50759
> Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence request 1084811138 pid 44527 nodedown time 1427844569 fence_all dlm_stonith
> Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence result 1084811138 pid 44527 result 1 exit status
> Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence status 1084811138 receive 1 from 1084811140 walltime 1427844569 local 50759
> Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence request 1084811138 no actor
> Apr 1 01:29:30 nebula4 pengine[6489]: warning: pe_fence_node: Node nebula2 will be fenced because the node is no longer part of the cluster
> Apr 1 01:29:30 nebula4 pengine[6489]: warning: determine_online_status: Node nebula2 is unclean
> Apr 1 01:29:30 nebula4 pengine[6489]: warning: unpack_rsc_op: Processing failed op start for stonith-nebula4-IPMILAN on nebula3: unknown error (1)
> Apr 1 01:29:30 nebula4 pengine[6489]: warning: unpack_rsc_op: Processing failed op start for stonith-nebula4-IPMILAN on nebula1: unknown error (1)
> Apr 1 01:29:30 nebula4 pengine[6489]: warning: unpack_rsc_op: Processing failed op start for stonith-nebula4-IPMILAN on nebula2: unknown error (1)
> Apr 1 01:29:30 nebula4 pengine[6489]: warning: common_apply_stickiness: Forcing stonith-nebula4-IPMILAN away from nebula1 after 1000000 failures (max=1000000)
> Apr 1 01:29:30 nebula4 pengine[6489]: warning: common_apply_stickiness: Forcing stonith-nebula4-IPMILAN away from nebula2 after 1000000 failures (max=1000000)
> Apr 1 01:29:30 nebula4 pengine[6489]: warning: common_apply_stickiness: Forcing stonith-nebula4-IPMILAN away from nebula3 after 1000000 failures (max=1000000)
> Apr 1 01:29:30 nebula4 pengine[6489]: warning: custom_action: Action p_dlm:3_stop_0 on nebula2 is unrunnable (offline)
> Apr 1 01:29:30 nebula4 pengine[6489]: warning: custom_action: Action p_dlm:3_stop_0 on nebula2 is unrunnable (offline)
> Apr 1 01:29:30 nebula4 pengine[6489]: warning: custom_action: Action p_clvm:3_stop_0 on nebula2 is unrunnable (offline)
> Apr 1 01:29:30 nebula4 pengine[6489]: warning: custom_action: Action p_clvm:3_stop_0 on nebula2 is unrunnable (offline)
> Apr 1 01:29:30 nebula4 pengine[6489]: warning: custom_action: Action p_vg_one:3_stop_0 on nebula2 is unrunnable (offline)
> Apr 1 01:29:30 nebula4 pengine[6489]: warning: custom_action: Action p_vg_one:3_stop_0 on nebula2 is unrunnable (offline)
> Apr 1 01:29:30 nebula4 pengine[6489]: warning: custom_action: Action p_fs_one-datastores:3_stop_0 on nebula2 is unrunnable (offline)
> Apr 1 01:29:30 nebula4 pengine[6489]: warning: custom_action: Action p_fs_one-datastores:3_stop_0 on nebula2 is unrunnable (offline)
> Apr 1 01:29:30 nebula4 pengine[6489]: warning: custom_action: Action stonith-nebula1-IPMILAN_stop_0 on nebula2 is unrunnable (offline)
> Apr 1 01:29:30 nebula4 pengine[6489]: warning: stage6: Scheduling Node nebula2 for STONITH
> Apr 1 01:29:30 nebula4 pengine[6489]: notice: LogActions: Stop p_dlm:3#011(nebula2)
> Apr 1 01:29:30 nebula4 pengine[6489]: notice: LogActions: Stop p_clvm:3#011(nebula2)
> Apr 1 01:29:30 nebula4 pengine[6489]: notice: LogActions: Stop p_vg_one:3#011(nebula2)
> Apr 1 01:29:30 nebula4 pengine[6489]: notice: LogActions: Stop p_fs_one-datastores:3#011(nebula2)
> Apr 1 01:29:30 nebula4 pengine[6489]: notice: LogActions: Move stonith-nebula1-IPMILAN#011(Started nebula2 -> nebula3)
> Apr 1 01:29:30 nebula4 pengine[6489]: warning: process_pe_message: Calculated Transition 101: /var/lib/pacemaker/pengine/pe-warn-22.bz2
> Apr 1 01:29:30 nebula4 crmd[6490]: notice: te_fence_node: Executing reboot fencing operation (98) on nebula2 (timeout=30000)
> Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: handle_request: Client crmd.6490.2707e557 wants to fence (reboot) 'nebula2' with device '(any)'
> Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for nebula2: 39eaf3a2-d7e0-417d-8a01-d2f373973d6b (0)
> Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: can_fence_host_with_device: stonith-nebula1-IPMILAN can not fence nebula2: static-list
> Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: can_fence_host_with_device: stonith-nebula2-IPMILAN can fence nebula2: static-list
> Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: can_fence_host_with_device: stonith-one-frontend can not fence nebula2: static-list
> Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: can_fence_host_with_device: stonith-nebula3-IPMILAN can not fence nebula2: static-list
> Apr 1 01:29:32 nebula4 stonith-ng[6486]: notice: remote_op_done: Operation reboot of nebula2 by nebula3 for crmd.6490@nebula4.39eaf3a2: OK
> Apr 1 01:29:32 nebula4 crmd[6490]: notice: tengine_stonith_callback: Stonith operation 2/98:101:0:28913388-04df-49cb-9927-362b21a74014: OK (0)
> Apr 1 01:29:32 nebula4 crmd[6490]: notice: tengine_stonith_notify: Peer nebula2 was terminated (reboot) by nebula3 for nebula4: OK (ref=39eaf3a2-d7e0-417d-8a01-d2f373973d6b) by client crmd.6490
> Apr 1 01:29:32 nebula4 crmd[6490]: notice: te_rsc_command: Initiating action 91: start stonith-nebula1-IPMILAN_start_0 on nebula3
> Apr 1 01:29:33 nebula4 crmd[6490]: notice: run_graph: Transition 101 (Complete=13, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-22.bz2): Stopped
> Apr 1 01:29:33 nebula4 pengine[6489]: warning: unpack_rsc_op: Processing failed op start for stonith-nebula4-IPMILAN on nebula3: unknown error (1)
> Apr 1 01:29:33 nebula4 pengine[6489]: warning: unpack_rsc_op: Processing failed op start for stonith-nebula4-IPMILAN on nebula1: unknown error (1)
> Apr 1 01:29:33 nebula4 pengine[6489]: warning: common_apply_stickiness: Forcing stonith-nebula4-IPMILAN away from nebula1 after 1000000 failures (max=1000000)
> Apr 1 01:29:33 nebula4 pengine[6489]: warning: common_apply_stickiness: Forcing stonith-nebula4-IPMILAN away from nebula3 after 1000000 failures (max=1000000)
> Apr 1 01:29:33 nebula4 pengine[6489]: notice: process_pe_message: Calculated Transition 102: /var/lib/pacemaker/pengine/pe-input-129.bz2
> Apr 1 01:29:33 nebula4 crmd[6490]: notice: te_rsc_command: Initiating action 88: monitor stonith-nebula1-IPMILAN_monitor_3600000 on nebula3
> Apr 1 01:29:34 nebula4 crmd[6490]: notice: run_graph: Transition 102 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-129.bz2): Complete
> Apr 1 01:29:34 nebula4 crmd[6490]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> Apr 1 01:30:01 nebula4 CRON[44640]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then munin-run apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then munin-run apt update 7200 12 >/dev/null; fi)
> Apr 1 01:30:25 nebula4 dlm_controld[6737]: 50815 datastores wait for fencing
> Apr 1 01:30:25 nebula4 dlm_controld[6737]: 50815 clvmd wait for fencing
> Apr 1 01:30:29 nebula4 lvm[6759]: Request timed-out (send: 1427844563, now: 1427844629)
> Apr 1 01:30:29 nebula4 lvm[6759]: Request timed-out. padding
> Apr 1 01:30:29 nebula4 lvm[6759]: down_callback. node 1084811137, state = 3
> Apr 1 01:30:29 nebula4 lvm[6759]: Checking for a reply from 40a8e781
> Apr 1 01:30:29 nebula4 lvm[6759]: down_callback. node 1084811139, state = 3
> Apr 1 01:30:29 nebula4 lvm[6759]: Checking for a reply from 40a8e783
> Apr 1 01:30:29 nebula4 lvm[6759]: down_callback. node 1084811138, state = 1
> Apr 1 01:30:29 nebula4 lvm[6759]: down_callback. node 1084811140, state = 3
> Apr 1 01:30:29 nebula4 lvm[6759]: Checking for a reply from 40a8e784
> Apr 1 01:30:29 nebula4 lvm[6759]: Got post command condition...
> Apr 1 01:30:29 nebula4 lvm[6759]: Waiting for next pre command
> Apr 1 01:30:29 nebula4 lvm[6759]: read on PIPE 12: 4 bytes: status: 0
> Apr 1 01:30:29 nebula4 lvm[6759]: background routine status was 0, sock_client=0x218eab0
> Apr 1 01:30:29 nebula4 lvm[6759]: Send local reply
> Apr 1 01:30:29 nebula4 lvm[6759]: Read on local socket 5, len = 30
> Apr 1 01:30:29 nebula4 lvm[6759]: Got pre command condition...
> Apr 1 01:30:29 nebula4 lvm[6759]: doing PRE command LOCK_VG 'V_vg-one-0' at 6 (client=0x218eab0)
> Apr 1 01:30:29 nebula4 lvm[6759]: unlock_resource: V_vg-one-0 lockid: 1
> Apr 1 01:40:01 nebula4 CRON[47640]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then munin-run apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then munin-run apt update 7200 12 >/dev/null; fi)
> Apr 1 01:44:34 nebula4 crmd[6490]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
> Apr 1 01:44:34 nebula4 pengine[6489]: warning: unpack_rsc_op: Processing failed op start for stonith-nebula4-IPMILAN on nebula3: unknown error (1)
> Apr 1 01:44:34 nebula4 pengine[6489]: warning: unpack_rsc_op: Processing failed op start for stonith-nebula4-IPMILAN on nebula1: unknown error (1)
> Apr 1 01:44:34 nebula4 pengine[6489]: warning: common_apply_stickiness: Forcing stonith-nebula4-IPMILAN away from nebula1 after 1000000 failures (max=1000000)
> Apr 1 01:44:34 nebula4 pengine[6489]: warning: common_apply_stickiness: Forcing stonith-nebula4-IPMILAN away from nebula3 after 1000000 failures (max=1000000)
> Apr 1 01:44:34 nebula4 pengine[6489]: notice: process_pe_message: Calculated Transition 103: /var/lib/pacemaker/pengine/pe-input-130.bz2
> Apr 1 01:44:34 nebula4 crmd[6490]: notice: run_graph: Transition 103 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-130.bz2): Complete
> Apr 1 01:44:34 nebula4 crmd[6490]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> Apr 1 01:45:01 nebula4 CRON[49089]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then munin-run apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then munin-run apt update 7200 12 >/dev/null; fi)
> Apr 1 01:46:01 nebula4 CRON[570]: (root) CMD (if test -x /usr/sbin/apticron; then /usr/sbin/apticron --cron; else true; fi)
> Apr 1 01:49:20 nebula4 lvm[6759]: Got new connection on fd 17
> Apr 1 01:49:20 nebula4 lvm[6759]: Read on local socket 17, len = 30
> Apr 1 01:49:20 nebula4 lvm[6759]: creating pipe, [18, 19]
> Apr 1 01:49:20 nebula4 lvm[6759]: Creating pre&post thread
> Apr 1 01:49:20 nebula4 lvm[6759]: Created pre&post thread, state = 0
> Apr 1 01:49:20 nebula4 lvm[6759]: in sub thread: client = 0x218f1f0
> Apr 1 01:49:20 nebula4 lvm[6759]: doing PRE command LOCK_VG 'V_vg-one-0' at 1 (client=0x218f1f0)
> Apr 1 01:49:20 nebula4 lvm[6759]: lock_resource 'V_vg-one-0', flags=0, mode=3
> _______________________________________________
> Users mailing list: Users@xxxxxxxxxxxxxxx
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster