Re: nfs over rbd problem

I've run into many problems trying to run an RBD/NFS Pacemaker setup like you describe on a two-node cluster. In my case, most of the problems were a result of a) no quorum and b) no STONITH. If you're going to be running this setup in production, I *highly* recommend adding more nodes (if only to maintain quorum). Without quorum, you will run into split-brain situations where you'll see anything from both nodes starting the same services to configuration changes disappearing when the nodes can't agree on a recent revision.
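
For example (just a sketch; "node3" is a placeholder for whatever extra voting member you add), once a third corosync/Pacemaker member has joined you can tell Pacemaker to stop resources whenever quorum is lost instead of ignoring it (your pengine log shows "On loss of CCM Quorum: Ignore", which is the usual two-node workaround):

  crm configure property no-quorum-policy=stop

After that, "crm status" should report "partition with quorum" on the survivors as long as two of the three members can still see each other.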

In this case, it looks like a STONITH problem. Without a fencing mechanism, node2 cannot be 100% certain that node1 is dead. When you manually shut down corosync, I suspect that as part of the process, node1 alerts the cluster that it's leaving as part of a planned shutdown. When it just disappears, node2 can't determine what happened; it just knows node1 isn't answering anymore. Node1 is not necessarily down: there could just be a network problem between node1 and node2. In such a scenario, it would be very bad for node2 to map/mount the RBDs and start writing data while node1 is still providing the same service. Quorum can help here too, but for node2 to be certain node1 is down, it needs an out-of-band method to force node1 to power off; e.g., I use IPMI on a separate physical network.
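
As a rough sketch only (the agent name, addresses, and credentials below are placeholders; use whichever fence agent matches your hardware, e.g. fence_ipmilan from fence-agents or external/ipmi from cluster-glue), the Pacemaker side looks something like:

  crm configure primitive st-node1 stonith:fence_ipmilan \
      params ipaddr="10.0.0.1" login="admin" passwd="secret" lanplus="1" \
             pcmk_host_list="node1" op monitor interval="60s"
  crm configure primitive st-node2 stonith:fence_ipmilan \
      params ipaddr="10.0.0.2" login="admin" passwd="secret" lanplus="1" \
             pcmk_host_list="node2" op monitor interval="60s"
  crm configure location l-st-node1 st-node1 -inf: node1
  crm configure location l-st-node2 st-node2 -inf: node2
  crm configure property stonith-enabled=true

The two location constraints just keep each fence device from running on the node it is supposed to kill.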

So... add more nodes as quorum members. You can set weight restrictions (location constraints) on the resource groups to prevent them from running on these nodes if they are not as powerful as the nodes you're using now. Then add a STONITH mechanism on all the nodes, and verify it works.
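
Something along these lines (again a sketch, with "node3" standing in for the quorum-only member) keeps your existing groups off the extra node, and a manual fence request is a quick way to verify STONITH actually works before you rely on it:

  crm configure location loc-rbd-avoid-node3 g_rbd_share_1 -inf: node3
  crm configure location loc-nfs-avoid-node3 clo_nfs -inf: node3
  stonith_admin --reboot node1

If that stonith_admin call doesn't power-cycle node1, recovery after a real power loss won't work either.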

Once you do that, you should see things act the way you expect. Good luck!

-Steve

On 12/19/2015 03:46 AM, maoqi1982 wrote:
Hi list
I have a test ceph cluster that includes 3 nodes (node0: mon; node1: osd and nfs server1; node2: osd and nfs server2).
OS: CentOS 6.6, kernel: 3.10.94-1.el6.elrepo.x86_64, ceph version 0.94.5
I followed the instructions at http://www.sebastien-han.fr/blog/2012/07/06/nfs-over-rbd/ to set up an active/standby NFS environment.
When using "service corosync stop" or "poweroff" on node1, failover goes fine (the NFS server is taken over by node2). But when I test cutting off the power to node1, the failover fails.
1. [root@node1 ~]# crm status
Last updated: Fri Dec 18 17:14:19 2015
Last change: Fri Dec 18 17:13:29 2015
Stack: classic openais (with plugin)
Current DC: node1 - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 3 expected votes
8 Resources configured
Online: [ node1 node2 ]
 Resource Group: g_rbd_share_1
     p_rbd_map_1        (ocf::ceph:rbd.in):     Started node1
     p_fs_rbd_1 (ocf::heartbeat:Filesystem):    Started node1
     p_export_rbd_1     (ocf::heartbeat:exportfs):      Started node1
     p_vip_1    (ocf::heartbeat:IPaddr):        Started node1
 Clone Set: clo_nfs [g_nfs]
     Started: [ node1 node2 ]
2. [root@node1 ~]# service corosync stop
[root@node2 cluster]# crm status
Last updated: Fri Dec 18 17:14:59 2015
Last change: Fri Dec 18 17:13:29 2015
Stack: classic openais (with plugin)
Current DC: node2 - partition WITHOUT quorum
Version: 1.1.11-97629de
2 Nodes configured, 3 expected votes
8 Resources configured
Online: [ node2 ]
OFFLINE: [ node1 ]
 Resource Group: g_rbd_share_1
     p_rbd_map_1        (ocf::ceph:rbd.in):     Started node2
     p_fs_rbd_1 (ocf::heartbeat:Filesystem):    Started node2
     p_export_rbd_1     (ocf::heartbeat:exportfs):      Started node2
     p_vip_1    (ocf::heartbeat:IPaddr):        Started node2
 Clone Set: clo_nfs [g_nfs]
     Started: [ node2 ]
     Stopped: [ node1 ]

3. cut off node1 power manually
[root@node2 cluster]# crm status
Last updated: Fri Dec 18 17:23:06 2015
Last change: Fri Dec 18 17:13:29 2015
Stack: classic openais (with plugin)
Current DC: node2 - partition WITHOUT quorum
Version: 1.1.11-97629de
2 Nodes configured, 3 expected votes
8 Resources configured
Online: [ node2 ]
OFFLINE: [ node1 ]
 Clone Set: clo_nfs [g_nfs]
     Started: [ node2 ]
     Stopped: [ node1 ]
Failed actions:
    p_rbd_map_1_start_0 on node2 'unknown error' (1): call=48, status=Timed Out, last-rc-change='Fri Dec 18 17:22:19 2015', queued=0ms, exec=20002ms
corosync.log:
Dec 18 17:22:19 corosync [pcmk  ] notice: pcmk_peer_update: Transitional membership event on ring 668: memb=1, new=0, lost=1
Dec 18 17:22:19 corosync [pcmk  ] info: pcmk_peer_update: memb: node2 1211279552
Dec 18 17:22:19 corosync [pcmk  ] info: pcmk_peer_update: lost: node1 1194502336
Dec 18 17:22:19 corosync [pcmk  ] notice: pcmk_peer_update: Stable membership event on ring 668: memb=1, new=0, lost=0
Dec 18 17:22:19 corosync [pcmk  ] info: pcmk_peer_update: MEMB: node2 1211279552
Dec 18 17:22:19 corosync [pcmk  ] info: ais_mark_unseen_peer_dead: Node node1 was not seen in the previous transition
Dec 18 17:22:19 corosync [pcmk  ] info: update_member: Node 1194502336/node1 is now: lost
Dec 18 17:22:19 [2695] node2       crmd:   notice: plugin_handle_membership:     Membership 668: quorum lost
Dec 18 17:22:19 [2690] node2        cib:   notice: plugin_handle_membership:     Membership 668: quorum lost
Dec 18 17:22:19 corosync [pcmk  ] info: send_member_notification: Sending membership update 668 to 3 children
Dec 18 17:22:19 [2691] node2 stonith-ng:   notice: plugin_handle_membership:     Membership 668: quorum lost
Dec 18 17:22:19 [2695] node2       crmd:     info: crm_update_peer_proc:     plugin_handle_membership: Node node1[1194502336] - unknown is now lost
Dec 18 17:22:19 [2690] node2        cib:     info: crm_update_peer_proc:     plugin_handle_membership: Node node1[1194502336] - unknown is now lost
Dec 18 17:22:19 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Dec 18 17:22:19 [2695] node2       crmd:     info: peer_update_callback:     Client node1/peer now has status [offline] (DC=true, changed=111312)
Dec 18 17:22:19 [2691] node2 stonith-ng:     info: crm_update_peer_proc:     plugin_handle_membership: Node node1[1194502336] - unknown is now lost
Dec 18 17:22:19 [2695] node2       crmd:     info: peer_update_callback:     Peer node1 left us
Dec 18 17:22:19 [2690] node2        cib:   notice: crm_update_peer_state:     plugin_handle_membership: Node node1[1194502336] - state is now lost (was member)
Dec 18 17:22:19 [2695] node2       crmd:  warning: match_down_event:     No match for shutdown action on node1
Dec 18 17:22:19 [2691] node2 stonith-ng:   notice: crm_update_peer_state:     plugin_handle_membership: Node node1[1194502336] - state is now lost (was member)
Dec 18 17:22:19 [2695] node2       crmd:   notice: peer_update_callback:     Stonith/shutdown of node1 not matched
Dec 18 17:22:19 [2695] node2       crmd:     info: crm_update_peer_join:     peer_update_callback: Node node1[1194502336] - join-2 phase 4 -> 0
Dec 18 17:22:19 [2695] node2       crmd:     info: abort_transition_graph:     Transition aborted: Node failure (source=peer_update_callback:237, 1)
Dec 18 17:22:19 [2695] node2       crmd:   notice: crm_update_peer_state:     plugin_handle_membership: Node node1[1194502336] - state is now lost (was member)
Dec 18 17:22:19 [2695] node2       crmd:     info: peer_update_callback:     node1 is now lost (was member)
Dec 18 17:22:19 [2695] node2       crmd:  warning: match_down_event:     No match for shutdown action on node1
Dec 18 17:22:19 corosync [CPG   ] chosen downlist: sender r(0) ip(192.168.50.72) ; members(old:2 left:1)
Dec 18 17:22:19 [2690] node2        cib:     info: cib_process_request:     Forwarding cib_modify operation for section status to master (origin=local/crmd/88)
Dec 18 17:22:19 [2695] node2       crmd:   notice: peer_update_callback:     Stonith/shutdown of node1 not matched
Dec 18 17:22:19 [2695] node2       crmd:     info: abort_transition_graph:     Transition aborted: Node failure (source=peer_update_callback:237, 1)
Dec 18 17:22:19 corosync [MAIN  ] Completed service synchronization, ready to provide service.
Dec 18 17:22:19 [2690] node2        cib:     info: cib_process_request:     Forwarding cib_modify operation for section status to master (origin=local/crmd/89)
Dec 18 17:22:19 [2690] node2        cib:     info: cib_perform_op:     Diff: --- 0.69.157 2
Dec 18 17:22:19 [2690] node2        cib:     info: cib_perform_op:     Diff: +++ 0.69.158 (null)
Dec 18 17:22:19 [2690] node2        cib:     info: cib_perform_op:     +  /cib:  @num_updates=158
Dec 18 17:22:19 [2690] node2        cib:     info: cib_perform_op:     +  /cib/status/node_state[@id='node1']:  @crmd=offline, @crm-debug-origin=peer_update_callback
Dec 18 17:22:19 [2690] node2        cib:     info: cib_process_request:     Completed cib_modify operation for section status: OK (rc=0, origin=node2/crmd/88, version=0.69.158)
Dec 18 17:22:19 [2690] node2        cib:     info: cib_process_request:     Completed cib_modify operation for section status: OK (rc=0, origin=node2/crmd/89, version=0.69.158)
Dec 18 17:22:19 [2690] node2        cib:     info: cib_process_request:     Forwarding cib_modify operation for section nodes to master (origin=local/crmd/90)
Dec 18 17:22:19 [2690] node2        cib:     info: cib_process_request:     Forwarding cib_modify operation for section status to master (origin=local/crmd/91)
Dec 18 17:22:19 [2690] node2        cib:     info: cib_process_request:     Forwarding cib_modify operation for section cib to master (origin=local/crmd/92)
Dec 18 17:22:19 [2695] node2       crmd:     info: crmd_cs_dispatch:     Setting expected votes to 3
Dec 18 17:22:19 [2695] node2       crmd:   notice: do_state_transition:     State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL origin=check_join_state ]
Dec 18 17:22:19 [2695] node2       crmd:     info: do_dc_join_offer_one:     An unknown node joined - (re-)offer to any unconfirmed nodes
Dec 18 17:22:19 [2695] node2       crmd:     info: join_make_offer:     Making join offers based on membership 668
Dec 18 17:22:19 [2695] node2       crmd:     info: join_make_offer:     Skipping node2: already known 4
Dec 18 17:22:19 [2695] node2       crmd:     info: join_make_offer:     Not making an offer to node1: not active (lost)
Dec 18 17:22:19 [2695] node2       crmd:     info: abort_transition_graph:     Transition aborted: Peer Halt (source=do_te_invoke:158, 1)
Dec 18 17:22:19 [2695] node2       crmd:     info: do_state_transition:     State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
Dec 18 17:22:19 [2695] node2       crmd:     info: crmd_join_phase_log:     join-2: node2=confirmed
Dec 18 17:22:19 [2695] node2       crmd:     info: crmd_join_phase_log:     join-2: node1=none
Dec 18 17:22:19 [2695] node2       crmd:     info: do_state_transition:     State transition S_FINALIZE_JOIN -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL origin=peer_update_callback ]
Dec 18 17:22:19 [2695] node2       crmd:     info: do_dc_join_offer_one:     An unknown node joined - (re-)offer to any unconfirmed nodes
Dec 18 17:22:19 [2695] node2       crmd:     info: join_make_offer:     Skipping node2: already known 4
Dec 18 17:22:19 [2695] node2       crmd:     info: join_make_offer:     Not making an offer to node1: not active (lost)
Dec 18 17:22:19 [2695] node2       crmd:     info: do_state_transition:     State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
Dec 18 17:22:19 [2695] node2       crmd:     info: crmd_join_phase_log:     join-2: node2=confirmed
Dec 18 17:22:19 [2695] node2       crmd:     info: crmd_join_phase_log:     join-2: node1=none
Dec 18 17:22:19 [2695] node2       crmd:     info: do_state_transition:     State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state ]
Dec 18 17:22:19 [2695] node2       crmd:     info: abort_transition_graph:     Transition aborted: Peer Cancelled (source=do_te_invoke:151, 1)
Dec 18 17:22:19 [2693] node2      attrd:   notice: attrd_local_callback:     Sending full refresh (origin=crmd)
Dec 18 17:22:19 [2693] node2      attrd:   notice: attrd_trigger_update:     Sending flush op to all hosts for: shutdown (0)
Dec 18 17:22:19 [2690] node2        cib:     info: cib_process_request:     Completed cib_modify operation for section nodes: OK (rc=0, origin=node2/crmd/90, version=0.69.158)
Dec 18 17:22:19 [2690] node2        cib:     info: cib_perform_op:     Diff: --- 0.69.158 2
Dec 18 17:22:19 [2690] node2        cib:     info: cib_perform_op:     Diff: +++ 0.69.159 (null)
Dec 18 17:22:19 [2690] node2        cib:     info: cib_perform_op:     +  /cib:  @num_updates=159
Dec 18 17:22:19 [2690] node2        cib:     info: cib_perform_op:     +  /cib/status/node_state[@id='node2']:  @crm-debug-origin=post_cache_update
Dec 18 17:22:19 [2690] node2        cib:     info: cib_perform_op:     +  /cib/status/node_state[@id='node1']:  @in_ccm=false, @crm-debug-origin=post_cache_update
Dec 18 17:22:19 [2690] node2        cib:     info: cib_process_request:     Completed cib_modify operation for section status: OK (rc=0, origin=node2/crmd/91, version=0.69.159)
Dec 18 17:22:19 [2690] node2        cib:     info: cib_perform_op:     Diff: --- 0.69.159 2
Dec 18 17:22:19 [2690] node2        cib:     info: cib_perform_op:     Diff: +++ 0.69.160 (null)
Dec 18 17:22:19 [2690] node2        cib:     info: cib_perform_op:     +  /cib:  @num_updates=160, @have-quorum=0
Dec 18 17:22:19 [2690] node2        cib:     info: cib_process_request:     Completed cib_modify operation for section cib: OK (rc=0, origin=node2/crmd/92, version=0.69.160)
Dec 18 17:22:19 [2690] node2        cib:     info: cib_process_request:     Forwarding cib_modify operation for section crm_config to master (origin=local/crmd/94)
Dec 18 17:22:19 [2690] node2        cib:     info: cib_process_request:     Forwarding cib_modify operation for section nodes to master (origin=local/crmd/95)
Dec 18 17:22:19 [2690] node2        cib:     info: cib_process_request:     Forwarding cib_modify operation for section status to master (origin=local/crmd/96)
Dec 18 17:22:19 [2690] node2        cib:     info: cib_process_request:     Forwarding cib_modify operation for section cib to master (origin=local/crmd/97)
Dec 18 17:22:19 [2693] node2      attrd:   notice: attrd_trigger_update:     Sending flush op to all hosts for: probe_complete (true)
Dec 18 17:22:19 [2690] node2        cib:     info: cib_process_request:     Completed cib_modify operation for section crm_config: OK (rc=0, origin=node2/crmd/94, version=0.69.160)
Dec 18 17:22:19 [2690] node2        cib:     info: cib_process_request:     Completed cib_modify operation for section nodes: OK (rc=0, origin=node2/crmd/95, version=0.69.160)
Dec 18 17:22:19 [2694] node2    pengine:   notice: unpack_config:     On loss of CCM Quorum: Ignore
Dec 18 17:22:19 [2690] node2        cib:     info: cib_perform_op:     Diff: --- 0.69.160 2
Dec 18 17:22:19 [2690] node2        cib:     info: cib_perform_op:     Diff: +++ 0.69.161 (null)
Dec 18 17:22:19 [2690] node2        cib:     info: cib_perform_op:     +  /cib:  @num_updates=161
Dec 18 17:22:19 [2690] node2        cib:     info: cib_perform_op:     +  /cib/status/node_state[@id='node2']:  @crm-debug-origin=do_state_transition
Dec 18 17:22:19 [2690] node2        cib:     info: cib_perform_op:     +  /cib/status/node_state[@id='node1']:  @crm-debug-origin=do_state_transition, @join=down
Dec 18 17:22:19 [2694] node2    pengine:     info: determine_online_status:     Node node2 is online
Dec 18 17:22:19 [2694] node2    pengine:     info: group_print:      Resource Group: g_rbd_share_1
Dec 18 17:22:19 [2690] node2        cib:     info: cib_process_request:     Completed cib_modify operation for section status: OK (rc=0, origin=node2/crmd/96, version=0.69.161)
Dec 18 17:22:19 [2694] node2    pengine:     info: native_print:          p_rbd_map_1    (ocf::ceph:rbd.in):    Stopped
Dec 18 17:22:19 [2694] node2    pengine:     info: native_print:          p_fs_rbd_1    (ocf::heartbeat:Filesystem):    Stopped
Dec 18 17:22:19 [2694] node2    pengine:     info: native_print:          p_export_rbd_1    (ocf::heartbeat:exportfs):    Stopped
Dec 18 17:22:19 [2694] node2    pengine:     info: native_print:          p_vip_1    (ocf::heartbeat:IPaddr):    Stopped
Dec 18 17:22:19 [2694] node2    pengine:     info: clone_print:      Clone Set: clo_nfs [g_nfs]
Dec 18 17:22:19 [2694] node2    pengine:     info: short_print:          Started: [ node2 ]
Dec 18 17:22:19 [2694] node2    pengine:     info: short_print:          Stopped: [ node1 ]
Dec 18 17:22:19 [2694] node2    pengine:     info: rsc_merge_weights:     p_rpcbind:1: Rolling back scores from p_nfs_server:1
Dec 18 17:22:19 [2694] node2    pengine:     info: native_color:     Resource p_rpcbind:1 cannot run anywhere
Dec 18 17:22:19 [2694] node2    pengine:     info: native_color:     Resource p_nfs_server:1 cannot run anywhere
Dec 18 17:22:19 [2694] node2    pengine:     info: RecurringOp:      Start recurring monitor (10s) for p_rbd_map_1 on node2
Dec 18 17:22:19 [2694] node2    pengine:     info: RecurringOp:      Start recurring monitor (20s) for p_fs_rbd_1 on node2
Dec 18 17:22:19 [2694] node2    pengine:     info: RecurringOp:      Start recurring monitor (10s) for p_export_rbd_1 on node2
Dec 18 17:22:19 [2694] node2    pengine:     info: RecurringOp:      Start recurring monitor (5s) for p_vip_1 on node2
Dec 18 17:22:19 [2694] node2    pengine:   notice: LogActions:     Start   p_rbd_map_1    (node2)
Dec 18 17:22:19 [2694] node2    pengine:   notice: LogActions:     Start   p_fs_rbd_1    (node2)
Dec 18 17:22:19 [2694] node2    pengine:   notice: LogActions:     Start   p_export_rbd_1    (node2)
Dec 18 17:22:19 [2694] node2    pengine:   notice: LogActions:     Start   p_vip_1    (node2)
Dec 18 17:22:19 [2694] node2    pengine:     info: LogActions:     Leave   p_rpcbind:0    (Started node2)
Dec 18 17:22:19 [2694] node2    pengine:     info: LogActions:     Leave   p_nfs_server:0    (Started node2)
Dec 18 17:22:19 [2694] node2    pengine:     info: LogActions:     Leave   p_rpcbind:1    (Stopped)
Dec 18 17:22:19 [2694] node2    pengine:     info: LogActions:     Leave   p_nfs_server:1    (Stopped)
Dec 18 17:22:19 [2690] node2        cib:     info: cib_process_request:     Completed cib_modify operation for section cib: OK (rc=0, origin=node2/crmd/97, version=0.69.161)
Dec 18 17:22:19 [2690] node2        cib:     info: cib_process_request:     Forwarding cib_modify operation for section status to master (origin=local/attrd/24)
Dec 18 17:22:19 [2690] node2        cib:     info: cib_process_request:     Completed cib_modify operation for section status: OK (rc=0, origin=node2/attrd/24, version=0.69.161)
Dec 18 17:22:19 [2690] node2        cib:     info: cib_process_request:     Forwarding cib_modify operation for section status to master (origin=local/attrd/26)
Dec 18 17:22:19 [2690] node2        cib:     info: cib_process_request:     Completed cib_modify operation for section status: OK (rc=0, origin=node2/attrd/26, version=0.69.161)
Dec 18 17:22:19 [2694] node2    pengine:   notice: process_pe_message:     Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-234.bz2
Dec 18 17:22:19 [2695] node2       crmd:     info: do_state_transition:     State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Dec 18 17:22:19 [2695] node2       crmd:     info: do_te_invoke:     Processing graph 3 (ref=pe_calc-dc-1450430539-52) derived from /var/lib/pacemaker/pengine/pe-input-234.bz2
Dec 18 17:22:19 [2695] node2       crmd:   notice: te_rsc_command:     Initiating action 6: start p_rbd_map_1_start_0 on node2 (local)
Dec 18 17:22:19 [2695] node2       crmd:     info: do_lrm_rsc_op:     Performing key=6:3:0:1b17b95d-a029-4ea5-be6d-4e5d8add6ca9 op=p_rbd_map_1_start_0
Dec 18 17:22:19 [2692] node2       lrmd:     info: log_execute:     executing - rsc:p_rbd_map_1 action:start call_id:48
Dec 18 17:22:24 [2690] node2        cib:     info: cib_process_ping:     Reporting our current digest to node2: 29875b5ae8dcbbb1d07f904ca3423de2 for 0.69.161 (0x12607b0 0)
Dec 18 17:22:39 [2692] node2       lrmd:  warning: child_timeout_callback:     p_rbd_map_1_start_0 process (PID 11010) timed out
Dec 18 17:22:39 [2692] node2       lrmd:  warning: operation_finished:     p_rbd_map_1_start_0:11010 - timed out after 20000ms
Dec 18 17:22:39 [2692] node2       lrmd:   notice: operation_finished:     p_rbd_map_1_start_0:11010:stderr [ libust[11019/11019]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305) ]
Dec 18 17:22:39 [2692] node2       lrmd:     info: log_finished:     finished - rsc:p_rbd_map_1 action:start call_id:48 pid:11010 exit-code:1 exec-time:20002ms queue-time:0ms
Dec 18 17:22:39 [2695] node2       crmd:     info: services_os_action_execute:     Managed rbd.in_meta-data_0 process 11117 exited with rc=0
Dec 18 17:22:39 [2695] node2       crmd:    error: process_lrm_event:     Operation p_rbd_map_1_start_0: Timed Out (node=node2, call=48, timeout=20000ms)
Dec 18 17:22:39 [2695] node2       crmd:   notice: process_lrm_event:     node2-p_rbd_map_1_start_0:48 [ libust[11019/11019]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)\n ]
Dec 18 17:22:39 [2690] node2        cib:     info: cib_process_request:     Forwarding cib_modify operation for section status to master (origin=local/crmd/99)
Dec 18 17:22:39 [2690] node2        cib:     info: cib_perform_op:     Diff: --- 0.69.161 2
Dec 18 17:22:39 [2690] node2        cib:     info: cib_perform_op:     Diff: +++ 0.69.162 (null)
Dec 18 17:22:39 [2690] node2        cib:     info: cib_perform_op:     +  /cib:  @num_updates=162
Dec 18 17:22:39 [2690] node2        cib:     info: cib_perform_op:     +  /cib/status/node_state[@id='node2']:  @crm-debug-origin=do_update_resource
Dec 18 17:22:39 [2690] node2        cib:     info: cib_perform_op:     +  /cib/status/node_state[@id='node2']/lrm[@id='node2']/lrm_resources/lrm_resource[@id='p_rbd_map_1']/lrm_rsc_op[@id='p_rbd_map_1_last_0']:  @operation_key=p_rbd_map_1_start_0, @operation=start, @transition-key=6:3:0:1b17b95d-a029-4ea5-be6d-4e5d8add6ca9, @transition-magic=2:1;6:3:0:1b17b95d-a029-4ea5-be6d-4e5d8add6ca9, @call-id=48, @rc-code=1, @op-status=2, @last-run=1450430539, @last-rc-change=1450430539, @exec-time=20002
Dec 18 17:22:39 [2690] node2        cib:     info: cib_perform_op:     ++ /cib/status/node_state[@id='node2']/lrm[@id='node2']/lrm_resources/lrm_resource[@id='p_rbd_map_1']:  <lrm_rsc_op id="p_rbd_map_1_last_failure_0" operation_key="p_rbd_map_1_start_0" operation="start" crm-debug-origin="do_update_resource" crm_feature_set="3.0.9" transition-key="6:3:0:1b17b95d-a029-4ea5-be6d-4e5d8add6ca9" transition-magic="2:1;6:3:0:1b17b95d-a029-4ea5-be6d-4e5d8add6ca9" call-id="48" rc-code="1" op-status="2" interval="0" l
Dec 18 17:22:39 [2690] node2        cib:     info: cib_process_request:     Completed cib_modify operation for section status: OK (rc=0, origin=node2/crmd/99, version=0.69.162)
Dec 18 17:22:39 [2695] node2       crmd:  warning: status_from_rc:     Action 6 (p_rbd_map_1_start_0) on node2 failed (target: 0 vs. rc: 1): Error
Dec 18 17:22:39 [2695] node2       crmd:  warning: update_failcount:     Updating failcount for p_rbd_map_1 on node2 after failed start: rc=1 (update=INFINITY, time=1450430559)
Dec 18 17:22:39 [2695] node2       crmd:   notice: abort_transition_graph:     Transition aborted by p_rbd_map_1_start_0 'modify' on node2: Event failed (magic=2:1;6:3:0:1b17b95d-a029-4ea5-be6d-4e5d8add6ca9, cib=0.69.162, source=match_graph_event:344, 0)
Dec 18 17:22:39 [2695] node2       crmd:     info: match_graph_event:     Action p_rbd_map_1_start_0 (6) confirmed on node2 (rc=4)
Dec 18 17:22:39 [2693] node2      attrd:   notice: attrd_trigger_update:     Sending flush op to all hosts for: fail-count-p_rbd_map_1 (INFINITY)
Dec 18 17:22:39 [2695] node2       crmd:  warning: update_failcount:     Updating failcount for p_rbd_map_1 on node2 after failed start: rc=1 (update=INFINITY, time=1450430559)
Dec 18 17:22:39 [2695] node2       crmd:     info: process_graph_event:     Detected action (3.6) p_rbd_map_1_start_0.48=unknown error: failed
Dec 18 17:22:39 [2695] node2       crmd:  warning: status_from_rc:     Action 6 (p_rbd_map_1_start_0) on node2 failed (target: 0 vs. rc: 1): Error
Dec 18 17:22:39 [2695] node2       crmd:  warning: update_failcount:     Updating failcount for p_rbd_map_1 on node2 after failed start: rc=1 (update=INFINITY, time=1450430559)
Dec 18 17:22:39 [2695] node2       crmd:     info: abort_transition_graph:     Transition aborted by p_rbd_map_1_start_0 'create' on (null): Event failed (magic=2:1;6:3:0:1b17b95d-a029-4ea5-be6d-4e5d8add6ca9, cib=0.69.162, source=match_graph_event:344, 0)
Dec 18 17:22:39 [2695] node2       crmd:     info: match_graph_event:     Action p_rbd_map_1_start_0 (6) confirmed on node2 (rc=4)
Dec 18 17:22:39 [2695] node2       crmd:  warning: update_failcount:     Updating failcount for p_rbd_map_1 on node2 after failed start: rc=1 (update=INFINITY, time=1450430559)
Dec 18 17:22:39 [2695] node2       crmd:     info: process_graph_event:     Detected action (3.6) p_rbd_map_1_start_0.48=unknown error: failed
Dec 18 17:22:39 [2693] node2      attrd:   notice: attrd_perform_update:     Sent update 28: fail-count-p_rbd_map_1=INFINITY
Dec 18 17:22:39 [2690] node2        cib:     info: cib_process_request:     Forwarding cib_modify operation for section status to master (origin=local/attrd/28)
Dec 18 17:22:39 [2695] node2       crmd:   notice: run_graph:     Transition 3 (Complete=2, Pending=0, Fired=0, Skipped=8, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-234.bz2): Stopped
Dec 18 17:22:39 [2695] node2       crmd:     info: do_state_transition:     State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
Dec 18 17:22:39 [2693] node2      attrd:   notice: attrd_trigger_update:     Sending flush op to all hosts for: last-failure-p_rbd_map_1 (1450430559)
Dec 18 17:22:39 [2690] node2        cib:     info: cib_perform_op:     Diff: --- 0.69.162 2
Dec 18 17:22:39 [2690] node2        cib:     info: cib_perform_op:     Diff: +++ 0.69.163 (null)
Dec 18 17:22:39 [2690] node2        cib:     info: cib_perform_op:     +  /cib:  @num_updates=163
Dec 18 17:22:39 [2690] node2        cib:     info: cib_perform_op:     ++ /cib/status/node_state[@id='node2']/transient_attributes[@id='node2']/instance_attributes[@id='status-node2']:  <nvpair id="status-node2-fail-count-p_rbd_map_1" name="fail-count-p_rbd_map_1" value="INFINITY"/>
.........

thanks


-- 
Steve Anthony
LTS HPC Support Specialist
Lehigh University
sma310@xxxxxxxxxx


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
