nfs over rbd problem

Hi list,
I have a test Ceph cluster with three nodes (node0: mon; node1: OSD and NFS server 1; node2: OSD and NFS server 2).
OS: CentOS 6.6, kernel: 3.10.94-1.el6.elrepo.x86_64, Ceph version 0.94.5.
I followed the instructions at http://www.sebastien-han.fr/blog/2012/07/06/nfs-over-rbd/ to set up an active/standby NFS environment.
When I stop the cluster stack cleanly on node1 ("service corosync stop") or shut it down with "poweroff", failover works fine and node2 takes over the NFS service. But when I test by cutting off node1's power, the failover fails.
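For reference, the resource group is set up following the blog's example; the sketch below is roughly what my crm configuration looks like (parameter names are as I remember them from the post, and the pool/image/mount/IP values are placeholders rather than my real ones):

primitive p_rbd_map_1 ocf:ceph:rbd.in \
        params user="admin" pool="rbd" name="share1" cephconf="/etc/ceph/ceph.conf" \
        op monitor interval="10s" timeout="20s"
primitive p_fs_rbd_1 ocf:heartbeat:Filesystem \
        params device="/dev/rbd/rbd/share1" directory="/mnt/share1" fstype="xfs" \
        op monitor interval="20s" timeout="40s"
primitive p_export_rbd_1 ocf:heartbeat:exportfs \
        params directory="/mnt/share1" clientspec="192.168.0.0/24" options="rw,no_root_squash" fsid="1" \
        op monitor interval="10s" timeout="20s"
primitive p_vip_1 ocf:heartbeat:IPaddr \
        params ip="192.168.0.100" cidr_netmask="24" \
        op monitor interval="5s"
group g_rbd_share_1 p_rbd_map_1 p_fs_rbd_1 p_export_rbd_1 p_vip_1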
1. [root@node1 ~]# crm status
Last updated: Fri Dec 18 17:14:19 2015
Last change: Fri Dec 18 17:13:29 2015
Stack: classic openais (with plugin)
Current DC: node1 - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 3 expected votes
8 Resources configured
Online: [ node1 node2 ]
 Resource Group: g_rbd_share_1
     p_rbd_map_1        (ocf::ceph:rbd.in):     Started node1
     p_fs_rbd_1 (ocf::heartbeat:Filesystem):    Started node1
     p_export_rbd_1     (ocf::heartbeat:exportfs):      Started node1
     p_vip_1    (ocf::heartbeat:IPaddr):        Started node1
 Clone Set: clo_nfs [g_nfs]
     Started: [ node1 node2 ]
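(Note for the steps below: node2 keeps running the resources even when its partition has no quorum, which matches the two-node cluster properties from the blog. I believe I set them as described there, roughly as follows; treat the exact lines as illustrative, reproduced from the post rather than copied from my cluster:

crm configure property stonith-enabled=false
crm configure property no-quorum-policy=ignore
)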
2. [root@node1 ~]# service corosync stop
[root@node2 cluster]# crm status
Last updated: Fri Dec 18 17:14:59 2015
Last change: Fri Dec 18 17:13:29 2015
Stack: classic openais (with plugin)
Current DC: node2 - partition WITHOUT quorum
Version: 1.1.11-97629de
2 Nodes configured, 3 expected votes
8 Resources configured
Online: [ node2 ]
OFFLINE: [ node1 ]
 Resource Group: g_rbd_share_1
     p_rbd_map_1        (ocf::ceph:rbd.in):     Started node2
     p_fs_rbd_1 (ocf::heartbeat:Filesystem):    Started node2
     p_export_rbd_1     (ocf::heartbeat:exportfs):      Started node2
     p_vip_1    (ocf::heartbeat:IPaddr):        Started node2
 Clone Set: clo_nfs [g_nfs]
     Started: [ node2 ]
     Stopped: [ node1 ]

3. cut off node1's power manually
[root@node2 cluster]# crm status
Last updated: Fri Dec 18 17:23:06 2015
Last change: Fri Dec 18 17:13:29 2015
Stack: classic openais (with plugin)
Current DC: node2 - partition WITHOUT quorum
Version: 1.1.11-97629de
2 Nodes configured, 3 expected votes
8 Resources configured
Online: [ node2 ]
OFFLINE: [ node1 ]
 Clone Set: clo_nfs [g_nfs]
     Started: [ node2 ]
     Stopped: [ node1 ]
Failed actions:
    p_rbd_map_1_start_0 on node2 'unknown error' (1): call=48, status=Timed Out, last-rc-change='Fri Dec 18 17:22:19 2015', queued=0ms, exec=20002ms
corosync.log:
Dec 18 17:22:39 [2692] node2       lrmd:  warning: child_timeout_callback:     p_rbd_map_1_start_0 process (PID 11010) timed out
Dec 18 17:22:39 [2692] node2       lrmd:  warning: operation_finished:     p_rbd_map_1_start_0:11010 - timed out after 20000ms
Dec 18 17:22:39 [2692] node2       lrmd:   notice: operation_finished:     p_rbd_map_1_start_0:11010:stderr [ libust[11019/11019]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305) ]
Dec 18 17:22:39 [2692] node2       lrmd:     info: log_finished:     finished - rsc:p_rbd_map_1 action:start call_id:48 pid:11010 exit-code:1 exec-time:20002ms queue-time:0ms
Dec 18 17:22:39 [2695] node2       crmd:     info: services_os_action_execute:     Managed rbd.in_meta-data_0 process 11117 exited with rc=0
Dec 18 17:22:39 [2695] node2       crmd:    error: process_lrm_event:     Operation p_rbd_map_1_start_0: Timed Out (node=node2, call=48, timeout=20000ms)
Dec 18 17:22:39 [2695] node2       crmd:   notice: process_lrm_event:     node2-p_rbd_map_1_start_0:48 [ libust[11019/11019]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)\n ]
Dec 18 17:22:39 [2690] node2        cib:     info: cib_process_request:     Forwarding cib_modify operation for section status to master (origin=local/crmd/99)
Dec 18 17:22:39 [2690] node2        cib:     info: cib_perform_op:     Diff: --- 0.69.161 2
Dec 18 17:22:39 [2690] node2        cib:     info: cib_perform_op:     Diff: +++ 0.69.162 (null)
Dec 18 17:22:39 [2690] node2        cib:     info: cib_perform_op:     +  /cib:  @num_updates=162
Dec 18 17:22:39 [2690] node2        cib:     info: cib_perform_op:     +  /cib/status/node_state[@id='node2']:  @crm-debug-origin=do_update_resource
Dec 18 17:22:39 [2690] node2        cib:     info: cib_perform_op:     +  /cib/status/node_state[@id='node2']/lrm[@id='node2']/lrm_resources/lrm_resource[@id='p_rbd_map_1']/lrm_rsc_op[@id='p_rbd_map_1_last_0']:  @operation_key=p_rbd_map_1_start_0, @operation=start, @transition-key=6:3:0:1b17b95d-a029-4ea5-be6d-4e5d8add6ca9, @transition-magic=2:1;6:3:0:1b17b95d-a029-4ea5-be6d-4e5d8add6ca9, @call-id=48, @rc-code=1, @op-status=2, @last-run=1450430539, @last-rc-change=1450430539, @exec-time=20002
Dec 18 17:22:39 [2690] node2        cib:     info: cib_perform_op:     ++ /cib/status/node_state[@id='node2']/lrm[@id='node2']/lrm_resources/lrm_resource[@id='p_rbd_map_1']:  <lrm_rsc_op id="p_rbd_map_1_last_failure_0" operation_key="p_rbd_map_1_start_0" operation="start" crm-debug-origin="do_update_resource" crm_feature_set="3.0.9" transition-key="6:3:0:1b17b95d-a029-4ea5-be6d-4e5d8add6ca9" transition-magic="2:1;6:3:0:1b17b95d-a029-4ea5-be6d-4e5d8add6ca9" call-id="48" rc-code="1" op-status="2" interval="0" l
Dec 18 17:22:39 [2690] node2        cib:     info: cib_process_request:     Completed cib_modify operation for section status: OK (rc=0, origin=node2/crmd/99, version=0.69.162)
Dec 18 17:22:39 [2695] node2       crmd:  warning: status_from_rc:     Action 6 (p_rbd_map_1_start_0) on node2 failed (target: 0 vs. rc: 1): Error
Dec 18 17:22:39 [2695] node2       crmd:  warning: update_failcount:     Updating failcount for p_rbd_map_1 on node2 after failed start: rc=1 (update=INFINITY, time=1450430559)
Dec 18 17:22:39 [2695] node2       crmd:   notice: abort_transition_graph:     Transition aborted by p_rbd_map_1_start_0 'modify' on node2: Event failed (magic=2:1;6:3:0:1b17b95d-a029-4ea5-be6d-4e5d8add6ca9, cib=0.69.162, source=match_graph_event:344, 0)
Dec 18 17:22:39 [2695] node2       crmd:     info: match_graph_event:     Action p_rbd_map_1_start_0 (6) confirmed on node2 (rc=4)
Dec 18 17:22:39 [2693] node2      attrd:   notice: attrd_trigger_update:     Sending flush op to all hosts for: fail-count-p_rbd_map_1 (INFINITY)
Dec 18 17:22:39 [2695] node2       crmd:  warning: update_failcount:     Updating failcount for p_rbd_map_1 on node2 after failed start: rc=1 (update=INFINITY, time=1450430559)
Dec 18 17:22:39 [2695] node2       crmd:     info: process_graph_event:     Detected action (3.6) p_rbd_map_1_start_0.48=unknown error: failed
Dec 18 17:22:39 [2695] node2       crmd:  warning: status_from_rc:     Action 6 (p_rbd_map_1_start_0) on node2 failed (target: 0 vs. rc: 1): Error
Dec 18 17:22:39 [2695] node2       crmd:  warning: update_failcount:     Updating failcount for p_rbd_map_1 on node2 after failed start: rc=1 (update=INFINITY, time=1450430559)
Dec 18 17:22:39 [2695] node2       crmd:     info: abort_transition_graph:     Transition aborted by p_rbd_map_1_start_0 'create' on (null): Event failed (magic=2:1;6:3:0:1b17b95d-a029-4ea5-be6d-4e5d8add6ca9, cib=0.69.162, source=match_graph_event:344, 0)
Dec 18 17:22:39 [2695] node2       crmd:     info: match_graph_event:     Action p_rbd_map_1_start_0 (6) confirmed on node2 (rc=4)
Dec 18 17:22:39 [2695] node2       crmd:  warning: update_failcount:     Updating failcount for p_rbd_map_1 on node2 after failed start: rc=1 (update=INFINITY, time=1450430559)
Dec 18 17:22:39 [2695] node2       crmd:     info: process_graph_event:     Detected action (3.6) p_rbd_map_1_start_0.48=unknown error: failed
Dec 18 17:22:39 [2693] node2      attrd:   notice: attrd_perform_update:     Sent update 28: fail-count-p_rbd_map_1=INFINITY
Dec 18 17:22:39 [2690] node2        cib:     info: cib_process_request:     Forwarding cib_modify operation for section status to master (origin=local/attrd/28)
Dec 18 17:22:39 [2695] node2       crmd:   notice: run_graph:     Transition 3 (Complete=2, Pending=0, Fired=0, Skipped=8, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-234.bz2): Stopped
Dec 18 17:22:39 [2695] node2       crmd:     info: do_state_transition:     State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
Dec 18 17:22:39 [2693] node2      attrd:   notice: attrd_trigger_update:     Sending flush op to all hosts for: last-failure-p_rbd_map_1 (1450430559)
Dec 18 17:22:39 [2690] node2        cib:     info: cib_perform_op:     Diff: --- 0.69.162 2
Dec 18 17:22:39 [2690] node2        cib:     info: cib_perform_op:     Diff: +++ 0.69.163 (null)
Dec 18 17:22:39 [2690] node2        cib:     info: cib_perform_op:     +  /cib:  @num_updates=163
Dec 18 17:22:39 [2690] node2        cib:     info: cib_perform_op:     ++ /cib/status/node_state[@id='node2']/transient_attributes[@id='node2']/instance_attributes[@id='status-node2']:  <nvpair id="status-node2-fail-count-p_rbd_map_1" name="fail-count-p_rbd_map_1" value="INFINITY"/>
.........
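As far as I understand, the start action of ocf:ceph:rbd.in is essentially an "rbd map", so the 20s timeout above means the map itself blocks on node2 once node1's power is cut. The same thing can be checked by hand on node2 with something like the following (pool/image/user names are placeholders for my real ones):

ceph -s                                     # is the cluster still responsive after node1 disappears?
time rbd map share1 --pool rbd --id admin   # does the map block past the 20s start timeout?
rbd showmapped                              # check whether the device ever appears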

thanks

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
