running cluster of 3 nodes:

lv-test-2 ~ # ceph -s
2012-02-24 13:10:35.481248    pg v726: 594 pgs: 594 active+clean; 120 MB data, 683 MB used, 35448 MB / 37967 MB avail
2012-02-24 13:10:35.484463   mds e177: 3/3/3 up {0=shark1=up:active,1=lv-test-1=up:active,2=lv-test-2=up:active}
2012-02-24 13:10:35.484529   osd e64: 3 osds: 3 up, 3 in
2012-02-24 13:10:35.484630   log 2012-02-24 13:09:50.009333 osd.1 10.0.1.246:6801/3929 29 : [INF] 2.5d scrub ok
2012-02-24 13:10:35.484907   mon e1: 3 mons at {lv-test-1=10.0.1.246:6789/0,lv-test-2=10.0.1.247:6789/0,shark1=10.0.1.81:6789/0}

mounting by fuse:

lv-test-1 ~ # mount
ceph-fuse on /uploads type fuse.ceph-fuse (rw,nosuid,nodev,allow_other,default_permissions)

simulating write:

lv-test-1 ~ # cp -r /usr/src/linux-3.2.2-hardened-r1/ /uploads/

killing one node:

lv-test-2 ~ # killall ceph-mon ceph-mds ceph-osd

Feb 24 13:11:17 lv-test-2 mon.lv-test-2[3474]: *** Caught signal (Terminated) **
Feb 24 13:11:17 lv-test-2 in thread 3195ce76760. Shutting down.
Feb 24 13:11:17 lv-test-2 mds.lv-test-2[3553]: *** Caught signal (Terminated) **
Feb 24 13:11:17 lv-test-2 in thread 2ee100bb760. Shutting down.
Feb 24 13:11:17 lv-test-2 osd.2[3654]: *** Caught signal (Terminated) **
Feb 24 13:11:17 lv-test-2 in thread 28f75487760. Shutting down.
Feb 24 13:11:35 lv-test-2 client.admin[3885]: 3a6a385b700 monclient: hunting for new mon
Feb 24 13:11:35 lv-test-2 client.admin[3885]: 3a6a385b700 client.5017 ms_handle_reset on 10.0.1.246:6789/0

Feb 24 13:11:17 lv-test-1 mon.lv-test-1[3751]: 2d62330a700 -- 10.0.1.246:6789/0 >> 10.0.1.247:6789/0 pipe(0x522b9ba080 sd=9 pgs=37 cs=1 l=0).fault with nothing to send, going to standby
Feb 24 13:11:17 lv-test-1 mds.lv-test-1[3830]: 2e3b9bd0700 -- 10.0.1.246:6800/3829 >> 10.0.1.247:6800/3552 pipe(0x5faea44c0 sd=13 pgs=13 cs=1 l=0).fault with nothing to send, going to standby
Feb 24 13:11:17 lv-test-1 client.admin[3151]: 2a9fbe55700 -- 10.0.1.246:0/3151 >> 10.0.1.247:6800/3552 pipe(0x2a9ec00f560 sd=0 pgs=9 cs=3 l=0).fault with nothing to send, going to standby
Feb 24 13:11:17 lv-test-1 osd.1[3930]: 2a51b9f1700 osd.1 64 OSD::ms_handle_reset()
Feb 24 13:11:17 lv-test-1 osd.1[3930]: 2a51b9f1700 osd.1 64 OSD::ms_handle_reset() s=0x22b4580700
Feb 24 13:11:17 lv-test-1 client.admin[3151]: 2a9fd95b700 client.4617 ms_handle_reset on 10.0.1.247:6801/3653
Feb 24 13:11:17 lv-test-1 osd.1[3930]: 2a510df7700 -- 10.0.1.246:6803/3929 >> 10.0.1.247:0/3654 pipe(0x2a50c005000 sd=24 pgs=4 cs=1 l=0).fault with nothing to send, going to standby
Feb 24 13:11:18 lv-test-1 osd.1[3930]: 2a5184e6700 -- 10.0.1.246:6802/3929 >> 10.0.1.247:6802/3653 pipe(0x2a5145e9c50 sd=19 pgs=3 cs=1 l=0).fault with nothing to send, going to standby
Feb 24 13:11:18 lv-test-1 mds.lv-test-1[3830]: 2e3bcadc700 mds.1.5 ms_handle_reset on 10.0.1.247:6801/3653
Feb 24 13:11:18 lv-test-1 osd.1[3930]: 2a5183e5700 -- 10.0.1.246:0/3930 >> 10.0.1.247:6803/3653 pipe(0x2a5145eaeb0 sd=20 pgs=16 cs=1 l=0).fault with nothing to send, going to standby
Feb 24 13:11:34 lv-test-1 osd.1[3930]: 2a5229ff700 osd.1 64 heartbeat_check: no heartbeat from osd.0 since 2012-02-24 13:11:14.122485 (cutoff 2012-02-24 13:11:14.366355)
Feb 24 13:11:34 lv-test-1 osd.1[3930]: 2a5117fa700 osd.1 64 heartbeat_check: no heartbeat from osd.0 since 2012-02-24 13:11:14.122485 (cutoff 2012-02-24 13:11:14.826382)
Feb 24 13:11:35 lv-test-1 osd.1[3930]: 2a5229ff700 osd.1 64 heartbeat_check: no heartbeat from osd.0 since 2012-02-24 13:11:14.122485 (cutoff 2012-02-24 13:11:15.369660)
Feb 24 13:11:35 lv-test-1 osd.1[3930]: 2a51b9f1700 monclient: hunting for new mon
Feb 24 13:11:35 lv-test-1 osd.1[3930]: 2a51b9f1700 osd.1 64 OSD::ms_handle_reset()
Feb 24 13:11:36 lv-test-1 osd.1[3930]: 2a5117fa700 osd.1 64 heartbeat_check: no heartbeat from osd.0 since 2012-02-24 13:11:14.122485 (cutoff 2012-02-24 13:11:16.129635)
Feb 24 13:11:36 lv-test-1 osd.1[3930]: 2a5229ff700 osd.1 64 heartbeat_check: no heartbeat from osd.0 since 2012-02-24 13:11:14.122485 (cutoff 2012-02-24 13:11:16.372900)

The copy now hangs (it cannot be killed even with kill -9) and /uploads is not accessible:

lv-test-1 ~ # time ceph -s
^C*** Caught signal (Interrupt) **
 in thread 2da010af760. Shutting down.

real    3m16.481s
user    0m0.037s
sys     0m0.013s

lv-test-2 ~ # time ceph -s
^C*** Caught signal (Interrupt) **
 in thread 314b193c760. Shutting down.

real    0m35.401s
user    0m0.017s
sys     0m0.007s

So the cluster hung and is no longer responding. Let's bring the killed node back up:

lv-test-2 ~ # /etc/init.d/ceph restart

lv-test-2 ~ # ceph -s
2012-02-24 13:20:01.996366    pg v734: 594 pgs: 594 active+clean; 120 MB data, 683 MB used, 35448 MB / 37967 MB avail
2012-02-24 13:20:01.999207   mds e177: 3/3/3 up {0=shark1=up:active,1=lv-test-1=up:active,2=lv-test-2=up:active}
2012-02-24 13:20:01.999268   osd e64: 3 osds: 3 up, 3 in
2012-02-24 13:20:01.999368   log 2012-02-24 13:11:02.267947 osd.1 10.0.1.246:6801/3929 41 : [INF] 2.89 scrub ok
2012-02-24 13:20:01.999612   mon e1: 3 mons at {lv-test-1=10.0.1.246:6789/0,lv-test-2=10.0.1.247:6789/0,shark1=10.0.1.81:6789/0}

lv-test-2 ~ # ceph -s
2012-02-24 13:20:44.984214    pg v742: 594 pgs: 594 active+clean; 144 MB data, 714 MB used, 35417 MB / 37967 MB avail
2012-02-24 13:20:44.986505   mds e182: 3/3/3 up {0=lv-test-2=up:resolve,1=lv-test-1=up:active,2=lv-test-2=up:active(laggy or crashed)}
2012-02-24 13:20:44.986697   osd e68: 3 osds: 1 up, 3 in
2012-02-24 13:20:44.986918   log 2012-02-24 13:20:42.606730 mon.1 10.0.1.246:6789/0 27 : [INF] mds.1 10.0.1.246:6800/3829 up:active
2012-02-24 13:20:44.987118   mon e1: 3 mons at {lv-test-1=10.0.1.246:6789/0,lv-test-2=10.0.1.247:6789/0,shark1=10.0.1.81:6789/0}

Feb 24 13:23:28 lv-test-2 mds.lv-test-2[4608]: 2a93085f700 -- 10.0.1.247:6800/4607 >> 10.0.1.247:6800/3552 pipe(0x19d6f37b40 sd=12 pgs=0 cs=0 l=0).connect claims to be 10.0.1.247:6800/4607 not 10.0.1.247:6800/3552 - wrong node!
Feb 24 13:24:13 lv-test-1 client.admin[3151]: 2a9fc158700 -- 10.0.1.246:0/3151 >> 10.0.1.247:6800/3552 pipe(0x2a9ec00f560 sd=0 pgs=9 cs=4 l=0).connect claims to be 10.0.1.247:6800/4607 not 10.0.1.247:6800/3552 - wrong node!
Feb 24 13:24:14 lv-test-1 mds.lv-test-1[3830]: 2e3b981b700 -- 10.0.1.246:6800/3829 >> 10.0.1.247:6800/3552 pipe(0x5faea44c0 sd=8 pgs=13 cs=2 l=0).connect claims to be 10.0.1.247:6800/4607 not 10.0.1.247:6800/3552 - wrong node!

lv-test-1 ~ # ceph -s
2012-02-24 13:24:36.558927    pg v762: 594 pgs: 594 active+clean; 144 MB data, 741 MB used, 35390 MB / 37967 MB avail
2012-02-24 13:24:36.560927   mds e195: 3/3/3 up {0=lv-test-2=up:resolve,1=lv-test-1=up:active,2=lv-test-2=up:active(laggy or crashed)}
2012-02-24 13:24:36.560988   osd e70: 3 osds: 2 up, 3 in
2012-02-24 13:24:36.561092   log 2012-02-24 13:24:29.691540 osd.2 10.0.1.247:6801/4706 17 : [INF] 0.77 scrub ok
2012-02-24 13:24:36.561201   mon e1: 3 mons at {lv-test-1=10.0.1.246:6789/0,lv-test-2=10.0.1.247:6789/0,shark1=10.0.1.81:6789/0}

The mount point is still inaccessible. That's all, and it sucks.

Is there a proven scenario for building a 3-node cluster with replication that tolerates the shutdown of two nodes without read/write processes locking up?
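For reference, the "replication level 3" on the default data and metadata pools mentioned in the quoted thread below is normally set with commands along these lines. This is not a capture from the test run above, just an illustration, and the field name in the osd dump output may differ between versions:

  ceph osd pool set data size 3
  ceph osd pool set metadata size 3
  ceph osd dump | grep size        (verify the replication size on each pool)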
Quoting "Tommi Virtanen" <tommi.virtanen@xxxxxxxxxxxxx>:

> On Thu, Feb 23, 2012 at 11:07, Gregory Farnum
> <gregory.farnum@xxxxxxxxxxxxx> wrote:
>>>> 3 nodes, each running mon, mds & osd with replication level 3 for
>>>> data & metadata pools.
> ...
>> Actually the OSDs will happily (well, not happily; they will complain.
>> But they will run) run in degraded mode. However, if you have 3 active
>> MDSes and you kill one of them without a standby available, you will
>> lose access to part of your tree. That's probably what happened
>> here...
>
> So let's try that angle. Slim, can you share the output of "ceph -s"
> with us?
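PS: if the answer is the standby MDS Greg mentions above, I assume the ceph.conf change would look roughly like the fragment below. Option names are taken from the ceph.conf documentation and the hostnames are mine; this is a sketch, not my actual config, so corrections are welcome:

  [mds.shark1]
      host = shark1

  [mds.lv-test-1]
      host = lv-test-1

  [mds.lv-test-2]
      host = lv-test-2
      ; run this daemon as a standby (with replay) for rank 0
      ; instead of a third active MDS
      mds standby for rank = 0
      mds standby replay = true

plus dropping the number of active MDS ranks from 3 to 2, presumably with something like "ceph mds set_max_mds 2".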