On Feb 24, 2012, at 3:33 AM, "Дениска-редиска" <slim@xxxxxxxx> wrote:
> running a cluster of 3 nodes:
>
> lv-test-2 ~ # ceph -s
> 2012-02-24 13:10:35.481248 pg v726: 594 pgs: 594 active+clean; 120 MB data, 683 MB used, 35448 MB / 37967 MB avail
> 2012-02-24 13:10:35.484463 mds e177: 3/3/3 up {0=shark1=up:active,1=lv-test-1=up:active,2=lv-test-2=up:active}
> 2012-02-24 13:10:35.484529 osd e64: 3 osds: 3 up, 3 in
> 2012-02-24 13:10:35.484630 log 2012-02-24 13:09:50.009333 osd.1 10.0.1.246:6801/3929 29 : [INF] 2.5d scrub ok
> 2012-02-24 13:10:35.484907 mon e1: 3 mons at {lv-test-1=10.0.1.246:6789/0,lv-test-2=10.0.1.247:6789/0,shark1=10.0.1.81:6789/0}
>
> mounting with ceph-fuse:
> lv-test1 ~ # mount
> ceph-fuse on /uploads type fuse.ceph-fuse (rw,nosuid,nodev,allow_other,default_permissions)
>
> simulating a write:
> lv-test-1 ~ # cp -r /usr/src/linux-3.2.2-hardened-r1/ /uploads/
>
> killing one node:
> lv-test-2 ~ # killall ceph-mon ceph-mds ceph-osd
> Feb 24 13:11:17 lv-test-2 mon.lv-test-2[3474]: *** Caught signal (Terminated) **
> Feb 24 13:11:17 lv-test-2 in thread 3195ce76760. Shutting down.
> Feb 24 13:11:17 lv-test-2 mds.lv-test-2[3553]: *** Caught signal (Terminated) **
> Feb 24 13:11:17 lv-test-2 in thread 2ee100bb760. Shutting down.
> Feb 24 13:11:17 lv-test-2 osd.2[3654]: *** Caught signal (Terminated) **
> Feb 24 13:11:17 lv-test-2 in thread 28f75487760. Shutting down.
> Feb 24 13:11:35 lv-test-2 client.admin[3885]: 3a6a385b700 monclient: hunting for new mon
> Feb 24 13:11:35 lv-test-2 client.admin[3885]: 3a6a385b700 client.5017 ms_handle_reset on 10.0.1.246:6789/0
>
> Feb 24 13:11:17 lv-test-1 mon.lv-test-1[3751]: 2d62330a700 -- 10.0.1.246:6789/0 >> 10.0.1.247:6789/0 pipe(0x522b9ba080 sd=9 pgs=37 cs=1 l=0).fault with nothing to send, going to standby
> Feb 24 13:11:17 lv-test-1 mds.lv-test-1[3830]: 2e3b9bd0700 -- 10.0.1.246:6800/3829 >> 10.0.1.247:6800/3552 pipe(0x5faea44c0 sd=13 pgs=13 cs=1 l=0).fault with nothing to send, going to standby
> Feb 24 13:11:17 lv-test-1 client.admin[3151]: 2a9fbe55700 -- 10.0.1.246:0/3151 >> 10.0.1.247:6800/3552 pipe(0x2a9ec00f560 sd=0 pgs=9 cs=3 l=0).fault with nothing to send, going to standby
> Feb 24 13:11:17 lv-test-1 osd.1[3930]: 2a51b9f1700 osd.1 64 OSD::ms_handle_reset()
> Feb 24 13:11:17 lv-test-1 osd.1[3930]: 2a51b9f1700 osd.1 64 OSD::ms_handle_reset() s=0x22b4580700
> Feb 24 13:11:17 lv-test-1 client.admin[3151]: 2a9fd95b700 client.4617 ms_handle_reset on 10.0.1.247:6801/3653
> Feb 24 13:11:17 lv-test-1 osd.1[3930]: 2a510df7700 -- 10.0.1.246:6803/3929 >> 10.0.1.247:0/3654 pipe(0x2a50c005000 sd=24 pgs=4 cs=1 l=0).fault with nothing to send, going to standby
> Feb 24 13:11:18 lv-test-1 osd.1[3930]: 2a5184e6700 -- 10.0.1.246:6802/3929 >> 10.0.1.247:6802/3653 pipe(0x2a5145e9c50 sd=19 pgs=3 cs=1 l=0).fault with nothing to send, going to standby
> Feb 24 13:11:18 lv-test-1 mds.lv-test-1[3830]: 2e3bcadc700 mds.1.5 ms_handle_reset on 10.0.1.247:6801/3653
> Feb 24 13:11:18 lv-test-1 osd.1[3930]: 2a5183e5700 -- 10.0.1.246:0/3930 >> 10.0.1.247:6803/3653 pipe(0x2a5145eaeb0 sd=20 pgs=16 cs=1 l=0).fault with nothing to send, going to standby
> Feb 24 13:11:34 lv-test-1 osd.1[3930]: 2a5229ff700 osd.1 64 heartbeat_check: no heartbeat from osd.0 since 2012-02-24 13:11:14.122485 (cutoff 2012-02-24 13:11:14.366355)
> Feb 24 13:11:34 lv-test-1 osd.1[3930]: 2a5117fa700 osd.1 64 heartbeat_check: no heartbeat from osd.0 since 2012-02-24 13:11:14.122485 (cutoff 2012-02-24 13:11:14.826382)
> Feb 24 13:11:35 lv-test-1 osd.1[3930]: 2a5229ff700 osd.1 64 heartbeat_check: no heartbeat from osd.0 since 2012-02-24 13:11:14.122485 (cutoff 2012-02-24 13:11:15.369660)
> Feb 24 13:11:35 lv-test-1 osd.1[3930]: 2a51b9f1700 monclient: hunting for new mon
> Feb 24 13:11:35 lv-test-1 osd.1[3930]: 2a51b9f1700 osd.1 64 OSD::ms_handle_reset()
> Feb 24 13:11:36 lv-test-1 osd.1[3930]: 2a5117fa700 osd.1 64 heartbeat_check: no heartbeat from osd.0 since 2012-02-24 13:11:14.122485 (cutoff 2012-02-24 13:11:16.129635)
> Feb 24 13:11:36 lv-test-1 osd.1[3930]: 2a5229ff700 osd.1 64 heartbeat_check: no heartbeat from osd.0 since 2012-02-24 13:11:14.122485 (cutoff 2012-02-24 13:11:16.372900)
>
> the copy hangs (it cannot be killed even with kill -9) and /uploads is not accessible
>
> lv-test-1 ~ # time ceph -s
>
> ^C*** Caught signal (Interrupt) **
> in thread 2da010af760. Shutting down.
>
> real 3m16.481s
> user 0m0.037s
> sys 0m0.013s
>
> lv-test-2 ~ # time ceph -s
>
> ^C*** Caught signal (Interrupt) **
> in thread 314b193c760. Shutting down.
>
> real 0m35.401s
> user 0m0.017s
> sys 0m0.007s
>
> so the cluster hung and is not responding anymore
>
> let's bring the killed node back up:
>
> lv-test-2 ~ # /etc/init.d/ceph restart
> lv-test-2 ~ # ceph -s
> 2012-02-24 13:20:01.996366 pg v734: 594 pgs: 594 active+clean; 120 MB data, 683 MB used, 35448 MB / 37967 MB avail
> 2012-02-24 13:20:01.999207 mds e177: 3/3/3 up {0=shark1=up:active,1=lv-test-1=up:active,2=lv-test-2=up:active}
> 2012-02-24 13:20:01.999268 osd e64: 3 osds: 3 up, 3 in
> 2012-02-24 13:20:01.999368 log 2012-02-24 13:11:02.267947 osd.1 10.0.1.246:6801/3929 41 : [INF] 2.89 scrub ok
> 2012-02-24 13:20:01.999612 mon e1: 3 mons at {lv-test-1=10.0.1.246:6789/0,lv-test-2=10.0.1.247:6789/0,shark1=10.0.1.81:6789/0}
>
> lv-test-2 ~ # ceph -s
> 2012-02-24 13:20:44.984214 pg v742: 594 pgs: 594 active+clean; 144 MB data, 714 MB used, 35417 MB / 37967 MB avail
> 2012-02-24 13:20:44.986505 mds e182: 3/3/3 up {0=lv-test-2=up:resolve,1=lv-test-1=up:active,2=lv-test-2=up:active(laggy or crashed)}
> 2012-02-24 13:20:44.986697 osd e68: 3 osds: 1 up, 3 in
> 2012-02-24 13:20:44.986918 log 2012-02-24 13:20:42.606730 mon.1 10.0.1.246:6789/0 27 : [INF] mds.1 10.0.1.246:6800/3829 up:active
> 2012-02-24 13:20:44.987118 mon e1: 3 mons at {lv-test-1=10.0.1.246:6789/0,lv-test-2=10.0.1.247:6789/0,shark1=10.0.1.81:6789/0}
>
> Feb 24 13:23:28 lv-test-2 mds.lv-test-2[4608]: 2a93085f700 -- 10.0.1.247:6800/4607 >> 10.0.1.247:6800/3552 pipe(0x19d6f37b40 sd=12 pgs=0 cs=0 l=0).connect claims to be 10.0.1.247:6800/4607 not 10.0.1.247:6800/3552 - wrong node!
>
> Feb 24 13:24:13 lv-test-1 client.admin[3151]: 2a9fc158700 -- 10.0.1.246:0/3151 >> 10.0.1.247:6800/3552 pipe(0x2a9ec00f560 sd=0 pgs=9 cs=4 l=0).connect claims to be 10.0.1.247:6800/4607 not 10.0.1.247:6800/3552 - wrong node!
> Feb 24 13:24:14 lv-test-1 mds.lv-test-1[3830]: 2e3b981b700 -- 10.0.1.246:6800/3829 >> 10.0.1.247:6800/3552 pipe(0x5faea44c0 sd=8 pgs=13 cs=2 l=0).connect claims to be 10.0.1.247:6800/4607 not 10.0.1.247:6800/3552 - wrong node!
>
> lv-test-1 ~ # ceph -s
> 2012-02-24 13:24:36.558927 pg v762: 594 pgs: 594 active+clean; 144 MB data, 741 MB used, 35390 MB / 37967 MB avail
> 2012-02-24 13:24:36.560927 mds e195: 3/3/3 up {0=lv-test-2=up:resolve,1=lv-test-1=up:active,2=lv-test-2=up:active(laggy or crashed)}
> 2012-02-24 13:24:36.560988 osd e70: 3 osds: 2 up, 3 in
> 2012-02-24 13:24:36.561092 log 2012-02-24 13:24:29.691540 osd.2 10.0.1.247:6801/4706 17 : [INF] 0.77 scrub ok
> 2012-02-24 13:24:36.561201 mon e1: 3 mons at {lv-test-1=10.0.1.246:6789/0,lv-test-2=10.0.1.247:6789/0,shark1=10.0.1.81:6789/0}

Okay, so you can see here that one MDS is active, one is in the "resolve" state, and another one is apparently crashed. If you have logs or core dumps that you can send us we'd appreciate it, but in the meantime: the Ceph distributed filesystem is not yet production-ready, and a system with multiple active MDSes is significantly less stable and less well-tested. If you try using one active MDS and leave the rest in standby you will almost certainly see better results (and you're unlikely to be bottlenecked by it). :)
Also, one of your OSDs is down, and if it crashed that's a much bigger concern to us right now... can you check its log and see what it says?
-Greg

>
> the mount point is still inaccessible. That's all, and that sucks.
>
> is there a proven scenario for building a 3-node cluster with replication that tolerates the shutdown of two nodes without the read/write processes locking up?
>
> Quoting "Tommi Virtanen" <tommi.virtanen@xxxxxxxxxxxxx>:
>> On Thu, Feb 23, 2012 at 11:07, Gregory Farnum
>> <gregory.farnum@xxxxxxxxxxxxx> wrote:
>>>>> 3 nodes, each running mon, mds & osd with replication level 3 for data & metadata pools.
>> ...
>>> Actually the OSDs will happily (well, not happily; they will complain.
>>> But they will run) run in degraded mode. However, if you have 3 active
>>> MDSes and you kill one of them without a standby available, you will
>>> lose access to part of your tree. That's probably what happened
>>> here...
>>
>> So let's try that angle. Slim, can you share the output of "ceph -s" with us?
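
A minimal sketch of the single-active-MDS layout suggested above, assuming a Ceph release of this era where the "ceph mds set_max_mds" and "ceph mds stat" commands are available; the hostnames come from the cluster in this thread, and the exact section names and commands are assumptions rather than syntax verified against this particular version:

    # Sketch only: run a ceph-mds daemon on every node, but cap the number of
    # active MDS ranks at one so the remaining daemons stay in standby.
    #
    # ceph.conf fragment (one section per host, hostnames from this thread):
    #   [mds.shark1]
    #       host = shark1
    #   [mds.lv-test-1]
    #       host = lv-test-1
    #   [mds.lv-test-2]
    #       host = lv-test-2

    # Limit the filesystem to a single active MDS rank; daemons that do not
    # hold rank 0 act as standbys and can take over if the active MDS dies.
    ceph mds set_max_mds 1

    # Check the result: expect one MDS reported as up:active and the other
    # two listed as standbys.
    ceph mds stat

Shrinking a cluster that already has three active MDS ranks may take more steps on a release this old, so for a throwaway test cluster it is probably simpler to apply this before the MDS daemons are started for the first time.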