On Feb 24, 2012, at 3:33 AM, "Дениска-редиска" <slim@xxxxxxxx> wrote:
> running a cluster of 3 nodes:
>
> lv-test-2 ~ # ceph -s
> 2012-02-24 13:10:35.481248 pg v726: 594 pgs: 594 active+clean; 120 MB data, 683 MB used, 35448 MB / 37967 MB avail
> 2012-02-24 13:10:35.484463 mds e177: 3/3/3 up {0=shark1=up:active,1=lv-test-1=up:active,2=lv-test-2=up:active}
> 2012-02-24 13:10:35.484529 osd e64: 3 osds: 3 up, 3 in
> 2012-02-24 13:10:35.484630 log 2012-02-24 13:09:50.009333 osd.1 10.0.1.246:6801/3929 29 : [INF] 2.5d scrub ok
> 2012-02-24 13:10:35.484907 mon e1: 3 mons at {lv-test-1=10.0.1.246:6789/0,lv-test-2=10.0.1.247:6789/0,shark1=10.0.1.81:6789/0}
>
> mounting with ceph-fuse:
> lv-test1 ~ # mount
> ceph-fuse on /uploads type fuse.ceph-fuse (rw,nosuid,nodev,allow_other,default_permissions)
>
> simulating a write:
> lv-test-1 ~ # cp -r /usr/src/linux-3.2.2-hardened-r1/ /uploads/
>
> killing one node:
> lv-test-2 ~ # killall ceph-mon ceph-mds ceph-osd
> Feb 24 13:11:17 lv-test-2 mon.lv-test-2[3474]: *** Caught signal (Terminated) **
> Feb 24 13:11:17 lv-test-2 in thread 3195ce76760. Shutting down.
> Feb 24 13:11:17 lv-test-2 mds.lv-test-2[3553]: *** Caught signal (Terminated) **
> Feb 24 13:11:17 lv-test-2 in thread 2ee100bb760. Shutting down.
> Feb 24 13:11:17 lv-test-2 osd.2[3654]: *** Caught signal (Terminated) **
> Feb 24 13:11:17 lv-test-2 in thread 28f75487760. Shutting down.
> Feb 24 13:11:35 lv-test-2 client.admin[3885]: 3a6a385b700 monclient: hunting for new mon
> Feb 24 13:11:35 lv-test-2 client.admin[3885]: 3a6a385b700 client.5017 ms_handle_reset on 10.0.1.246:6789/0
>
> Feb 24 13:11:17 lv-test-1 mon.lv-test-1[3751]: 2d62330a700 -- 10.0.1.246:6789/0 >> 10.0.1.247:6789/0 pipe(0x522b9ba080 sd=9 pgs=37 cs=1 l=0).fault with nothing to send, going to standby
> Feb 24 13:11:17 lv-test-1 mds.lv-test-1[3830]: 2e3b9bd0700 -- 10.0.1.246:6800/3829 >> 10.0.1.247:6800/3552 pipe(0x5faea44c0 sd=13 pgs=13 cs=1 l=0).fault with nothing to send, going to standby
> Feb 24 13:11:17 lv-test-1 client.admin[3151]: 2a9fbe55700 -- 10.0.1.246:0/3151 >> 10.0.1.247:6800/3552 pipe(0x2a9ec00f560 sd=0 pgs=9 cs=3 l=0).fault with nothing to send, going to standby
> Feb 24 13:11:17 lv-test-1 osd.1[3930]: 2a51b9f1700 osd.1 64 OSD::ms_handle_reset()
> Feb 24 13:11:17 lv-test-1 osd.1[3930]: 2a51b9f1700 osd.1 64 OSD::ms_handle_reset() s=0x22b4580700
> Feb 24 13:11:17 lv-test-1 client.admin[3151]: 2a9fd95b700 client.4617 ms_handle_reset on 10.0.1.247:6801/3653
> Feb 24 13:11:17 lv-test-1 osd.1[3930]: 2a510df7700 -- 10.0.1.246:6803/3929 >> 10.0.1.247:0/3654 pipe(0x2a50c005000 sd=24 pgs=4 cs=1 l=0).fault with nothing to send, going to standby
> Feb 24 13:11:18 lv-test-1 osd.1[3930]: 2a5184e6700 -- 10.0.1.246:6802/3929 >> 10.0.1.247:6802/3653 pipe(0x2a5145e9c50 sd=19 pgs=3 cs=1 l=0).fault with nothing to send, going to standby
> Feb 24 13:11:18 lv-test-1 mds.lv-test-1[3830]: 2e3bcadc700 mds.1.5 ms_handle_reset on 10.0.1.247:6801/3653
> Feb 24 13:11:18 lv-test-1 osd.1[3930]: 2a5183e5700 -- 10.0.1.246:0/3930 >> 10.0.1.247:6803/3653 pipe(0x2a5145eaeb0 sd=20 pgs=16 cs=1 l=0).fault with nothing to send, going to standby
> Feb 24 13:11:34 lv-test-1 osd.1[3930]: 2a5229ff700 osd.1 64 heartbeat_check: no heartbeat from osd.0 since 2012-02-24 13:11:14.122485 (cutoff 2012-02-24 13:11:14.366355)
> Feb 24 13:11:34 lv-test-1 osd.1[3930]: 2a5117fa700 osd.1 64 heartbeat_check: no heartbeat from osd.0 since 2012-02-24 13:11:14.122485 (cutoff 2012-02-24 13:11:14.826382)
> Feb 24 13:11:35 lv-test-1 osd.1[3930]: 2a5229ff700 osd.1 64 heartbeat_check: no heartbeat from osd.0 since 2012-02-24 13:11:14.122485 (cutoff 2012-02-24 13:11:15.369660)
> Feb 24 13:11:35 lv-test-1 osd.1[3930]: 2a51b9f1700 monclient: hunting for new mon
> Feb 24 13:11:35 lv-test-1 osd.1[3930]: 2a51b9f1700 osd.1 64 OSD::ms_handle_reset()
> Feb 24 13:11:36 lv-test-1 osd.1[3930]: 2a5117fa700 osd.1 64 heartbeat_check: no heartbeat from osd.0 since 2012-02-24 13:11:14.122485 (cutoff 2012-02-24 13:11:16.129635)
> Feb 24 13:11:36 lv-test-1 osd.1[3930]: 2a5229ff700 osd.1 64 heartbeat_check: no heartbeat from osd.0 since 2012-02-24 13:11:14.122485 (cutoff 2012-02-24 13:11:16.372900)
>
> the copy hangs (it cannot be killed even with kill -9) and /uploads is not accessible
>
> lv-test-1 ~ # time ceph -s
>
> ^C*** Caught signal (Interrupt) **
> in thread 2da010af760. Shutting down.
>
> real 3m16.481s
> user 0m0.037s
> sys 0m0.013s
>
> lv-test-2 ~ # time ceph -s
>
> ^C*** Caught signal (Interrupt) **
> in thread 314b193c760. Shutting down.
>
> real 0m35.401s
> user 0m0.017s
> sys 0m0.007s
>
> so the cluster hung and is not responding anymore
>
> let's bring the killed node back up:
>
> lv-test-2 ~ # /etc/init.d/ceph restart
> lv-test-2 ~ # ceph -s
> 2012-02-24 13:20:01.996366 pg v734: 594 pgs: 594 active+clean; 120 MB data, 683 MB used, 35448 MB / 37967 MB avail
> 2012-02-24 13:20:01.999207 mds e177: 3/3/3 up {0=shark1=up:active,1=lv-test-1=up:active,2=lv-test-2=up:active}
> 2012-02-24 13:20:01.999268 osd e64: 3 osds: 3 up, 3 in
> 2012-02-24 13:20:01.999368 log 2012-02-24 13:11:02.267947 osd.1 10.0.1.246:6801/3929 41 : [INF] 2.89 scrub ok
> 2012-02-24 13:20:01.999612 mon e1: 3 mons at {lv-test-1=10.0.1.246:6789/0,lv-test-2=10.0.1.247:6789/0,shark1=10.0.1.81:6789/0}
>
> lv-test-2 ~ # ceph -s
> 2012-02-24 13:20:44.984214 pg v742: 594 pgs: 594 active+clean; 144 MB data, 714 MB used, 35417 MB / 37967 MB avail
> 2012-02-24 13:20:44.986505 mds e182: 3/3/3 up {0=lv-test-2=up:resolve,1=lv-test-1=up:active,2=lv-test-2=up:active(laggy or crashed)}
> 2012-02-24 13:20:44.986697 osd e68: 3 osds: 1 up, 3 in
> 2012-02-24 13:20:44.986918 log 2012-02-24 13:20:42.606730 mon.1 10.0.1.246:6789/0 27 : [INF] mds.1 10.0.1.246:6800/3829 up:active
> 2012-02-24 13:20:44.987118 mon e1: 3 mons at {lv-test-1=10.0.1.246:6789/0,lv-test-2=10.0.1.247:6789/0,shark1=10.0.1.81:6789/0}
>
> Feb 24 13:23:28 lv-test-2 mds.lv-test-2[4608]: 2a93085f700 -- 10.0.1.247:6800/4607 >> 10.0.1.247:6800/3552 pipe(0x19d6f37b40 sd=12 pgs=0 cs=0 l=0).connect claims to be 10.0.1.247:6800/4607 not 10.0.1.247:6800/3552 - wrong node!
>
> Feb 24 13:24:13 lv-test-1 client.admin[3151]: 2a9fc158700 -- 10.0.1.246:0/3151 >> 10.0.1.247:6800/3552 pipe(0x2a9ec00f560 sd=0 pgs=9 cs=4 l=0).connect claims to be 10.0.1.247:6800/4607 not 10.0.1.247:6800/3552 - wrong node!
> Feb 24 13:24:14 lv-test-1 mds.lv-test-1[3830]: 2e3b981b700 -- 10.0.1.246:6800/3829 >> 10.0.1.247:6800/3552 pipe(0x5faea44c0 sd=8 pgs=13 cs=2 l=0).connect claims to be 10.0.1.247:6800/4607 not 10.0.1.247:6800/3552 - wrong node!
>
> lv-test-1 ~ # ceph -s
> 2012-02-24 13:24:36.558927 pg v762: 594 pgs: 594 active+clean; 144 MB data, 741 MB used, 35390 MB / 37967 MB avail
> 2012-02-24 13:24:36.560927 mds e195: 3/3/3 up {0=lv-test-2=up:resolve,1=lv-test-1=up:active,2=lv-test-2=up:active(laggy or crashed)}
> 2012-02-24 13:24:36.560988 osd e70: 3 osds: 2 up, 3 in
> 2012-02-24 13:24:36.561092 log 2012-02-24 13:24:29.691540 osd.2 10.0.1.247:6801/4706 17 : [INF] 0.77 scrub ok
> 2012-02-24 13:24:36.561201 mon e1: 3 mons at {lv-test-1=10.0.1.246:6789/0,lv-test-2=10.0.1.247:6789/0,shark1=10.0.1.81:6789/0}

Okay, so you can see here that one MDS is active, one is in the "resolve" state, and another one is apparently crashed. If you have logs or core dumps that you can send us we'd appreciate it, but in the meantime: the Ceph distributed filesystem is not yet production-ready, and a system with multiple active MDSes is significantly less stable and less well-tested. If you try using one active MDS and leave the rest in standby you will almost certainly see better results (and you're unlikely to be bottlenecked by it). :)
Also, one of your OSDs is down, and if it crashed that's a much bigger concern to us right now... can you check its log and see what it says?
-Greg

>
> the mount point is still inaccessible. That's all, and that sucks.
>
> is there a proven scenario for building a 3-node cluster with replication that tolerates the shutdown of two nodes without the read/write processes locking up?
>
> Quoting "Tommi Virtanen" <tommi.virtanen@xxxxxxxxxxxxx>:
>> On Thu, Feb 23, 2012 at 11:07, Gregory Farnum
>> <gregory.farnum@xxxxxxxxxxxxx> wrote:
>>>>> 3 nodes, each running mon, mds & osd with replication level 3 for data & metadata pools.
>> ...
>>> Actually the OSDs will happily (well, not happily; they will complain.
>>> But they will run) run in degraded mode. However, if you have 3 active
>>> MDSes and you kill one of them without a standby available, you will
>>> lose access to part of your tree. That's probably what happened
>>> here...
>>
>> So let's try that angle. Slim, can you share the output of "ceph -s" with us?
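
A minimal sketch of the single-active-MDS layout suggested above, assuming a Ceph release of this era where the "ceph mds set_max_mds" and "ceph mds stat" commands are available; the hostnames come from the cluster in this thread, and the exact section names and commands are assumptions rather than syntax verified against this particular version:

    # Sketch only: run a ceph-mds daemon on every node, but cap the number of
    # active MDS ranks at one so the remaining daemons stay in standby.
    #
    # ceph.conf fragment (one section per host, hostnames from this thread):
    #   [mds.shark1]
    #       host = shark1
    #   [mds.lv-test-1]
    #       host = lv-test-1
    #   [mds.lv-test-2]
    #       host = lv-test-2

    # Limit the filesystem to a single active MDS rank; daemons that do not
    # hold rank 0 act as standbys and can take over if the active MDS dies.
    ceph mds set_max_mds 1

    # Check the result: expect one MDS reported as up:active and the other
    # two listed as standbys.
    ceph mds stat

Shrinking a cluster that already has three active MDS ranks may take more steps on a release this old, so for a throwaway test cluster it is probably simpler to apply this before the MDS daemons are started for the first time.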