running cluster of 3 nodes:

lv-test-2 ~ # ceph -s
2012-02-24 13:10:35.481248    pg v726: 594 pgs: 594 active+clean; 120 MB data, 683 MB used, 35448 MB / 37967 MB avail
2012-02-24 13:10:35.484463   mds e177: 3/3/3 up {0=shark1=up:active,1=lv-test-1=up:active,2=lv-test-2=up:active}
2012-02-24 13:10:35.484529   osd e64: 3 osds: 3 up, 3 in
2012-02-24 13:10:35.484630   log 2012-02-24 13:09:50.009333 osd.1 10.0.1.246:6801/3929 29 : [INF] 2.5d scrub ok
2012-02-24 13:10:35.484907   mon e1: 3 mons at {lv-test-1=10.0.1.246:6789/0,lv-test-2=10.0.1.247:6789/0,shark1=10.0.1.81:6789/0}

mounting by fuse:

lv-test-1 ~ # mount
ceph-fuse on /uploads type fuse.ceph-fuse (rw,nosuid,nodev,allow_other,default_permissions)

simulating write:

lv-test-1 ~ # cp -r /usr/src/linux-3.2.2-hardened-r1/ /uploads/

killing one node:

lv-test-2 ~ # killall ceph-mon ceph-mds ceph-osd

Feb 24 13:11:17 lv-test-2 mon.lv-test-2[3474]: *** Caught signal (Terminated) **
Feb 24 13:11:17 lv-test-2 in thread 3195ce76760. Shutting down.
Feb 24 13:11:17 lv-test-2 mds.lv-test-2[3553]: *** Caught signal (Terminated) **
Feb 24 13:11:17 lv-test-2 in thread 2ee100bb760. Shutting down.
Feb 24 13:11:17 lv-test-2 osd.2[3654]: *** Caught signal (Terminated) **
Feb 24 13:11:17 lv-test-2 in thread 28f75487760. Shutting down.
Feb 24 13:11:35 lv-test-2 client.admin[3885]: 3a6a385b700 monclient: hunting for new mon
Feb 24 13:11:35 lv-test-2 client.admin[3885]: 3a6a385b700 client.5017 ms_handle_reset on 10.0.1.246:6789/0

Feb 24 13:11:17 lv-test-1 mon.lv-test-1[3751]: 2d62330a700 -- 10.0.1.246:6789/0 >> 10.0.1.247:6789/0 pipe(0x522b9ba080 sd=9 pgs=37 cs=1 l=0).fault with nothing to send, going to standby
Feb 24 13:11:17 lv-test-1 mds.lv-test-1[3830]: 2e3b9bd0700 -- 10.0.1.246:6800/3829 >> 10.0.1.247:6800/3552 pipe(0x5faea44c0 sd=13 pgs=13 cs=1 l=0).fault with nothing to send, going to standby
Feb 24 13:11:17 lv-test-1 client.admin[3151]: 2a9fbe55700 -- 10.0.1.246:0/3151 >> 10.0.1.247:6800/3552 pipe(0x2a9ec00f560 sd=0 pgs=9 cs=3 l=0).fault with nothing to send, going to standby
Feb 24 13:11:17 lv-test-1 osd.1[3930]: 2a51b9f1700 osd.1 64 OSD::ms_handle_reset()
Feb 24 13:11:17 lv-test-1 osd.1[3930]: 2a51b9f1700 osd.1 64 OSD::ms_handle_reset() s=0x22b4580700
Feb 24 13:11:17 lv-test-1 client.admin[3151]: 2a9fd95b700 client.4617 ms_handle_reset on 10.0.1.247:6801/3653
Feb 24 13:11:17 lv-test-1 osd.1[3930]: 2a510df7700 -- 10.0.1.246:6803/3929 >> 10.0.1.247:0/3654 pipe(0x2a50c005000 sd=24 pgs=4 cs=1 l=0).fault with nothing to send, going to standby
Feb 24 13:11:18 lv-test-1 osd.1[3930]: 2a5184e6700 -- 10.0.1.246:6802/3929 >> 10.0.1.247:6802/3653 pipe(0x2a5145e9c50 sd=19 pgs=3 cs=1 l=0).fault with nothing to send, going to standby
Feb 24 13:11:18 lv-test-1 mds.lv-test-1[3830]: 2e3bcadc700 mds.1.5 ms_handle_reset on 10.0.1.247:6801/3653
Feb 24 13:11:18 lv-test-1 osd.1[3930]: 2a5183e5700 -- 10.0.1.246:0/3930 >> 10.0.1.247:6803/3653 pipe(0x2a5145eaeb0 sd=20 pgs=16 cs=1 l=0).fault with nothing to send, going to standby
Feb 24 13:11:34 lv-test-1 osd.1[3930]: 2a5229ff700 osd.1 64 heartbeat_check: no heartbeat from osd.0 since 2012-02-24 13:11:14.122485 (cutoff 2012-02-24 13:11:14.366355)
Feb 24 13:11:34 lv-test-1 osd.1[3930]: 2a5117fa700 osd.1 64 heartbeat_check: no heartbeat from osd.0 since 2012-02-24 13:11:14.122485 (cutoff 2012-02-24 13:11:14.826382)
Feb 24 13:11:35 lv-test-1 osd.1[3930]: 2a5229ff700 osd.1 64 heartbeat_check: no heartbeat from osd.0 since 2012-02-24 13:11:14.122485 (cutoff 2012-02-24 13:11:15.369660)
Feb 24 13:11:35 lv-test-1 osd.1[3930]: 2a51b9f1700 monclient: hunting for new mon
Feb 24 13:11:35 lv-test-1 osd.1[3930]: 2a51b9f1700 osd.1 64 OSD::ms_handle_reset()
Feb 24 13:11:36 lv-test-1 osd.1[3930]: 2a5117fa700 osd.1 64 heartbeat_check: no heartbeat from osd.0 since 2012-02-24 13:11:14.122485 (cutoff 2012-02-24 13:11:16.129635)
Feb 24 13:11:36 lv-test-1 osd.1[3930]: 2a5229ff700 osd.1 64 heartbeat_check: no heartbeat from osd.0 since 2012-02-24 13:11:14.122485 (cutoff 2012-02-24 13:11:16.372900)

The copy now hangs (it cannot be killed even with kill -9) and /uploads is not accessible:

lv-test-1 ~ # time ceph -s
^C*** Caught signal (Interrupt) **
 in thread 2da010af760. Shutting down.

real    3m16.481s
user    0m0.037s
sys     0m0.013s

lv-test-2 ~ # time ceph -s
^C*** Caught signal (Interrupt) **
 in thread 314b193c760. Shutting down.

real    0m35.401s
user    0m0.017s
sys     0m0.007s

So the cluster hung and is no longer responding. Let's bring the killed node back up:

lv-test-2 ~ # /etc/init.d/ceph restart

lv-test-2 ~ # ceph -s
2012-02-24 13:20:01.996366    pg v734: 594 pgs: 594 active+clean; 120 MB data, 683 MB used, 35448 MB / 37967 MB avail
2012-02-24 13:20:01.999207   mds e177: 3/3/3 up {0=shark1=up:active,1=lv-test-1=up:active,2=lv-test-2=up:active}
2012-02-24 13:20:01.999268   osd e64: 3 osds: 3 up, 3 in
2012-02-24 13:20:01.999368   log 2012-02-24 13:11:02.267947 osd.1 10.0.1.246:6801/3929 41 : [INF] 2.89 scrub ok
2012-02-24 13:20:01.999612   mon e1: 3 mons at {lv-test-1=10.0.1.246:6789/0,lv-test-2=10.0.1.247:6789/0,shark1=10.0.1.81:6789/0}

lv-test-2 ~ # ceph -s
2012-02-24 13:20:44.984214    pg v742: 594 pgs: 594 active+clean; 144 MB data, 714 MB used, 35417 MB / 37967 MB avail
2012-02-24 13:20:44.986505   mds e182: 3/3/3 up {0=lv-test-2=up:resolve,1=lv-test-1=up:active,2=lv-test-2=up:active(laggy or crashed)}
2012-02-24 13:20:44.986697   osd e68: 3 osds: 1 up, 3 in
2012-02-24 13:20:44.986918   log 2012-02-24 13:20:42.606730 mon.1 10.0.1.246:6789/0 27 : [INF] mds.1 10.0.1.246:6800/3829 up:active
2012-02-24 13:20:44.987118   mon e1: 3 mons at {lv-test-1=10.0.1.246:6789/0,lv-test-2=10.0.1.247:6789/0,shark1=10.0.1.81:6789/0}

Feb 24 13:23:28 lv-test-2 mds.lv-test-2[4608]: 2a93085f700 -- 10.0.1.247:6800/4607 >> 10.0.1.247:6800/3552 pipe(0x19d6f37b40 sd=12 pgs=0 cs=0 l=0).connect claims to be 10.0.1.247:6800/4607 not 10.0.1.247:6800/3552 - wrong node!
Feb 24 13:24:13 lv-test-1 client.admin[3151]: 2a9fc158700 -- 10.0.1.246:0/3151 >> 10.0.1.247:6800/3552 pipe(0x2a9ec00f560 sd=0 pgs=9 cs=4 l=0).connect claims to be 10.0.1.247:6800/4607 not 10.0.1.247:6800/3552 - wrong node!
Feb 24 13:24:14 lv-test-1 mds.lv-test-1[3830]: 2e3b981b700 -- 10.0.1.246:6800/3829 >> 10.0.1.247:6800/3552 pipe(0x5faea44c0 sd=8 pgs=13 cs=2 l=0).connect claims to be 10.0.1.247:6800/4607 not 10.0.1.247:6800/3552 - wrong node!

lv-test-1 ~ # ceph -s
2012-02-24 13:24:36.558927    pg v762: 594 pgs: 594 active+clean; 144 MB data, 741 MB used, 35390 MB / 37967 MB avail
2012-02-24 13:24:36.560927   mds e195: 3/3/3 up {0=lv-test-2=up:resolve,1=lv-test-1=up:active,2=lv-test-2=up:active(laggy or crashed)}
2012-02-24 13:24:36.560988   osd e70: 3 osds: 2 up, 3 in
2012-02-24 13:24:36.561092   log 2012-02-24 13:24:29.691540 osd.2 10.0.1.247:6801/4706 17 : [INF] 0.77 scrub ok
2012-02-24 13:24:36.561201   mon e1: 3 mons at {lv-test-1=10.0.1.246:6789/0,lv-test-2=10.0.1.247:6789/0,shark1=10.0.1.81:6789/0}

The mount point is still inaccessible. That's all, and it sucks.

Is there a proven scenario for building a 3-node cluster with replication that tolerates the shutdown of two nodes without read/write processes locking up?
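For reference, the "replication level 3" on the default data and metadata pools mentioned in the quoted thread below is normally set with commands along these lines. This is not a capture from the test run above, just an illustration, and the field name in the osd dump output may differ between versions:

  ceph osd pool set data size 3
  ceph osd pool set metadata size 3
  ceph osd dump | grep size        (verify the replication size on each pool)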
Quoting "Tommi Virtanen" <tommi.virtanen@xxxxxxxxxxxxx>:

> On Thu, Feb 23, 2012 at 11:07, Gregory Farnum
> <gregory.farnum@xxxxxxxxxxxxx> wrote:
>>>> 3 nodes, each running mon, mds & osd with replication level 3 for
>>>> data & metadata pools.
> ...
>> Actually the OSDs will happily (well, not happily; they will complain.
>> But they will run) run in degraded mode. However, if you have 3 active
>> MDSes and you kill one of them without a standby available, you will
>> lose access to part of your tree. That's probably what happened
>> here...
>
> So let's try that angle. Slim, can you share the output of "ceph -s"
> with us?
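PS: if the answer is the standby MDS Greg mentions above, I assume the ceph.conf change would look roughly like the fragment below. Option names are taken from the ceph.conf documentation and the hostnames are mine; this is a sketch, not my actual config, so corrections are welcome:

  [mds.shark1]
      host = shark1

  [mds.lv-test-1]
      host = lv-test-1

  [mds.lv-test-2]
      host = lv-test-2
      ; run this daemon as a standby (with replay) for rank 0
      ; instead of a third active MDS
      mds standby for rank = 0
      mds standby replay = true

plus dropping the number of active MDS ranks from 3 to 2, presumably with something like "ceph mds set_max_mds 2".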