I am trying to set up a small Ceph cluster in VMs to practice on before building a real cluster. Currently there are two OSDs on the same host. I wanted to create an erasure-coded pool with k=1 and m=1 (yes, I know it's silly, but it is only a test case). On top of it there is a cache tier (writeback), and I used the pool to create a RADOS block device. But when I tried to format that device with ext4, the system suddenly hung, and at the moment I do not understand why.

I noticed that after the creation of the 'cold storage' (the EC pool), the acting primaries are set up correctly (roughly one half of the PGs on osd.0 and the other half on osd.1), but the second OSD in the acting set is always nonsense (MAXINT, a placeholder for 'not there'?). To my surprise the state is still 'active+clean' - how can this be? Shouldn't it be 'active+degraded'?

These are the commands I used (from my recollection):

:# ceph osd erasure-code-profile get ec_1_1
> directory=/usr/lib/x86_64-linux-gnu/ceph/erasure-code
> k=1
> m=1
> plugin=jerasure
> ruleset-failure-domain=osd
> technique=reed_sol_van
:# ceph osd pool create liverpool 300 300 erasure ec_1_1
:# ceph osd pool create cache 100 100 replicated
:# ceph osd tier add liverpool cache
:# ceph osd tier cache-mode cache writeback
:# ceph osd tier set-overlay liverpool cache
:# rbd --pool liverpool create --size 1500 testdisk
:# rbd --pool liverpool map testdisk
:# mkfs.ext4 /dev/rbd/liverpool/testdisk

Now the mkfs freezes, and I can see this through ceph -w:

2014-12-17 19:08:56.466846 mon.0 [INF] pgmap v2062: 400 pgs: 400 active+clean; 140 bytes data, 88220 kB used, 2418 MB / 2504 MB avail; 47 B/s rd, 0 op/s
2014-12-17 19:11:20.697190 mon.0 [INF] pgmap v2064: 400 pgs: 307 stale+active+clean, 93 active+clean; 140 bytes data, 106 MB used, 2397 MB / 2504 MB avail
2014-12-17 19:11:20.388468 osd.1 [WRN] 6 slow requests, 6 included below; oldest blocked for > 124.270960 secs
2014-12-17 19:11:20.388556 osd.1 [WRN] slow request 124.270960 seconds old, received at 2014-12-17 19:09:16.116251: osd_op(client.6155.1:508 rb.0.1807.2ae8944a.000000000005 [set-alloc-hint object_size 4194304 write_size 4194304,write 4091904~24576] 24.e6ca00e6 ondisk+write e590) v4 currently waiting for subops from 0
[repeated a few times]
2014-12-17 19:11:21.911696 mon.0 [INF] osdmap e592: 2 osds: 1 up, 2 in
2014-12-17 19:11:22.053272 mon.0 [INF] pgmap v2065: 400 pgs: 307 stale+active+clean, 93 active+clean; 140 bytes data, 106 MB used, 2397 MB / 2504 MB avail
2014-12-17 19:11:24.826008 mon.0 [INF] osd.0 10.0.0.141:6800/7919 boot
2014-12-17 19:11:24.827218 mon.0 [INF] osdmap e593: 2 osds: 2 up, 2 in
2014-12-17 19:11:24.935173 mon.0 [INF] pgmap v2066: 400 pgs: 307 stale+active+clean, 93 active+clean; 140 bytes data, 106 MB used, 2397 MB / 2504 MB avail
2014-12-17 19:11:26.072303 mon.0 [INF] osdmap e594: 2 osds: 2 up, 2 in
2014-12-17 19:11:26.220102 mon.0 [INF] pgmap v2067: 400 pgs: 307 stale+active+clean, 93 active+clean; 140 bytes data, 106 MB used, 2397 MB / 2504 MB avail
2014-12-17 19:11:30.702281 mon.0 [INF] pgmap v2068: 400 pgs: 307 stale+active+clean, 93 active+clean; 16308 kB data, 138 MB used, 2366 MB / 2504 MB avail; 1471 kB/s wr, 7 op/s; 2184 kB/s, 0 objects/s recovering
2014-12-17 19:11:32.050330 mon.0 [INF] pgmap v2069: 400 pgs: 400 active+clean; 33924 kB data, 167 MB used, 2337 MB / 2504 MB avail; 4543 kB/s wr, 46 op/s; 3565 kB/s, 1 objects/s recovering
2014-12-17 19:13:30.569447 mon.0 [INF] pgmap v2070: 400 pgs: 400 active+clean; 33924 kB data, 143 MB used, 2361 MB / 2504 MB avail

How is this explained? What have I done wrong?
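In case it helps, this is roughly how I looked at the acting sets (a sketch from memory; the pg id below is only a made-up example, not one I actually checked). The second entry of the up/acting sets is where I see the 2147483647 (MAXINT) placeholder mentioned above:

:# ceph pg dump pgs_brief
:# ceph pg map 24.e6
:# ceph osd pool get liverpool min_size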
Greetings!