I have:

  osd_objectstore = keyvaluestore-dev

in the global section of my ceph.conf.

[root@ceph002 ~]# ceph osd erasure-code-profile get profile11
directory=/usr/lib64/ceph/erasure-code
k=8
m=3
plugin=jerasure
ruleset-failure-domain=osd
technique=reed_sol_van

The ecdata pool has this as its profile:

pool 3 'ecdata' erasure size 11 min_size 8 crush_ruleset 2 object_hash rjenkins pg_num 128 pgp_num 128 last_change 161 flags hashpspool stripe_width 4096

EC rule in the crushmap:

rule ecdata {
        ruleset 2
        type erasure
        min_size 3
        max_size 20
        step set_chooseleaf_tries 5
        step take default-ec
        step choose indep 0 type osd
        step emit
}

root default-ec {
        id -8           # do not change unnecessarily
        # weight 140.616
        alg straw
        hash 0  # rjenkins1
        item ceph001-ec weight 46.872
        item ceph002-ec weight 46.872
        item ceph003-ec weight 46.872
        ...

(A rough sketch of the commands used to set this up is in the P.S. at the bottom of this mail.)

Cheers!
Kenneth

----- Message from Haomai Wang <haomaiwang at gmail.com> ---------
   Date: Thu, 14 Aug 2014 10:07:50 +0800
   From: Haomai Wang <haomaiwang at gmail.com>
Subject: Re: ceph cluster inconsistency?
     To: Kenneth Waegeman <Kenneth.Waegeman at ugent.be>
     Cc: ceph-users <ceph-users at lists.ceph.com>

> Hi Kenneth,
>
> Could you give your configuration related to EC and KeyValueStore?
> Not sure whether it's a bug in KeyValueStore.
>
> On Thu, Aug 14, 2014 at 12:06 AM, Kenneth Waegeman
> <Kenneth.Waegeman at ugent.be> wrote:
>> Hi,
>>
>> I was doing some tests with rados bench on an erasure-coded pool (using
>> the keyvaluestore-dev objectstore) on 0.83, and I see some strange things:
>>
>> [root@ceph001 ~]# ceph status
>>     cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d
>>      health HEALTH_WARN too few pgs per osd (4 < min 20)
>>      monmap e1: 3 mons at {ceph001=10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0,ceph003=10.141.8.182:6789/0}, election epoch 6, quorum 0,1,2 ceph001,ceph002,ceph003
>>      mdsmap e116: 1/1/1 up {0=ceph001.cubone.os=up:active}, 2 up:standby
>>      osdmap e292: 78 osds: 78 up, 78 in
>>       pgmap v48873: 320 pgs, 4 pools, 15366 GB data, 3841 kobjects
>>             1381 GB used, 129 TB / 131 TB avail
>>                  320 active+clean
>>
>> There is around 15 TB of data, but only 1.3 TB of usage is reported.
>>
>> This is also visible in rados:
>>
>> [root@ceph001 ~]# rados df
>> pool name     category             KB    objects   clones  degraded  unfound      rd   rd KB       wr        wr KB
>> data          -                     0          0        0         0        0       0       0        0            0
>> ecdata        -           16113451009    3933959        0         0        0       1       1  3935632  16116850711
>> metadata      -                     2         20        0         0        0      33      36       21            8
>> rbd           -                     0          0        0         0        0       0       0        0            0
>>   total used       1448266016    3933979
>>   total avail    139400181016
>>   total space    140848447032
>>
>> Another (related?) thing: if I do rados -p ecdata ls, I trigger OSD
>> shutdowns (each time). I get a list followed by an error:
>>
>> ...
>> benchmark_data_ceph001.cubone.os_8961_object243839
>> benchmark_data_ceph001.cubone.os_5560_object801983
>> benchmark_data_ceph001.cubone.os_31461_object856489
>> benchmark_data_ceph001.cubone.os_8961_object202232
>> benchmark_data_ceph001.cubone.os_4919_object33199
>> benchmark_data_ceph001.cubone.os_5560_object807797
>> benchmark_data_ceph001.cubone.os_4919_object74729
>> benchmark_data_ceph001.cubone.os_31461_object1264121
>> benchmark_data_ceph001.cubone.os_5560_object1318513
>> benchmark_data_ceph001.cubone.os_5560_object1202111
>> benchmark_data_ceph001.cubone.os_31461_object939107
>> benchmark_data_ceph001.cubone.os_31461_object729682
>> benchmark_data_ceph001.cubone.os_5560_object122915
>> benchmark_data_ceph001.cubone.os_5560_object76521
>> benchmark_data_ceph001.cubone.os_5560_object113261
>> benchmark_data_ceph001.cubone.os_31461_object575079
>> benchmark_data_ceph001.cubone.os_5560_object671042
>> benchmark_data_ceph001.cubone.os_5560_object381146
>> 2014-08-13 17:57:48.736150 7f65047b5700  0 -- 10.141.8.180:0/1023295 >> 10.141.8.182:6839/4471 pipe(0x7f64fc019b20 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f64fc019db0).fault
>>
>> And I can see this in the log files:
>>
>>    -25> 2014-08-13 17:52:56.323908 7f8a97fa4700  1 -- 10.143.8.182:6827/64670 <== osd.57 10.141.8.182:0/15796 51 ==== osd_ping(ping e220 stamp 2014-08-13 17:52:56.323092) v2 ==== 47+0+0 (3227325175 0 0) 0xf475940 con 0xee89fa0
>>    -24> 2014-08-13 17:52:56.323938 7f8a97fa4700  1 -- 10.143.8.182:6827/64670 --> 10.141.8.182:0/15796 -- osd_ping(ping_reply e220 stamp 2014-08-13 17:52:56.323092) v2 -- ?+0 0xf815b00 con 0xee89fa0
>>    -23> 2014-08-13 17:52:56.324078 7f8a997a7700  1 -- 10.141.8.182:6840/64670 <== osd.57 10.141.8.182:0/15796 51 ==== osd_ping(ping e220 stamp 2014-08-13 17:52:56.323092) v2 ==== 47+0+0 (3227325175 0 0) 0xf132bc0 con 0xee8a680
>>    -22> 2014-08-13 17:52:56.324111 7f8a997a7700  1 -- 10.141.8.182:6840/64670 --> 10.141.8.182:0/15796 -- osd_ping(ping_reply e220 stamp 2014-08-13 17:52:56.323092) v2 -- ?+0 0xf811a40 con 0xee8a680
>>    -21> 2014-08-13 17:52:56.584461 7f8a997a7700  1 -- 10.141.8.182:6840/64670 <== osd.29 10.143.8.181:0/12142 47 ==== osd_ping(ping e220 stamp 2014-08-13 17:52:56.583010) v2 ==== 47+0+0 (3355887204 0 0) 0xf655940 con 0xee88b00
>>    -20> 2014-08-13 17:52:56.584486 7f8a997a7700  1 -- 10.141.8.182:6840/64670 --> 10.143.8.181:0/12142 -- osd_ping(ping_reply e220 stamp 2014-08-13 17:52:56.583010) v2 -- ?+0 0xf132bc0 con 0xee88b00
>>    -19> 2014-08-13 17:52:56.584498 7f8a97fa4700  1 -- 10.143.8.182:6827/64670 <== osd.29 10.143.8.181:0/12142 47 ==== osd_ping(ping e220 stamp 2014-08-13 17:52:56.583010) v2 ==== 47+0+0 (3355887204 0 0) 0xf20e040 con 0xee886e0
>>    -18> 2014-08-13 17:52:56.584526 7f8a97fa4700  1 -- 10.143.8.182:6827/64670 --> 10.143.8.181:0/12142 -- osd_ping(ping_reply e220 stamp 2014-08-13 17:52:56.583010) v2 -- ?+0 0xf475940 con 0xee886e0
>>    -17> 2014-08-13 17:52:56.594448 7f8a798c7700  1 -- 10.141.8.182:6839/64670 >> :/0 pipe(0xec15f00 sd=74 :6839 s=0 pgs=0 cs=0 l=0 c=0xee856a0).accept sd=74 10.141.8.180:47641/0
>>    -16> 2014-08-13 17:52:56.594921 7f8a798c7700  1 -- 10.141.8.182:6839/64670 <== client.7512 10.141.8.180:0/1018433 1 ==== osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 ack+read+known_if_redirected e220) v4 ==== 151+0+39 (1972163119 0 4174233976) 0xf3bca40 con 0xee856a0
>>    -15> 2014-08-13 17:52:56.594957 7f8a798c7700  5 -- op tracker -- , seq: 299, time: 2014-08-13 17:52:56.594874, event: header_read, op: osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 ack+read+known_if_redirected e220)
>>    -14> 2014-08-13 17:52:56.594970 7f8a798c7700  5 -- op tracker -- , seq: 299, time: 2014-08-13 17:52:56.594880, event: throttled, op: osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 ack+read+known_if_redirected e220)
>>    -13> 2014-08-13 17:52:56.594978 7f8a798c7700  5 -- op tracker -- , seq: 299, time: 2014-08-13 17:52:56.594917, event: all_read, op: osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 ack+read+known_if_redirected e220)
>>    -12> 2014-08-13 17:52:56.594986 7f8a798c7700  5 -- op tracker -- , seq: 299, time: 0.000000, event: dispatched, op: osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 ack+read+known_if_redirected e220)
>>    -11> 2014-08-13 17:52:56.595127 7f8a90795700  5 -- op tracker -- , seq: 299, time: 2014-08-13 17:52:56.595104, event: reached_pg, op: osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 ack+read+known_if_redirected e220)
>>    -10> 2014-08-13 17:52:56.595159 7f8a90795700  5 -- op tracker -- , seq: 299, time: 2014-08-13 17:52:56.595153, event: started, op: osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 ack+read+known_if_redirected e220)
>>     -9> 2014-08-13 17:52:56.602179 7f8a90795700  1 -- 10.141.8.182:6839/64670 --> 10.141.8.180:0/1018433 -- osd_op_reply(1 [pgls start_epoch 0] v164'30654 uv30654 ondisk = 0) v6 -- ?+0 0xec16180 con 0xee856a0
>>     -8> 2014-08-13 17:52:56.602211 7f8a90795700  5 -- op tracker -- , seq: 299, time: 2014-08-13 17:52:56.602205, event: done, op: osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 ack+read+known_if_redirected e220)
>>     -7> 2014-08-13 17:52:56.614839 7f8a798c7700  1 -- 10.141.8.182:6839/64670 <== client.7512 10.141.8.180:0/1018433 2 ==== osd_op(client.7512.0:2 [pgls start_epoch 220] 3.0 ack+read+known_if_redirected e220) v4 ==== 151+0+89 (3460833343 0 2600845095) 0xf3bcec0 con 0xee856a0
>>     -6> 2014-08-13 17:52:56.614864 7f8a798c7700  5 -- op tracker -- , seq: 300, time: 2014-08-13 17:52:56.614789, event: header_read, op: osd_op(client.7512.0:2 [pgls start_epoch 220] 3.0 ack+read+known_if_redirected e220)
>>     -5> 2014-08-13 17:52:56.614874 7f8a798c7700  5 -- op tracker -- , seq: 300, time: 2014-08-13 17:52:56.614792, event: throttled, op: osd_op(client.7512.0:2 [pgls start_epoch 220] 3.0 ack+read+known_if_redirected e220)
>>     -4> 2014-08-13 17:52:56.614884 7f8a798c7700  5 -- op tracker -- , seq: 300, time: 2014-08-13 17:52:56.614835, event: all_read, op: osd_op(client.7512.0:2 [pgls start_epoch 220] 3.0 ack+read+known_if_redirected e220)
>>     -3> 2014-08-13 17:52:56.614891 7f8a798c7700  5 -- op tracker -- , seq: 300, time: 0.000000, event: dispatched, op: osd_op(client.7512.0:2 [pgls start_epoch 220] 3.0 ack+read+known_if_redirected e220)
>>     -2> 2014-08-13 17:52:56.614972 7f8a92f9a700  5 -- op tracker -- , seq: 300, time: 2014-08-13 17:52:56.614958, event: reached_pg, op: osd_op(client.7512.0:2 [pgls start_epoch 220] 3.0 ack+read+known_if_redirected e220)
>>     -1> 2014-08-13 17:52:56.614993 7f8a92f9a700  5 -- op tracker -- , seq: 300, time: 2014-08-13 17:52:56.614986, event: started, op: osd_op(client.7512.0:2 [pgls start_epoch 220] 3.0 ack+read+known_if_redirected e220)
>>      0> 2014-08-13 17:52:56.617087 7f8a92f9a700 -1 os/GenericObjectMap.cc: In function 'int GenericObjectMap::list_objects(const coll_t&, ghobject_t, int, std::vector<ghobject_t>*, ghobject_t*)' thread 7f8a92f9a700 time 2014-08-13 17:52:56.615073
>> os/GenericObjectMap.cc: 1118: FAILED assert(start <= header.oid)
>>
>> ceph version 0.83 (78ff1f0a5dfd3c5850805b4021738564c36c92b8)
>> 1: (GenericObjectMap::list_objects(coll_t const&, ghobject_t, int, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x474) [0x98f774]
>> 2: (KeyValueStore::collection_list_partial(coll_t, ghobject_t, int, int, snapid_t, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x274) [0x8c5b54]
>> 3: (PGBackend::objects_list_partial(hobject_t const&, int, int, snapid_t, std::vector<hobject_t, std::allocator<hobject_t> >*, hobject_t*)+0x1c9) [0x862de9]
>> 4: (ReplicatedPG::do_pg_op(std::tr1::shared_ptr<OpRequest>)+0xea5) [0x7f67f5]
>> 5: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x1f3) [0x8177b3]
>> 6: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x5d5) [0x7b8045]
>> 7: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x47d) [0x62bf8d]
>> 8: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x35c) [0x62c56c]
>> 9: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8cd) [0xa776fd]
>> 10: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xa79980]
>> 11: (()+0x7df3) [0x7f8aac71fdf3]
>> 12: (clone()+0x6d) [0x7f8aab1963dd]
>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>
>> ceph version 0.83 (78ff1f0a5dfd3c5850805b4021738564c36c92b8)
>> 1: /usr/bin/ceph-osd() [0x99b466]
>> 2: (()+0xf130) [0x7f8aac727130]
>> 3: (gsignal()+0x39) [0x7f8aab0d5989]
>> 4: (abort()+0x148) [0x7f8aab0d7098]
>> 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f8aab9e89d5]
>> 6: (()+0x5e946) [0x7f8aab9e6946]
>> 7: (()+0x5e973) [0x7f8aab9e6973]
>> 8: (()+0x5eb9f) [0x7f8aab9e6b9f]
>> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1ef) [0xa8805f]
>> 10: (GenericObjectMap::list_objects(coll_t const&, ghobject_t, int, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x474) [0x98f774]
>> 11: (KeyValueStore::collection_list_partial(coll_t, ghobject_t, int, int, snapid_t, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x274) [0x8c5b54]
>> 12: (PGBackend::objects_list_partial(hobject_t const&, int, int, snapid_t, std::vector<hobject_t, std::allocator<hobject_t> >*, hobject_t*)+0x1c9) [0x862de9]
>> 13: (ReplicatedPG::do_pg_op(std::tr1::shared_ptr<OpRequest>)+0xea5) [0x7f67f5]
>> 14: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x1f3) [0x8177b3]
>> 15: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x5d5) [0x7b8045]
>> 16: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x47d) [0x62bf8d]
>> 17: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x35c) [0x62c56c]
>> 18: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8cd) [0xa776fd]
>> 19: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xa79980]
>> 20: (()+0x7df3) [0x7f8aac71fdf3]
>> 21: (clone()+0x6d) [0x7f8aab1963dd]
>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>
>> --- begin dump of recent events ---
>>      0> 2014-08-13 17:52:56.714214 7f8a92f9a700 -1 *** Caught signal (Aborted) **
>>  in thread 7f8a92f9a700
>>
>> ceph version 0.83 (78ff1f0a5dfd3c5850805b4021738564c36c92b8)
>> 1: /usr/bin/ceph-osd() [0x99b466]
>> 2: (()+0xf130) [0x7f8aac727130]
>> 3: (gsignal()+0x39) [0x7f8aab0d5989]
>> 4: (abort()+0x148) [0x7f8aab0d7098]
>> 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f8aab9e89d5]
>> 6: (()+0x5e946) [0x7f8aab9e6946]
>> 7: (()+0x5e973) [0x7f8aab9e6973]
>> 8: (()+0x5eb9f) [0x7f8aab9e6b9f]
>> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1ef) [0xa8805f]
>> 10: (GenericObjectMap::list_objects(coll_t const&, ghobject_t, int, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x474) [0x98f774]
>> 11: (KeyValueStore::collection_list_partial(coll_t, ghobject_t, int, int, snapid_t, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x274) [0x8c5b54]
>> 12: (PGBackend::objects_list_partial(hobject_t const&, int, int, snapid_t, std::vector<hobject_t, std::allocator<hobject_t> >*, hobject_t*)+0x1c9) [0x862de9]
>> 13: (ReplicatedPG::do_pg_op(std::tr1::shared_ptr<OpRequest>)+0xea5) [0x7f67f5]
>> 14: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x1f3) [0x8177b3]
>> 15: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x5d5) [0x7b8045]
>> 16: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x47d) [0x62bf8d]
>> 17: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x35c) [0x62c56c]
>> 18: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8cd) [0xa776fd]
>> 19: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xa79980]
>> 20: (()+0x7df3) [0x7f8aac71fdf3]
>> 21: (clone()+0x6d) [0x7f8aab1963dd]
>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>
>> I guess this has something to do with using the dev KeyValueStore backend?
>>
>> Thanks!
>>
>> Kenneth
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users at lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> --
> Best Regards,
>
> Wheat

----- End message from Haomai Wang <haomaiwang at gmail.com> -----

--
Kind regards,
Kenneth Waegeman
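P.S. For completeness, here is a rough sketch of how the setup above was created. These commands are reconstructed from memory, so the exact arguments and durations may differ; the profile values do match the "ceph osd erasure-code-profile get" output above, and the custom "ecdata" crush rule / "default-ec" root were edited into the crushmap separately rather than generated by these commands:

    # ceph.conf, [global] section
    osd_objectstore = keyvaluestore-dev

    # EC profile (plugin=jerasure and technique=reed_sol_van are the defaults)
    ceph osd erasure-code-profile set profile11 k=8 m=3 ruleset-failure-domain=osd

    # erasure-coded pool using that profile (128 pgs / 128 pgps)
    ceph osd pool create ecdata 128 128 erasure profile11

    # the benchmark that wrote the data; duration and options here are illustrative
    rados -p ecdata bench 600 write --no-cleanup

    # listing the pool afterwards is what triggers the OSD assert each time
    rados -p ecdata ls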