On Mon, Aug 18, 2014 at 5:38 PM, Kenneth Waegeman <Kenneth.Waegeman at ugent.be> wrote: > Hi, > > I tried this after restarting the osd, but I guess that was not the aim > ( > # ceph-kvstore-tool /var/lib/ceph/osd/ceph-67/current/ list _GHOBJTOSEQ_| > grep 6adb1100 -A 100 > IO error: lock /var/lib/ceph/osd/ceph-67/current//LOCK: Resource temporarily > unavailable > tools/ceph_kvstore_tool.cc: In function 'StoreTool::StoreTool(const > string&)' thread 7f8fecf7d780 time 2014-08-18 11:12:29.551780 > tools/ceph_kvstore_tool.cc: 38: FAILED assert(!db_ptr->open(std::cerr)) > .. > ) > > When I run it after bringing the osd down, it takes a while, but it has no > output. (When running it without the grep, I get a huge list.) Oh, sorry about that, my mistake: the hash value (6adb1100) is stored reversed in leveldb, so grepping for the raw hash won't match anything. Grepping for "benchmark_data_ceph001.cubone.os_5560_object789734" instead should do it (see the small sketch further down in this mail). > > Or should I run this immediately after the osd has crashed (because it may have > been rebalanced? I already restarted the cluster) > > > I don't know if it is related, but before I could do all that, I had to fix > something else: a monitor ran out of disk space, using 8GB for its > store.db folder (lots of sst files). Other monitors are also near that level. > Never had that problem on previous setups before. I recreated a monitor and > now it uses 3.8GB. There is some duplicate data in the store that only gets reclaimed when leveldb compacts it; that explains why the recreated monitor is smaller. > Another idea: maybe you can make KeyValueStore's stripe size align with the EC stripe size. I haven't thought about it deeply; maybe I will try it later.
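
To make the key-encoding issue more concrete, here is a rough toy sketch in C++ (this is NOT the real GenericObjectMap code; the nibble-reversed hash and the "." -> "%e" escape are simplified assumptions based on what I describe in my earlier mail, quoted below). It shows why grepping for the raw hash string finds nothing, and how the escaping can invert the relative order of two names such as "rb.data.123" and "rb-123":

// Toy illustration only -- not the actual GenericObjectMap key builder.
// Assumptions: the object hash is written into the key as a reversed hex
// string, and '.' in the object name is escaped as "%e". The real key
// layout in Ceph may differ in detail.
#include <algorithm>
#include <iostream>
#include <string>

// Reverse a hex hash string the way it would appear in the key,
// e.g. "6adb1100" -> "0011bda6", so grepping for "6adb1100" misses it.
static std::string reverse_hash(std::string hex) {
    std::reverse(hex.begin(), hex.end());
    return hex;
}

// Escape '.' as "%e" in an object name.
static std::string escape_name(const std::string &name) {
    std::string out;
    for (char c : name)
        out += (c == '.') ? std::string("%e") : std::string(1, c);
    return out;
}

int main() {
    std::cout << "hash 6adb1100 shows up in the key as "
              << reverse_hash("6adb1100") << "\n";

    const std::string a = "rb.data.123", b = "rb-123";
    const std::string ka = escape_name(a), kb = escape_name(b);

    // Byte-wise order of the raw names vs. order of the escaped keys:
    std::cout << a << (a < b ? " < " : " > ") << b << "   (raw names)\n";
    std::cout << ka << (ka < kb ? " < " : " > ") << kb << "   (escaped keys)\n";
    // The two orders disagree: '%' sorts before '-', while '.' sorts after it.
    return 0;
}

If the iteration order of the escaped keys on disk and the comparator used on the decoded objects disagree like this, a listing can appear to step backwards, which could trip an assert like the assert(start <= header.oid) in the crash log. Again, this is only an illustration of the suspected mismatch, not the exact code path.
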
> Thanks! > > Kenneth > > > > ----- Message from Sage Weil <sweil at redhat.com> --------- > Date: Fri, 15 Aug 2014 06:10:34 -0700 (PDT) > From: Sage Weil <sweil at redhat.com> > > Subject: Re: [ceph-users] ceph cluster inconsistency? > To: Haomai Wang <haomaiwang at gmail.com> > Cc: Kenneth Waegeman <Kenneth.Waegeman at ugent.be>, > ceph-users at lists.ceph.com > > > >> On Fri, 15 Aug 2014, Haomai Wang wrote: >>> >>> Hi Kenneth, >>> >>> I can't find anything valuable in your logs; they lack the necessary >>> debug output from the code path that crashes. >>> >>> But I scanned the encode/decode implementation in GenericObjectMap and >>> found something bad. >>> >>> For example, two oids have the same hash and their names are: >>> A: "rb.data.123" >>> B: "rb-123" >>> >>> At the ghobject_t comparison level, A < B. But GenericObjectMap encodes "." as >>> "%e", so the keys in the DB are: >>> A: _GHOBJTOSEQ_:blah!51615000!!none!!rb%edata%e123!head >>> B: _GHOBJTOSEQ_:blah!51615000!!none!!rb-123!head >>> >>> A > B >>> >>> It seems that the escape function is useless and should be disabled. >>> >>> I'm not sure whether Kenneth's problem is hitting this bug, because >>> this scenario only occurs when the object set is very large and two >>> objects end up with the same hash value. >>> >>> Kenneth, could you find time to run "ceph-kvstore-tool [path-to-osd] list >>> _GHOBJTOSEQ_ | grep 6adb1100 -A 100"? ceph-kvstore-tool is a debug tool >>> which can be compiled from source: clone the ceph repo and run >>> "./autogen.sh; ./configure; cd src; make ceph-kvstore-tool". >>> "path-to-osd" should be "/var/lib/ceph/osd/ceph-[id]/current/". "6adb1100" >>> is from your verbose log, and the next 100 rows should contain the necessary >>> info. >> >> >> You can also get ceph-kvstore-tool from the 'ceph-tests' package. >> >>> Hi Sage, do you think we need to provide an upgrade function to fix >>> it? >> >> >> Hmm, we might. This only affects the key/value encoding, right? The >> FileStore is using its own function to map these to file names? >> >> Can you open a ticket in the tracker for this? >> >> Thanks! >> sage >> >>> >>> >>> On Thu, Aug 14, 2014 at 7:36 PM, Kenneth Waegeman >>> <Kenneth.Waegeman at ugent.be> wrote: >>> >>> > >>> > ----- Message from Haomai Wang <haomaiwang at gmail.com> --------- >>> > Date: Thu, 14 Aug 2014 19:11:55 +0800 >>> > >>> > From: Haomai Wang <haomaiwang at gmail.com> >>> > Subject: Re: [ceph-users] ceph cluster inconsistency? >>> > To: Kenneth Waegeman <Kenneth.Waegeman at ugent.be> >>> > >>> > >>> >> Could you add the config "debug_keyvaluestore = 20/20" to the crashed osd >>> >> and replay the command causing the crash? >>> >> >>> >> I would like to get more debug info! Thanks. >>> > >>> > >>> > I included the log as an attachment! >>> > Thanks! >>> > >>> >> >>> >> On Thu, Aug 14, 2014 at 4:41 PM, Kenneth Waegeman >>> >> <Kenneth.Waegeman at ugent.be> wrote: >>> >>> >>> >>> >>> >>> I have: >>> >>> osd_objectstore = keyvaluestore-dev >>> >>> >>> >>> in the global section of my ceph.conf >>> >>> >>> >>> >>> >>> [root at ceph002 ~]# ceph osd erasure-code-profile get profile11 >>> >>> directory=/usr/lib64/ceph/erasure-code >>> >>> k=8 >>> >>> m=3 >>> >>> plugin=jerasure >>> >>> ruleset-failure-domain=osd >>> >>> technique=reed_sol_van >>> >>> >>> >>> the ecdata pool has this as its profile >>> >>> >>> >>> pool 3 'ecdata' erasure size 11 min_size 8 crush_ruleset 2 >>> >>> object_hash >>> >>> rjenkins pg_num 128 pgp_num 128 last_change 161 flags hashpspool >>> >>> stripe_width 4096 >>> >>> >>> >>> EC rule in the crushmap: >>> >>> >>> >>> rule ecdata { >>> >>> ruleset 2 >>> >>> type erasure >>> >>> min_size 3 >>> >>> max_size 20 >>> >>> step set_chooseleaf_tries 5 >>> >>> step take default-ec >>> >>> step choose indep 0 type osd >>> >>> step emit >>> >>> } >>> >>> root default-ec { >>> >>> id -8 # do not change unnecessarily >>> >>> # weight 140.616 >>> >>> alg straw >>> >>> hash 0 # rjenkins1 >>> >>> item ceph001-ec weight 46.872 >>> >>> item ceph002-ec weight 46.872 >>> >>> item ceph003-ec weight 46.872 >>> >>> ... >>> >>> >>> >>> Cheers! >>> >>> Kenneth >>> >>> >>> >>> ----- Message from Haomai Wang <haomaiwang at gmail.com> --------- >>> >>> Date: Thu, 14 Aug 2014 10:07:50 +0800 >>> >>> From: Haomai Wang <haomaiwang at gmail.com> >>> >>> Subject: Re: [ceph-users] ceph cluster inconsistency? >>> >>> To: Kenneth Waegeman <Kenneth.Waegeman at ugent.be> >>> >>> Cc: ceph-users <ceph-users at lists.ceph.com> >>> >>> >>> >>> >>> >>>> Hi Kenneth, >>> >>>> >>> >>>> Could you give your configuration related to EC and KeyValueStore? 
>>> >>>> Not sure whether it's bug on KeyValueStore >>> >>>> >>> >>>> On Thu, Aug 14, 2014 at 12:06 AM, Kenneth Waegeman >>> >>>> <Kenneth.Waegeman at ugent.be> wrote: >>> >>>>> >>> >>>>> >>> >>>>> Hi, >>> >>>>> >>> >>>>> I was doing some tests with rados bench on a Erasure Coded pool >>> >>>>> (using >>> >>>>> keyvaluestore-dev objectstore) on 0.83, and I see some strangs >>> >>>>> things: >>> >>>>> >>> >>>>> >>> >>>>> [root at ceph001 ~]# ceph status >>> >>>>> cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d >>> >>>>> health HEALTH_WARN too few pgs per osd (4 < min 20) >>> >>>>> monmap e1: 3 mons at >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> {ceph001=10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0,ceph003=10.141.8.182:6789/0}, >>> >>>>> election epoch 6, quorum 0,1,2 ceph001,ceph002,ceph003 >>> >>>>> mdsmap e116: 1/1/1 up {0=ceph001.cubone.os=up:active}, 2 >>> >>>>> up:standby >>> >>>>> osdmap e292: 78 osds: 78 up, 78 in >>> >>>>> pgmap v48873: 320 pgs, 4 pools, 15366 GB data, 3841 kobjects >>> >>>>> 1381 GB used, 129 TB / 131 TB avail >>> >>>>> 320 active+clean >>> >>>>> >>> >>>>> There is around 15T of data, but only 1.3 T usage. >>> >>>>> >>> >>>>> This is also visible in rados: >>> >>>>> >>> >>>>> [root at ceph001 ~]# rados df >>> >>>>> pool name category KB objects >>> >>>>> clones >>> >>>>> degraded unfound rd rd KB wr >>> >>>>> wr >>> >>>>> KB >>> >>>>> data - 0 0 >>> >>>>> 0 >>> >>>>> 0 0 0 0 0 0 >>> >>>>> ecdata - 16113451009 3933959 >>> >>>>> 0 >>> >>>>> 0 0 1 1 3935632 16116850711 >>> >>>>> metadata - 2 20 >>> >>>>> 0 >>> >>>>> 0 0 33 36 21 8 >>> >>>>> rbd - 0 0 >>> >>>>> 0 >>> >>>>> 0 0 0 0 0 0 >>> >>>>> total used 1448266016 3933979 >>> >>>>> total avail 139400181016 >>> >>>>> total space 140848447032 >>> >>>>> >>> >>>>> >>> >>>>> Another (related?) thing: if I do rados -p ecdata ls, I trigger osd >>> >>>>> shutdowns (each time): >>> >>>>> I get a list followed by an error: >>> >>>>> >>> >>>>> ... 
>>> >>>>> benchmark_data_ceph001.cubone.os_8961_object243839 >>> >>>>> benchmark_data_ceph001.cubone.os_5560_object801983 >>> >>>>> benchmark_data_ceph001.cubone.os_31461_object856489 >>> >>>>> benchmark_data_ceph001.cubone.os_8961_object202232 >>> >>>>> benchmark_data_ceph001.cubone.os_4919_object33199 >>> >>>>> benchmark_data_ceph001.cubone.os_5560_object807797 >>> >>>>> benchmark_data_ceph001.cubone.os_4919_object74729 >>> >>>>> benchmark_data_ceph001.cubone.os_31461_object1264121 >>> >>>>> benchmark_data_ceph001.cubone.os_5560_object1318513 >>> >>>>> benchmark_data_ceph001.cubone.os_5560_object1202111 >>> >>>>> benchmark_data_ceph001.cubone.os_31461_object939107 >>> >>>>> benchmark_data_ceph001.cubone.os_31461_object729682 >>> >>>>> benchmark_data_ceph001.cubone.os_5560_object122915 >>> >>>>> benchmark_data_ceph001.cubone.os_5560_object76521 >>> >>>>> benchmark_data_ceph001.cubone.os_5560_object113261 >>> >>>>> benchmark_data_ceph001.cubone.os_31461_object575079 >>> >>>>> benchmark_data_ceph001.cubone.os_5560_object671042 >>> >>>>> benchmark_data_ceph001.cubone.os_5560_object381146 >>> >>>>> 2014-08-13 17:57:48.736150 7f65047b5700 0 -- >>> >>>>> 10.141.8.180:0/1023295 >> >>> >>>>> 10.141.8.182:6839/4471 pipe(0x7f64fc019b20 sd=5 :0 s=1 pgs=0 cs=0 >>> >>>>> l=1 >>> >>>>> c=0x7f64fc019db0).fault >>> >>>>> >>> >>>>> And I can see this in the log files: >>> >>>>> >>> >>>>> -25> 2014-08-13 17:52:56.323908 7f8a97fa4700 1 -- >>> >>>>> 10.143.8.182:6827/64670 <== osd.57 10.141.8.182:0/15796 51 ==== >>> >>>>> osd_ping(ping e220 stamp 2014-08-13 17:52:56.323092) v2 ==== 47+0+0 >>> >>>>> (3227325175 0 0) 0xf475940 con 0xee89fa0 >>> >>>>> -24> 2014-08-13 17:52:56.323938 7f8a97fa4700 1 -- >>> >>>>> 10.143.8.182:6827/64670 --> 10.141.8.182:0/15796 -- >>> >>>>> osd_ping(ping_reply >>> >>>>> e220 >>> >>>>> stamp 2014-08-13 17:52:56.323092) v2 -- ?+0 0xf815b00 con 0xee89fa0 >>> >>>>> -23> 2014-08-13 17:52:56.324078 7f8a997a7700 1 -- >>> >>>>> 10.141.8.182:6840/64670 <== osd.57 10.141.8.182:0/15796 51 ==== >>> >>>>> osd_ping(ping e220 stamp 2014-08-13 17:52:56.323092) v2 ==== 47+0+0 >>> >>>>> (3227325175 0 0) 0xf132bc0 con 0xee8a680 >>> >>>>> -22> 2014-08-13 17:52:56.324111 7f8a997a7700 1 -- >>> >>>>> 10.141.8.182:6840/64670 --> 10.141.8.182:0/15796 -- >>> >>>>> osd_ping(ping_reply >>> >>>>> e220 >>> >>>>> stamp 2014-08-13 17:52:56.323092) v2 -- ?+0 0xf811a40 con 0xee8a680 >>> >>>>> -21> 2014-08-13 17:52:56.584461 7f8a997a7700 1 -- >>> >>>>> 10.141.8.182:6840/64670 <== osd.29 10.143.8.181:0/12142 47 ==== >>> >>>>> osd_ping(ping e220 stamp 2014-08-13 17:52:56.583010) v2 ==== 47+0+0 >>> >>>>> (3355887204 0 0) 0xf655940 con 0xee88b00 >>> >>>>> -20> 2014-08-13 17:52:56.584486 7f8a997a7700 1 -- >>> >>>>> 10.141.8.182:6840/64670 --> 10.143.8.181:0/12142 -- >>> >>>>> osd_ping(ping_reply >>> >>>>> e220 >>> >>>>> stamp 2014-08-13 17:52:56.583010) v2 -- ?+0 0xf132bc0 con 0xee88b00 >>> >>>>> -19> 2014-08-13 17:52:56.584498 7f8a97fa4700 1 -- >>> >>>>> 10.143.8.182:6827/64670 <== osd.29 10.143.8.181:0/12142 47 ==== >>> >>>>> osd_ping(ping e220 stamp 2014-08-13 17:52:56.583010) v2 ==== 47+0+0 >>> >>>>> (3355887204 0 0) 0xf20e040 con 0xee886e0 >>> >>>>> -18> 2014-08-13 17:52:56.584526 7f8a97fa4700 1 -- >>> >>>>> 10.143.8.182:6827/64670 --> 10.143.8.181:0/12142 -- >>> >>>>> osd_ping(ping_reply >>> >>>>> e220 >>> >>>>> stamp 2014-08-13 17:52:56.583010) v2 -- ?+0 0xf475940 con 0xee886e0 >>> >>>>> -17> 2014-08-13 17:52:56.594448 7f8a798c7700 1 -- >>> >>>>> 10.141.8.182:6839/64670 >> :/0 pipe(0xec15f00 sd=74 :6839 s=0 pgs=0 
>>> >>>>> cs=0 >>> >>>>> l=0 >>> >>>>> c=0xee856a0).accept sd=74 10.141.8.180:47641/0 >>> >>>>> -16> 2014-08-13 17:52:56.594921 7f8a798c7700 1 -- >>> >>>>> 10.141.8.182:6839/64670 <== client.7512 10.141.8.180:0/1018433 1 >>> >>>>> ==== >>> >>>>> osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 >>> >>>>> ack+read+known_if_redirected e220) v4 ==== 151+0+39 (1972163119 0 >>> >>>>> 4174233976) 0xf3bca40 con 0xee856a0 >>> >>>>> -15> 2014-08-13 17:52:56.594957 7f8a798c7700 5 -- op tracker -- >>> >>>>> , >>> >>>>> seq: >>> >>>>> 299, time: 2014-08-13 17:52:56.594874, event: header_read, op: >>> >>>>> osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 >>> >>>>> ack+read+known_if_redirected e220) >>> >>>>> -14> 2014-08-13 17:52:56.594970 7f8a798c7700 5 -- op tracker -- >>> >>>>> , >>> >>>>> seq: >>> >>>>> 299, time: 2014-08-13 17:52:56.594880, event: throttled, op: >>> >>>>> osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 >>> >>>>> ack+read+known_if_redirected e220) >>> >>>>> -13> 2014-08-13 17:52:56.594978 7f8a798c7700 5 -- op tracker -- >>> >>>>> , >>> >>>>> seq: >>> >>>>> 299, time: 2014-08-13 17:52:56.594917, event: all_read, op: >>> >>>>> osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 >>> >>>>> ack+read+known_if_redirected e220) >>> >>>>> -12> 2014-08-13 17:52:56.594986 7f8a798c7700 5 -- op tracker -- >>> >>>>> , >>> >>>>> seq: >>> >>>>> 299, time: 0.000000, event: dispatched, op: osd_op(client.7512.0:1 >>> >>>>> [pgls >>> >>>>> start_epoch 0] 3.0 ack+read+known_if_redirected e220) >>> >>>>> -11> 2014-08-13 17:52:56.595127 7f8a90795700 5 -- op tracker -- >>> >>>>> , >>> >>>>> seq: >>> >>>>> 299, time: 2014-08-13 17:52:56.595104, event: reached_pg, op: >>> >>>>> osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 >>> >>>>> ack+read+known_if_redirected e220) >>> >>>>> -10> 2014-08-13 17:52:56.595159 7f8a90795700 5 -- op tracker -- >>> >>>>> , >>> >>>>> seq: >>> >>>>> 299, time: 2014-08-13 17:52:56.595153, event: started, op: >>> >>>>> osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 >>> >>>>> ack+read+known_if_redirected e220) >>> >>>>> -9> 2014-08-13 17:52:56.602179 7f8a90795700 1 -- >>> >>>>> 10.141.8.182:6839/64670 --> 10.141.8.180:0/1018433 -- >>> >>>>> osd_op_reply(1 >>> >>>>> [pgls >>> >>>>> start_epoch 0] v164'30654 uv30654 ondisk = 0) v6 -- ?+0 0xec16180 >>> >>>>> con >>> >>>>> 0xee856a0 >>> >>>>> -8> 2014-08-13 17:52:56.602211 7f8a90795700 5 -- op tracker -- >>> >>>>> , >>> >>>>> seq: >>> >>>>> 299, time: 2014-08-13 17:52:56.602205, event: done, op: >>> >>>>> osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 >>> >>>>> ack+read+known_if_redirected e220) >>> >>>>> -7> 2014-08-13 17:52:56.614839 7f8a798c7700 1 -- >>> >>>>> 10.141.8.182:6839/64670 <== client.7512 10.141.8.180:0/1018433 2 >>> >>>>> ==== >>> >>>>> osd_op(client.7512.0:2 [pgls start_epoch 220] 3.0 >>> >>>>> ack+read+known_if_redirected e220) v4 ==== 151+0+89 (3460833343 0 >>> >>>>> 2600845095) 0xf3bcec0 con 0xee856a0 >>> >>>>> -6> 2014-08-13 17:52:56.614864 7f8a798c7700 5 -- op tracker -- >>> >>>>> , >>> >>>>> seq: >>> >>>>> 300, time: 2014-08-13 17:52:56.614789, event: header_read, op: >>> >>>>> osd_op(client.7512.0:2 [pgls start_epoch 220] 3.0 >>> >>>>> ack+read+known_if_redirected e220) >>> >>>>> -5> 2014-08-13 17:52:56.614874 7f8a798c7700 5 -- op tracker -- >>> >>>>> , >>> >>>>> seq: >>> >>>>> 300, time: 2014-08-13 17:52:56.614792, event: throttled, op: >>> >>>>> osd_op(client.7512.0:2 [pgls start_epoch 220] 3.0 >>> >>>>> ack+read+known_if_redirected e220) >>> >>>>> -4> 2014-08-13 17:52:56.614884 7f8a798c7700 5 -- op tracker -- >>> 
>>>>> , >>> >>>>> seq: >>> >>>>> 300, time: 2014-08-13 17:52:56.614835, event: all_read, op: >>> >>>>> osd_op(client.7512.0:2 [pgls start_epoch 220] 3.0 >>> >>>>> ack+read+known_if_redirected e220) >>> >>>>> -3> 2014-08-13 17:52:56.614891 7f8a798c7700 5 -- op tracker -- >>> >>>>> , >>> >>>>> seq: >>> >>>>> 300, time: 0.000000, event: dispatched, op: osd_op(client.7512.0:2 >>> >>>>> [pgls >>> >>>>> start_epoch 220] 3.0 ack+read+known_if_redirected e220) >>> >>>>> -2> 2014-08-13 17:52:56.614972 7f8a92f9a700 5 -- op tracker -- >>> >>>>> , >>> >>>>> seq: >>> >>>>> 300, time: 2014-08-13 17:52:56.614958, event: reached_pg, op: >>> >>>>> osd_op(client.7512.0:2 [pgls start_epoch 220] 3.0 >>> >>>>> ack+read+known_if_redirected e220) >>> >>>>> -1> 2014-08-13 17:52:56.614993 7f8a92f9a700 5 -- op tracker -- >>> >>>>> , >>> >>>>> seq: >>> >>>>> 300, time: 2014-08-13 17:52:56.614986, event: started, op: >>> >>>>> osd_op(client.7512.0:2 [pgls start_epoch 220] 3.0 >>> >>>>> ack+read+known_if_redirected e220) >>> >>>>> 0> 2014-08-13 17:52:56.617087 7f8a92f9a700 -1 >>> >>>>> os/GenericObjectMap.cc: >>> >>>>> In function 'int GenericObjectMap::list_objects(const coll_t&, >>> >>>>> ghobject_t, >>> >>>>> int, std::vector<ghobject_t>*, ghobject_t*)' thread 7f8a92f9a700 >>> >>>>> time >>> >>>>> 2014-08-13 17:52:56.615073 >>> >>>>> os/GenericObjectMap.cc: 1118: FAILED assert(start <= header.oid) >>> >>>>> >>> >>>>> >>> >>>>> ceph version 0.83 (78ff1f0a5dfd3c5850805b4021738564c36c92b8) >>> >>>>> 1: (GenericObjectMap::list_objects(coll_t const&, ghobject_t, int, >>> >>>>> std::vector<ghobject_t, std::allocator<ghobject_t> >*, >>> >>>>> ghobject_t*)+0x474) >>> >>>>> [0x98f774] >>> >>>>> 2: (KeyValueStore::collection_list_partial(coll_t, ghobject_t, >>> >>>>> int, >>> >>>>> int, >>> >>>>> snapid_t, std::vector<ghobject_t, std::allocator<ghobject_t> >*, >>> >>>>> ghobject_t*)+0x274) [0x8c5b54] >>> >>>>> 3: (PGBackend::objects_list_partial(hobject_t const&, int, int, >>> >>>>> snapid_t, >>> >>>>> std::vector<hobject_t, std::allocator<hobject_t> >*, >>> >>>>> hobject_t*)+0x1c9) >>> >>>>> [0x862de9] >>> >>>>> 4: (ReplicatedPG::do_pg_op(std::tr1::shared_ptr<OpRequest>)+0xea5) >>> >>>>> [0x7f67f5] >>> >>>>> 5: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x1f3) >>> >>>>> [0x8177b3] >>> >>>>> 6: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, >>> >>>>> ThreadPool::TPHandle&)+0x5d5) [0x7b8045] >>> >>>>> 7: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>> >>>>> std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x47d) >>> >>>>> [0x62bf8d] >>> >>>>> 8: (OSD::ShardedOpWQ::_process(unsigned int, >>> >>>>> ceph::heartbeat_handle_d*)+0x35c) [0x62c56c] >>> >>>>> 9: (ShardedThreadPool::shardedthreadpool_worker(unsigned >>> >>>>> int)+0x8cd) >>> >>>>> [0xa776fd] >>> >>>>> 10: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) >>> >>>>> [0xa79980] >>> >>>>> 11: (()+0x7df3) [0x7f8aac71fdf3] >>> >>>>> 12: (clone()+0x6d) [0x7f8aab1963dd] >>> >>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>> >>>>> needed >>> >>>>> to >>> >>>>> interpret this. 
>>> >>>>> >>> >>>>> >>> >>>>> ceph version 0.83 (78ff1f0a5dfd3c5850805b4021738564c36c92b8) >>> >>>>> 1: /usr/bin/ceph-osd() [0x99b466] >>> >>>>> 2: (()+0xf130) [0x7f8aac727130] >>> >>>>> 3: (gsignal()+0x39) [0x7f8aab0d5989] >>> >>>>> 4: (abort()+0x148) [0x7f8aab0d7098] >>> >>>>> 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) >>> >>>>> [0x7f8aab9e89d5] >>> >>>>> 6: (()+0x5e946) [0x7f8aab9e6946] >>> >>>>> 7: (()+0x5e973) [0x7f8aab9e6973] >>> >>>>> 8: (()+0x5eb9f) [0x7f8aab9e6b9f] >>> >>>>> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>> >>>>> const*)+0x1ef) [0xa8805f] >>> >>>>> 10: (GenericObjectMap::list_objects(coll_t const&, ghobject_t, >>> >>>>> int, >>> >>>>> std::vector<ghobject_t, std::allocator<ghobject_t> >*, >>> >>>>> ghobject_t*)+0x474) >>> >>>>> [0x98f774] >>> >>>>> 11: (KeyValueStore::collection_list_partial(coll_t, ghobject_t, >>> >>>>> int, >>> >>>>> int, >>> >>>>> snapid_t, std::vector<ghobject_t, std::allocator<ghobject_t> >*, >>> >>>>> ghobject_t*)+0x274) [0x8c5b54] >>> >>>>> 12: (PGBackend::objects_list_partial(hobject_t const&, int, int, >>> >>>>> snapid_t, >>> >>>>> std::vector<hobject_t, std::allocator<hobject_t> >*, >>> >>>>> hobject_t*)+0x1c9) >>> >>>>> [0x862de9] >>> >>>>> 13: >>> >>>>> (ReplicatedPG::do_pg_op(std::tr1::shared_ptr<OpRequest>)+0xea5) >>> >>>>> [0x7f67f5] >>> >>>>> 14: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x1f3) >>> >>>>> [0x8177b3] >>> >>>>> 15: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, >>> >>>>> ThreadPool::TPHandle&)+0x5d5) [0x7b8045] >>> >>>>> 16: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>> >>>>> std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x47d) >>> >>>>> [0x62bf8d] >>> >>>>> 17: (OSD::ShardedOpWQ::_process(unsigned int, >>> >>>>> ceph::heartbeat_handle_d*)+0x35c) [0x62c56c] >>> >>>>> 18: (ShardedThreadPool::shardedthreadpool_worker(unsigned >>> >>>>> int)+0x8cd) >>> >>>>> [0xa776fd] >>> >>>>> 19: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) >>> >>>>> [0xa79980] >>> >>>>> 20: (()+0x7df3) [0x7f8aac71fdf3] >>> >>>>> 21: (clone()+0x6d) [0x7f8aab1963dd] >>> >>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>> >>>>> needed >>> >>>>> to >>> >>>>> interpret this. 
>>> >>>>> >>> >>>>> --- begin dump of recent events --- >>> >>>>> 0> 2014-08-13 17:52:56.714214 7f8a92f9a700 -1 *** Caught >>> >>>>> signal >>> >>>>> (Aborted) ** >>> >>>>> in thread 7f8a92f9a700 >>> >>>>> >>> >>>>> ceph version 0.83 (78ff1f0a5dfd3c5850805b4021738564c36c92b8) >>> >>>>> 1: /usr/bin/ceph-osd() [0x99b466] >>> >>>>> 2: (()+0xf130) [0x7f8aac727130] >>> >>>>> 3: (gsignal()+0x39) [0x7f8aab0d5989] >>> >>>>> 4: (abort()+0x148) [0x7f8aab0d7098] >>> >>>>> 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) >>> >>>>> [0x7f8aab9e89d5] >>> >>>>> 6: (()+0x5e946) [0x7f8aab9e6946] >>> >>>>> 7: (()+0x5e973) [0x7f8aab9e6973] >>> >>>>> 8: (()+0x5eb9f) [0x7f8aab9e6b9f] >>> >>>>> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>> >>>>> const*)+0x1ef) [0xa8805f] >>> >>>>> 10: (GenericObjectMap::list_objects(coll_t const&, ghobject_t, >>> >>>>> int, >>> >>>>> std::vector<ghobject_t, std::allocator<ghobject_t> >*, >>> >>>>> ghobject_t*)+0x474) >>> >>>>> [0x98f774] >>> >>>>> 11: (KeyValueStore::collection_list_partial(coll_t, ghobject_t, >>> >>>>> int, >>> >>>>> int, >>> >>>>> snapid_t, std::vector<ghobject_t, std::allocator<ghobject_t> >*, >>> >>>>> ghobject_t*)+0x274) [0x8c5b54] >>> >>>>> 12: (PGBackend::objects_list_partial(hobject_t const&, int, int, >>> >>>>> snapid_t, >>> >>>>> std::vector<hobject_t, std::allocator<hobject_t> >*, >>> >>>>> hobject_t*)+0x1c9) >>> >>>>> [0x862de9] >>> >>>>> 13: >>> >>>>> (ReplicatedPG::do_pg_op(std::tr1::shared_ptr<OpRequest>)+0xea5) >>> >>>>> [0x7f67f5] >>> >>>>> 14: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x1f3) >>> >>>>> [0x8177b3] >>> >>>>> 15: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, >>> >>>>> ThreadPool::TPHandle&)+0x5d5) [0x7b8045] >>> >>>>> 16: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>> >>>>> std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x47d) >>> >>>>> [0x62bf8d] >>> >>>>> 17: (OSD::ShardedOpWQ::_process(unsigned int, >>> >>>>> ceph::heartbeat_handle_d*)+0x35c) [0x62c56c] >>> >>>>> 18: (ShardedThreadPool::shardedthreadpool_worker(unsigned >>> >>>>> int)+0x8cd) >>> >>>>> [0xa776fd] >>> >>>>> 19: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) >>> >>>>> [0xa79980] >>> >>>>> 20: (()+0x7df3) [0x7f8aac71fdf3] >>> >>>>> 21: (clone()+0x6d) [0x7f8aab1963dd] >>> >>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>> >>>>> needed >>> >>>>> to >>> >>>>> interpret this. >>> >>>>> >>> >>>>> I guess this has something to do with using the dev Keyvaluestore? >>> >>>>> >>> >>>>> >>> >>>>> Thanks! 
>>> >>>>> >>> >>>>> Kenneth >>> >>>>> >>> >>>>> _______________________________________________ >>> >>>>> ceph-users mailing list >>> >>>>> ceph-users at lists.ceph.com >>> >>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >>> >>>> >>> >>>> >>> >>>> >>> >>>> >>> >>>> >>> >>>> -- >>> >>>> Best Regards, >>> >>>> >>> >>>> Wheat >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> ----- End message from Haomai Wang <haomaiwang at gmail.com> ----- >>> >>> >>> >>> -- >>> >>> >>> >>> Met vriendelijke groeten, >>> >>> Kenneth Waegeman >>> >>> >>> >> >>> >> >>> >> >>> >> -- >>> >> Best Regards, >>> >> >>> >> Wheat >>> > >>> > >>> > >>> > ----- End message from Haomai Wang <haomaiwang at gmail.com> ----- >>> > >>> > -- >>> > >>> > Met vriendelijke groeten, >>> > Kenneth Waegeman >>> > >>> >>> >>> >>> -- >>> Best Regards, >>> >>> Wheat >>> _______________________________________________ >>> ceph-users mailing list >>> ceph-users at lists.ceph.com >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >>> >>> > > > ----- End message from Sage Weil <sweil at redhat.com> ----- > > > -- > > Met vriendelijke groeten, > Kenneth Waegeman > -- Best Regards, Wheat