Hmm, could you please list the steps you ran, including how long the cluster has existed and all relevant ops? I want to reproduce it.

On Mon, Sep 1, 2014 at 4:45 PM, Kenneth Waegeman <Kenneth.Waegeman at ugent.be> wrote:
> Hi,
>
> I reinstalled the cluster with 0.84 and tried running rados bench again
> on an EC-coded pool on keyvaluestore.
> Nothing crashed this time, but when I check the status:
>
>      health HEALTH_ERR 128 pgs inconsistent; 128 scrub errors; too few pgs
>             per osd (15 < min 20)
>      monmap e1: 3 mons at {ceph001=10.141.8.180:6789/0,
>             ceph002=10.141.8.181:6789/0,ceph003=10.141.8.182:6789/0},
>             election epoch 8, quorum 0,1,2 ceph001,ceph002,ceph003
>      osdmap e174: 78 osds: 78 up, 78 in
>      pgmap v147680: 1216 pgs, 3 pools, 14758 GB data, 3690 kobjects
>             1753 GB used, 129 TB / 131 TB avail
>                 1088 active+clean
>                  128 active+clean+inconsistent
>
> The 128 inconsistent pgs are ALL the pgs of the EC KV store (the others
> are on Filestore).
>
> The only thing I can see in the logs is that after the rados tests it
> starts scrubbing, and for each KV pg I get something like this:
>
> 2014-08-31 11:14:09.050747 osd.11 10.141.8.180:6833/61098 4 : [ERR] 2.3s0
> scrub stat mismatch, got 28164/29291 objects, 0/0 clones, 28164/29291
> dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts,
> 118128377856/122855358464 bytes.
>
> What could the problem be here?
> Thanks again!!
>
> Kenneth
>
> ----- Message from Haomai Wang <haomaiwang at gmail.com> ---------
>    Date: Tue, 26 Aug 2014 17:11:43 +0800
>    From: Haomai Wang <haomaiwang at gmail.com>
> Subject: Re: ceph cluster inconsistency?
>      To: Kenneth Waegeman <Kenneth.Waegeman at ugent.be>
>      Cc: ceph-users at lists.ceph.com
>
>> Hmm, it looks like you hit this bug (http://tracker.ceph.com/issues/9223).
>>
>> Sorry for the late message; I forgot that this fix was merged into 0.84.
>>
>> Thanks for your patience :-)
>>
>> On Tue, Aug 26, 2014 at 4:39 PM, Kenneth Waegeman
>> <Kenneth.Waegeman at ugent.be> wrote:
>>
>>> Hi,
>>>
>>> In the meantime I had already tried upgrading the cluster to 0.84 to see
>>> if that made a difference, and it seems it does:
>>> I can't reproduce the crashing osds by doing a 'rados -p ecdata ls'
>>> anymore.
>>> But now the cluster detects it is inconsistent:
>>>
>>>     cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d
>>>      health HEALTH_ERR 40 pgs inconsistent; 40 scrub errors; too few pgs
>>>             per osd (4 < min 20); mon.ceph002 low disk space
>>>      monmap e3: 3 mons at {ceph001=10.141.8.180:6789/0,
>>>             ceph002=10.141.8.181:6789/0,ceph003=10.141.8.182:6789/0},
>>>             election epoch 30, quorum 0,1,2 ceph001,ceph002,ceph003
>>>      mdsmap e78951: 1/1/1 up {0=ceph003.cubone.os=up:active}, 3 up:standby
>>>      osdmap e145384: 78 osds: 78 up, 78 in
>>>      pgmap v247095: 320 pgs, 4 pools, 15366 GB data, 3841 kobjects
>>>             1502 GB used, 129 TB / 131 TB avail
>>>                  279 active+clean
>>>                   40 active+clean+inconsistent
>>>                    1 active+clean+scrubbing+deep
>>>
>>> I tried to do ceph pg repair for all the inconsistent pgs:
>>>
>>>     cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d
>>>      health HEALTH_ERR 40 pgs inconsistent; 1 pgs repair; 40 scrub errors;
>>>             too few pgs per osd (4 < min 20); mon.ceph002 low disk space
>>>      monmap e3: 3 mons at {ceph001=10.141.8.180:6789/0,
>>>             ceph002=10.141.8.181:6789/0,ceph003=10.141.8.182:6789/0},
>>>             election epoch 30, quorum 0,1,2 ceph001,ceph002,ceph003
>>>      mdsmap e79486: 1/1/1 up {0=ceph003.cubone.os=up:active}, 3 up:standby
>>>      osdmap e146452: 78 osds: 78 up, 78 in
>>>      pgmap v248520: 320 pgs, 4 pools, 15366 GB data, 3841 kobjects
>>>             1503 GB used, 129 TB / 131 TB avail
>>>                  279 active+clean
>>>                   39 active+clean+inconsistent
>>>                    1 active+clean+scrubbing+deep
>>>                    1 active+clean+scrubbing+deep+inconsistent+repair
>>>
>>> I let it recover through the night, but this morning the mons were all
>>> gone, with nothing to see in the log files. The osds were all still up!
>>>
>>>     cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d
>>>      health HEALTH_ERR 36 pgs inconsistent; 1 pgs repair; 36 scrub errors;
>>>             too few pgs per osd (4 < min 20)
>>>      monmap e7: 3 mons at {ceph001=10.141.8.180:6789/0,
>>>             ceph002=10.141.8.181:6789/0,ceph003=10.141.8.182:6789/0},
>>>             election epoch 44, quorum 0,1,2 ceph001,ceph002,ceph003
>>>      mdsmap e109481: 1/1/1 up {0=ceph003.cubone.os=up:active}, 3 up:standby
>>>      osdmap e203410: 78 osds: 78 up, 78 in
>>>      pgmap v331747: 320 pgs, 4 pools, 15251 GB data, 3812 kobjects
>>>             1547 GB used, 129 TB / 131 TB avail
>>>                    1 active+clean+scrubbing+deep+inconsistent+repair
>>>                  284 active+clean
>>>                   35 active+clean+inconsistent
>>>
>>> I have restarted the monitors now; I will let you know when I see
>>> something more.
>>>
>>> ----- Message from Haomai Wang <haomaiwang at gmail.com> ---------
>>>    Date: Sun, 24 Aug 2014 12:51:41 +0800
>>>    From: Haomai Wang <haomaiwang at gmail.com>
>>> Subject: Re: ceph cluster inconsistency?
>>>      To: Kenneth Waegeman <Kenneth.Waegeman at ugent.be>,
>>>          ceph-users at lists.ceph.com
>>>
>>>> It's really strange! I wrote a test program according to the key ordering
>>>> you provided and parsed the corresponding values. They look correct!
>>>>
>>>> I have no idea now. If you are free, could you add this debug code to
>>>> "src/os/GenericObjectMap.cc", inserted *before* "assert(start <=
>>>> header.oid);":
>>>>
>>>> dout(0) << "start: " << start << "header.oid: " << header.oid << dendl;
>>>>
>>>> Then you need to recompile ceph-osd and run it again. The output log
>>>> should help!
>>>>
>>>> On Tue, Aug 19, 2014 at 10:19 PM, Haomai Wang <haomaiwang at gmail.com>
>>>> wrote:
>>>>>
>>>>> I feel a little embarrassed; 1024 rows still look fine to me.
>>>>> >>>>> I was wondering if you could give your all keys via >>>>> ""ceph-kvstore-tool /var/lib/ceph/osd/ceph-67/current/ list >>>>> _GHOBJTOSEQ_ > keys.log?. >>>>> >>>>> thanks! >>>>> >>>>> On Tue, Aug 19, 2014 at 4:58 PM, Kenneth Waegeman >>>>> <Kenneth.Waegeman at ugent.be> wrote: >>>>> >>>>>> >>>>>> >>>>>> ----- Message from Haomai Wang <haomaiwang at gmail.com> --------- >>>>>> Date: Tue, 19 Aug 2014 12:28:27 +0800 >>>>>> >>>>>> From: Haomai Wang <haomaiwang at gmail.com> >>>>>> Subject: Re: ceph cluster inconsistency? >>>>>> To: Kenneth Waegeman <Kenneth.Waegeman at ugent.be> >>>>>> Cc: Sage Weil <sweil at redhat.com>, ceph-users at lists.ceph.com >>>>>> >>>>>> >>>>>> On Mon, Aug 18, 2014 at 7:32 PM, Kenneth Waegeman >>>>>>> <Kenneth.Waegeman at ugent.be> wrote: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> ----- Message from Haomai Wang <haomaiwang at gmail.com> --------- >>>>>>>> Date: Mon, 18 Aug 2014 18:34:11 +0800 >>>>>>>> >>>>>>>> From: Haomai Wang <haomaiwang at gmail.com> >>>>>>>> Subject: Re: ceph cluster inconsistency? >>>>>>>> To: Kenneth Waegeman <Kenneth.Waegeman at ugent.be> >>>>>>>> Cc: Sage Weil <sweil at redhat.com>, ceph-users at lists.ceph.com >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Mon, Aug 18, 2014 at 5:38 PM, Kenneth Waegeman >>>>>>>>> <Kenneth.Waegeman at ugent.be> wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I tried this after restarting the osd, but I guess that was not >>>>>>>>>> the >>>>>>>>>> aim >>>>>>>>>> ( >>>>>>>>>> # ceph-kvstore-tool /var/lib/ceph/osd/ceph-67/current/ list >>>>>>>>>> _GHOBJTOSEQ_| >>>>>>>>>> grep 6adb1100 -A 100 >>>>>>>>>> IO error: lock /var/lib/ceph/osd/ceph-67/current//LOCK: Resource >>>>>>>>>> temporarily >>>>>>>>>> unavailable >>>>>>>>>> tools/ceph_kvstore_tool.cc: In function >>>>>>>>>> 'StoreTool::StoreTool(const >>>>>>>>>> string&)' thread 7f8fecf7d780 time 2014-08-18 11:12:29.551780 >>>>>>>>>> tools/ceph_kvstore_tool.cc: 38: FAILED >>>>>>>>>> assert(!db_ptr->open(std::cerr)) >>>>>>>>>> .. >>>>>>>>>> ) >>>>>>>>>> >>>>>>>>>> When I run it after bringing the osd down, it takes a while, but >>>>>>>>>> it >>>>>>>>>> has >>>>>>>>>> no >>>>>>>>>> output.. (When running it without the grep, I'm getting a huge >>>>>>>>>> list >>>>>>>>>> ) >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Oh, sorry for it! I made a mistake, the hash value(6adb1100) will >>>>>>>>> be >>>>>>>>> reversed into leveldb. >>>>>>>>> So grep "benchmark_data_ceph001.cubone.os_5560_object789734" >>>>>>>>> should >>>>>>>>> be >>>>>>>>> help it. 
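A note on the hash versus the key prefix: assuming the _GHOBJTOSEQ_ prefix is simply the object's hash with its hex digits written in reverse order, 6adb1100 becomes 0011bda6, which matches the 0011BDA6 entry for benchmark_data_ceph001.cubone.os_5560_object789734 in the listing below and explains why grepping for the raw hash returns nothing. A minimal standalone sketch of that assumption:

    // Standalone illustration, not Ceph code; the reversal rule is an
    // assumption inferred from the key listing below.
    #include <iostream>
    #include <string>

    int main() {
        std::string hash_from_log("6adb1100");             // hash printed in the osd log
        std::string key_prefix(hash_from_log.rbegin(),     // hex digits reversed
                               hash_from_log.rend());
        std::cout << key_prefix << std::endl;              // prints "0011bda6"
        return 0;
    }

Grepping for the object name instead of the hash, as suggested above, sidesteps the reversal entirely.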
>>>>>>>>> >>>>>>>>> this gives: >>>>>>>> >>>>>>>> [root at ceph003 ~]# ceph-kvstore-tool /var/lib/ceph/osd/ceph-67/ >>>>>>>> current/ >>>>>>>> list >>>>>>>> _GHOBJTOSEQ_ | grep 5560_object789734 -A 100 >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011BDA6!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object789734!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011C027!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_31461_object1330170!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011C6FD!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_4919_object227366!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011CB03!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object1363631!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011CDF0!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object1573957!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011D02C!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object1019282!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011E2B5!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_31461_object1283563!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011E511!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_4919_object273736!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011E547!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object1170628!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011EAAB!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_4919_object256335!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011F446!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object1484196!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011FC59!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object884178!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001203F3!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object853746!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001208E3!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object36633!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00120B37!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_31461_object1235337!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001210B6!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object1661351!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001210CB!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object238126!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012184C!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object339943!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00121916!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object1047094!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001219C1!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_31461_object520642!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001222BB!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object639565!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001223AA!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_4919_object231080!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012243C!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object858050!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012289C!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object241796!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00122D28!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_4919_object7462!head >>>>>>>> 
>>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00122DFE!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object243798!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00122EFC!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_8961_object109512!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001232D7!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_31461_object653973!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001234A3!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object1378169!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00123714!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object512925!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001237D9!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_4919_object23289!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00123854!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_31461_object1108852!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00123971!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object704026!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00123F75!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_8961_object250441!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00124083!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_31461_object706178!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001240FA!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object316952!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012447D!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object538734!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001244D9!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_31461_object789215!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001247CD!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_8961_object265993!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00124897!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_31461_object610597!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00124BE4!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_31461_object691723!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00124C9B!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object1306135!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00124E1D!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object520580!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012534C!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object659767!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00125A81!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object184060!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00125E77!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object1292867!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00126562!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_31461_object1201410!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00126B34!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object1657326!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00127383!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object1269787!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00127396!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_31461_object500115!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001277F8!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_31461_object394932!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001279DD!!3!!benchmark_data_ >>>>>>>> 
ceph001%ecubone%eos_4919_object252963!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00127B40!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_31461_object936811!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00127BAC!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_31461_object1481773!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012894E!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object999885!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00128D05!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_31461_object943667!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012908A!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object212990!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00129519!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object437596!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00129716!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object1585330!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00129798!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object603505!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001299C9!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_31461_object808800!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00129B7A!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_31461_object23193!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00129B9A!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_31461_object1158397!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012A932!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object542450!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012B77A!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_8961_object195480!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012BE8C!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_4919_object312911!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012BF74!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object1563783!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012C65C!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object1123980!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012C6FE!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_3411_object913!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012CCAD!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_31461_object400863!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012CDBB!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object789667!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012D14B!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_31461_object1020723!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012D95B!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_8961_object106293!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012E3C8!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object1355526!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012E5B3!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object1491348!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012F2BB!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_8961_object338872!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012F374!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_31461_object1337264!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012FBE5!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object1512395!head >>>>>>>> >>>>>>>> >>>>>>>> 
_GHOBJTOSEQ_:3%e0s0_head!0012FCE3!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_8961_object298610!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012FEB6!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_4919_object120824!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001301CA!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object816326!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00130263!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object777163!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00130529!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object1413173!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001317D9!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_31461_object809510!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0013204F!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_31461_object471416!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00132400!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object695087!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00132A19!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_31461_object591945!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00132BF8!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_31461_object302000!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00132F5B!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object1645443!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00133B8B!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object761911!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0013433E!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_31461_object1467727!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00134446!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_31461_object791960!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00134678!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_31461_object677078!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00134A96!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_31461_object254923!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001355D0!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_31461_object321528!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00135690!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_4919_object36935!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00135B62!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object1228272!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00135C72!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_4812_object2180!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00135DEE!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object425705!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00136366!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object141569!head >>>>>>>> >>>>>>>> >>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00136371!!3!!benchmark_data_ >>>>>>>> ceph001%ecubone%eos_5560_object564213!head >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> 100 rows seemed true for me. I found the min list objects is 1024. >>>>>>> Please could you run >>>>>>> "ceph-kvstore-tool /var/lib/ceph/osd/ceph-67/current/ list >>>>>>> _GHOBJTOSEQ_| grep 6adb1100 -A 1024" >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> I got the output in attachment >>>>>> >>>>>> >>>>>> >>>>>>> >>>>>>>>>> Or should I run this immediately after the osd is crashed, >>>>>>>>>> (because >>>>>>>>>> it >>>>>>>>>> maybe >>>>>>>>>> rebalanced? 
I did already restarted the cluster) >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I don't know if it is related, but before I could all do that, I >>>>>>>>>> had >>>>>>>>>> to >>>>>>>>>> fix >>>>>>>>>> something else: A monitor did run out if disk space, using 8GB for >>>>>>>>>> his >>>>>>>>>> store.db folder (lot of sst files). Other monitors are also near >>>>>>>>>> that >>>>>>>>>> level. >>>>>>>>>> Never had that problem on previous setups before. I recreated a >>>>>>>>>> monitor >>>>>>>>>> and >>>>>>>>>> now it uses 3.8GB. >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> It exists some duplicate data which needed to be compacted. >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> Another idea, maybe you can make KeyValueStore's stripe size align >>>>>>>>> with EC stripe size. >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> How can I do that? Is there some documentation about that? >>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> ceph --show-config | grep keyvaluestore >>>>>>>> >>>>>>> >>>>>>> >>>>>>> debug_keyvaluestore = 0/0 >>>>>>> keyvaluestore_queue_max_ops = 50 >>>>>>> keyvaluestore_queue_max_bytes = 104857600 >>>>>>> keyvaluestore_debug_check_backend = false >>>>>>> keyvaluestore_op_threads = 2 >>>>>>> keyvaluestore_op_thread_timeout = 60 >>>>>>> keyvaluestore_op_thread_suicide_timeout = 180 >>>>>>> keyvaluestore_default_strip_size = 4096 >>>>>>> keyvaluestore_max_expected_write_size = 16777216 >>>>>>> keyvaluestore_header_cache_size = 4096 >>>>>>> keyvaluestore_backend = leveldb >>>>>>> >>>>>>> keyvaluestore_default_strip_size is the wanted >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> I haven't think deeply and maybe I will try it later. >>>>>>>>> >>>>>>>>> Thanks! >>>>>>>>>> >>>>>>>>>> Kenneth >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> ----- Message from Sage Weil <sweil at redhat.com> --------- >>>>>>>>>> Date: Fri, 15 Aug 2014 06:10:34 -0700 (PDT) >>>>>>>>>> From: Sage Weil <sweil at redhat.com> >>>>>>>>>> >>>>>>>>>> Subject: Re: ceph cluster inconsistency? >>>>>>>>>> To: Haomai Wang <haomaiwang at gmail.com> >>>>>>>>>> Cc: Kenneth Waegeman <Kenneth.Waegeman at ugent.be>, >>>>>>>>>> ceph-users at lists.ceph.com >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Fri, 15 Aug 2014, Haomai Wang wrote: >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Hi Kenneth, >>>>>>>>>>>> >>>>>>>>>>>> I don't find valuable info in your logs, it lack of the >>>>>>>>>>>> necessary >>>>>>>>>>>> debug output when accessing crash code. >>>>>>>>>>>> >>>>>>>>>>>> But I scan the encode/decode implementation in GenericObjectMap >>>>>>>>>>>> and >>>>>>>>>>>> find something bad. >>>>>>>>>>>> >>>>>>>>>>>> For example, two oid has same hash and their name is: >>>>>>>>>>>> A: "rb.data.123" >>>>>>>>>>>> B: "rb-123" >>>>>>>>>>>> >>>>>>>>>>>> In ghobject_t compare level, A < B. But GenericObjectMap encode >>>>>>>>>>>> "." >>>>>>>>>>>> to >>>>>>>>>>>> "%e", so the key in DB is: >>>>>>>>>>>> A: _GHOBJTOSEQ_:blah!51615000!!none!!rb%edata%e123!head >>>>>>>>>>>> B: _GHOBJTOSEQ_:blah!51615000!!none!!rb-123!head >>>>>>>>>>>> >>>>>>>>>>>> A > B >>>>>>>>>>>> >>>>>>>>>>>> And it seemed that the escape function is useless and should be >>>>>>>>>>>> disabled. >>>>>>>>>>>> >>>>>>>>>>>> I'm not sure whether Kenneth's problem is touching this bug. >>>>>>>>>>>> Because >>>>>>>>>>>> this scene only occur when the object set is very large and make >>>>>>>>>>>> the >>>>>>>>>>>> two object has same hash value. 
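To make the ordering problem above concrete, here is a minimal standalone sketch. It is not the actual GenericObjectMap code: only the '.' -> "%e" rewrite from the example above is modeled, and ghobject_t's real comparison is more involved. The point is only that the escape is not order-preserving, so the key order in leveldb can disagree with the object order the OSD expects, which is what the assert(start <= header.oid) in list_objects() then trips over:

    // Minimal sketch, not Ceph code; escape rule and names taken from the
    // example above, everything else is a simplifying assumption.
    #include <cassert>
    #include <iostream>
    #include <string>

    // Illustrative escape rule: rewrite '.' as "%e".
    static std::string escape(const std::string &in) {
        std::string out;
        for (char c : in) {
            if (c == '.') out += "%e";
            else          out += c;
        }
        return out;
    }

    int main() {
        std::string a = "rb.data.123";   // names from the example above
        std::string b = "rb-123";

        bool raw     = a < b;                  // byte order of the raw names
        bool escaped = escape(a) < escape(b);  // byte order of the escaped keys

        std::cout << std::boolalpha
                  << "raw:     a < b ? " << raw << "\n"       // false: '.' sorts after '-'
                  << "escaped: a < b ? " << escaped << "\n";  // true:  '%' sorts before '-'

        assert(raw != escaped);  // the two orders disagree
        return 0;
    }

Since two different names only collide like this when they share the same hash, a pool needs a very large object count before the inversion is ever hit, which matches the observation above.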
>>>>>>>>>>>> >>>>>>>>>>>> Kenneth, could you have time to run "ceph-kv-store [path-to-osd] >>>>>>>>>>>> list >>>>>>>>>>>> _GHOBJTOSEQ_| grep 6adb1100 -A 100". ceph-kv-store is a debug >>>>>>>>>>>> tool >>>>>>>>>>>> which can be compiled from source. You can clone ceph repo and >>>>>>>>>>>> run >>>>>>>>>>>> "./authongen.sh; ./configure; cd src; make ceph-kvstore-tool". >>>>>>>>>>>> "path-to-osd" should be "/var/lib/ceph/osd-[id]/current/". >>>>>>>>>>>> "6adb1100" >>>>>>>>>>>> is from your verbose log and the next 100 rows should know >>>>>>>>>>>> necessary >>>>>>>>>>>> infos. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> You can also get ceph-kvstore-tool from the 'ceph-tests' package. >>>>>>>>>>> >>>>>>>>>>> Hi sage, do you think we need to provided with upgrade function >>>>>>>>>>>> to >>>>>>>>>>>> fix >>>>>>>>>>>> it? >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Hmm, we might. This only affects the key/value encoding right? >>>>>>>>>>> The >>>>>>>>>>> FileStore is using its own function to map these to file names? >>>>>>>>>>> >>>>>>>>>>> Can you open a ticket in the tracker for this? >>>>>>>>>>> >>>>>>>>>>> Thanks! >>>>>>>>>>> sage >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Aug 14, 2014 at 7:36 PM, Kenneth Waegeman >>>>>>>>>>>> <Kenneth.Waegeman at ugent.be> wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> ----- Message from Haomai Wang <haomaiwang at gmail.com> >>>>>>>>>>>>> --------- >>>>>>>>>>>>> Date: Thu, 14 Aug 2014 19:11:55 +0800 >>>>>>>>>>>>> >>>>>>>>>>>>> From: Haomai Wang <haomaiwang at gmail.com> >>>>>>>>>>>>> Subject: Re: ceph cluster inconsistency? >>>>>>>>>>>>> To: Kenneth Waegeman <Kenneth.Waegeman at ugent.be> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Could you add config "debug_keyvaluestore = 20/20" to the >>>>>>>>>>>>>> crashed >>>>>>>>>>>>>> osd >>>>>>>>>>>>>> and replay the command causing crash? >>>>>>>>>>>>>> >>>>>>>>>>>>>> I would like to get more debug infos! Thanks. >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I included the log in attachment! >>>>>>>>>>>>> Thanks! 
>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> On Thu, Aug 14, 2014 at 4:41 PM, Kenneth Waegeman >>>>>>>>>>>>>> <Kenneth.Waegeman at ugent.be> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I have: >>>>>>>>>>>>>>> osd_objectstore = keyvaluestore-dev >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> in the global section of my ceph.conf >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> [root at ceph002 ~]# ceph osd erasure-code-profile get >>>>>>>>>>>>>>> profile11 >>>>>>>>>>>>>>> directory=/usr/lib64/ceph/erasure-code >>>>>>>>>>>>>>> k=8 >>>>>>>>>>>>>>> m=3 >>>>>>>>>>>>>>> plugin=jerasure >>>>>>>>>>>>>>> ruleset-failure-domain=osd >>>>>>>>>>>>>>> technique=reed_sol_van >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> the ecdata pool has this as profile >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> pool 3 'ecdata' erasure size 11 min_size 8 crush_ruleset 2 >>>>>>>>>>>>>>> object_hash >>>>>>>>>>>>>>> rjenkins pg_num 128 pgp_num 128 last_change 161 flags >>>>>>>>>>>>>>> hashpspool >>>>>>>>>>>>>>> stripe_width 4096 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ECrule in crushmap >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> rule ecdata { >>>>>>>>>>>>>>> ruleset 2 >>>>>>>>>>>>>>> type erasure >>>>>>>>>>>>>>> min_size 3 >>>>>>>>>>>>>>> max_size 20 >>>>>>>>>>>>>>> step set_chooseleaf_tries 5 >>>>>>>>>>>>>>> step take default-ec >>>>>>>>>>>>>>> step choose indep 0 type osd >>>>>>>>>>>>>>> step emit >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> root default-ec { >>>>>>>>>>>>>>> id -8 # do not change unnecessarily >>>>>>>>>>>>>>> # weight 140.616 >>>>>>>>>>>>>>> alg straw >>>>>>>>>>>>>>> hash 0 # rjenkins1 >>>>>>>>>>>>>>> item ceph001-ec weight 46.872 >>>>>>>>>>>>>>> item ceph002-ec weight 46.872 >>>>>>>>>>>>>>> item ceph003-ec weight 46.872 >>>>>>>>>>>>>>> ... >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Cheers! >>>>>>>>>>>>>>> Kenneth >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ----- Message from Haomai Wang <haomaiwang at gmail.com> >>>>>>>>>>>>>>> --------- >>>>>>>>>>>>>>> Date: Thu, 14 Aug 2014 10:07:50 +0800 >>>>>>>>>>>>>>> From: Haomai Wang <haomaiwang at gmail.com> >>>>>>>>>>>>>>> Subject: Re: ceph cluster inconsistency? >>>>>>>>>>>>>>> To: Kenneth Waegeman <Kenneth.Waegeman at ugent.be> >>>>>>>>>>>>>>> Cc: ceph-users <ceph-users at lists.ceph.com> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Kenneth, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Could you give your configuration related to EC and >>>>>>>>>>>>>>>> KeyValueStore? 
>>>>>>>>>>>>>>>> Not sure whether it's bug on KeyValueStore >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Thu, Aug 14, 2014 at 12:06 AM, Kenneth Waegeman >>>>>>>>>>>>>>>> <Kenneth.Waegeman at ugent.be> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I was doing some tests with rados bench on a Erasure Coded >>>>>>>>>>>>>>>>> pool >>>>>>>>>>>>>>>>> (using >>>>>>>>>>>>>>>>> keyvaluestore-dev objectstore) on 0.83, and I see some >>>>>>>>>>>>>>>>> strangs >>>>>>>>>>>>>>>>> things: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> [root at ceph001 ~]# ceph status >>>>>>>>>>>>>>>>> cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d >>>>>>>>>>>>>>>>> health HEALTH_WARN too few pgs per osd (4 < min 20) >>>>>>>>>>>>>>>>> monmap e1: 3 mons at >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> {ceph001=10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0, >>>>>>>>>>>>>>>>> ceph003=10.141.8.182:6789/0}, >>>>>>>>>>>>>>>>> election epoch 6, quorum 0,1,2 ceph001,ceph002,ceph003 >>>>>>>>>>>>>>>>> mdsmap e116: 1/1/1 up {0=ceph001.cubone.os=up:active}, >>>>>>>>>>>>>>>>> 2 >>>>>>>>>>>>>>>>> up:standby >>>>>>>>>>>>>>>>> osdmap e292: 78 osds: 78 up, 78 in >>>>>>>>>>>>>>>>> pgmap v48873: 320 pgs, 4 pools, 15366 GB data, 3841 >>>>>>>>>>>>>>>>> kobjects >>>>>>>>>>>>>>>>> 1381 GB used, 129 TB / 131 TB avail >>>>>>>>>>>>>>>>> 320 active+clean >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> There is around 15T of data, but only 1.3 T usage. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> This is also visible in rados: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> [root at ceph001 ~]# rados df >>>>>>>>>>>>>>>>> pool name category KB objects >>>>>>>>>>>>>>>>> clones >>>>>>>>>>>>>>>>> degraded unfound rd rd KB >>>>>>>>>>>>>>>>> wr >>>>>>>>>>>>>>>>> wr >>>>>>>>>>>>>>>>> KB >>>>>>>>>>>>>>>>> data - 0 0 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 0 0 0 0 0 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ecdata - 16113451009 3933959 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 0 0 1 1 3935632 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 16116850711 >>>>>>>>>>>>>>>>> metadata - 2 20 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 0 0 33 36 21 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 8 >>>>>>>>>>>>>>>>> rbd - 0 0 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 0 0 0 0 0 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> total used 1448266016 3933979 >>>>>>>>>>>>>>>>> total avail 139400181016 >>>>>>>>>>>>>>>>> total space 140848447032 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Another (related?) thing: if I do rados -p ecdata ls, I >>>>>>>>>>>>>>>>> trigger >>>>>>>>>>>>>>>>> osd >>>>>>>>>>>>>>>>> shutdowns (each time): >>>>>>>>>>>>>>>>> I get a list followed by an error: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ... 
>>>>>>>>>>>>>>>>> benchmark_data_ceph001.cubone.os_8961_object243839 >>>>>>>>>>>>>>>>> benchmark_data_ceph001.cubone.os_5560_object801983 >>>>>>>>>>>>>>>>> benchmark_data_ceph001.cubone.os_31461_object856489 >>>>>>>>>>>>>>>>> benchmark_data_ceph001.cubone.os_8961_object202232 >>>>>>>>>>>>>>>>> benchmark_data_ceph001.cubone.os_4919_object33199 >>>>>>>>>>>>>>>>> benchmark_data_ceph001.cubone.os_5560_object807797 >>>>>>>>>>>>>>>>> benchmark_data_ceph001.cubone.os_4919_object74729 >>>>>>>>>>>>>>>>> benchmark_data_ceph001.cubone.os_31461_object1264121 >>>>>>>>>>>>>>>>> benchmark_data_ceph001.cubone.os_5560_object1318513 >>>>>>>>>>>>>>>>> benchmark_data_ceph001.cubone.os_5560_object1202111 >>>>>>>>>>>>>>>>> benchmark_data_ceph001.cubone.os_31461_object939107 >>>>>>>>>>>>>>>>> benchmark_data_ceph001.cubone.os_31461_object729682 >>>>>>>>>>>>>>>>> benchmark_data_ceph001.cubone.os_5560_object122915 >>>>>>>>>>>>>>>>> benchmark_data_ceph001.cubone.os_5560_object76521 >>>>>>>>>>>>>>>>> benchmark_data_ceph001.cubone.os_5560_object113261 >>>>>>>>>>>>>>>>> benchmark_data_ceph001.cubone.os_31461_object575079 >>>>>>>>>>>>>>>>> benchmark_data_ceph001.cubone.os_5560_object671042 >>>>>>>>>>>>>>>>> benchmark_data_ceph001.cubone.os_5560_object381146 >>>>>>>>>>>>>>>>> 2014-08-13 17:57:48.736150 7f65047b5700 0 -- >>>>>>>>>>>>>>>>> 10.141.8.180:0/1023295 >> >>>>>>>>>>>>>>>>> 10.141.8.182:6839/4471 pipe(0x7f64fc019b20 sd=5 :0 s=1 >>>>>>>>>>>>>>>>> pgs=0 >>>>>>>>>>>>>>>>> cs=0 >>>>>>>>>>>>>>>>> l=1 >>>>>>>>>>>>>>>>> c=0x7f64fc019db0).fault >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> And I can see this in the log files: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -25> 2014-08-13 17:52:56.323908 7f8a97fa4700 1 -- >>>>>>>>>>>>>>>>> 10.143.8.182:6827/64670 <== osd.57 10.141.8.182:0/15796 51 >>>>>>>>>>>>>>>>> ==== >>>>>>>>>>>>>>>>> osd_ping(ping e220 stamp 2014-08-13 17:52:56.323092) v2 >>>>>>>>>>>>>>>>> ==== >>>>>>>>>>>>>>>>> 47+0+0 >>>>>>>>>>>>>>>>> (3227325175 0 0) 0xf475940 con 0xee89fa0 >>>>>>>>>>>>>>>>> -24> 2014-08-13 17:52:56.323938 7f8a97fa4700 1 -- >>>>>>>>>>>>>>>>> 10.143.8.182:6827/64670 --> 10.141.8.182:0/15796 -- >>>>>>>>>>>>>>>>> osd_ping(ping_reply >>>>>>>>>>>>>>>>> e220 >>>>>>>>>>>>>>>>> stamp 2014-08-13 17:52:56.323092) v2 -- ?+0 0xf815b00 con >>>>>>>>>>>>>>>>> 0xee89fa0 >>>>>>>>>>>>>>>>> -23> 2014-08-13 17:52:56.324078 7f8a997a7700 1 -- >>>>>>>>>>>>>>>>> 10.141.8.182:6840/64670 <== osd.57 10.141.8.182:0/15796 51 >>>>>>>>>>>>>>>>> ==== >>>>>>>>>>>>>>>>> osd_ping(ping e220 stamp 2014-08-13 17:52:56.323092) v2 >>>>>>>>>>>>>>>>> ==== >>>>>>>>>>>>>>>>> 47+0+0 >>>>>>>>>>>>>>>>> (3227325175 0 0) 0xf132bc0 con 0xee8a680 >>>>>>>>>>>>>>>>> -22> 2014-08-13 17:52:56.324111 7f8a997a7700 1 -- >>>>>>>>>>>>>>>>> 10.141.8.182:6840/64670 --> 10.141.8.182:0/15796 -- >>>>>>>>>>>>>>>>> osd_ping(ping_reply >>>>>>>>>>>>>>>>> e220 >>>>>>>>>>>>>>>>> stamp 2014-08-13 17:52:56.323092) v2 -- ?+0 0xf811a40 con >>>>>>>>>>>>>>>>> 0xee8a680 >>>>>>>>>>>>>>>>> -21> 2014-08-13 17:52:56.584461 7f8a997a7700 1 -- >>>>>>>>>>>>>>>>> 10.141.8.182:6840/64670 <== osd.29 10.143.8.181:0/12142 47 >>>>>>>>>>>>>>>>> ==== >>>>>>>>>>>>>>>>> osd_ping(ping e220 stamp 2014-08-13 17:52:56.583010) v2 >>>>>>>>>>>>>>>>> ==== >>>>>>>>>>>>>>>>> 47+0+0 >>>>>>>>>>>>>>>>> (3355887204 0 0) 0xf655940 con 0xee88b00 >>>>>>>>>>>>>>>>> -20> 2014-08-13 17:52:56.584486 7f8a997a7700 1 -- >>>>>>>>>>>>>>>>> 10.141.8.182:6840/64670 --> 10.143.8.181:0/12142 -- >>>>>>>>>>>>>>>>> osd_ping(ping_reply >>>>>>>>>>>>>>>>> e220 >>>>>>>>>>>>>>>>> stamp 2014-08-13 17:52:56.583010) v2 -- ?+0 0xf132bc0 
con >>>>>>>>>>>>>>>>> 0xee88b00 >>>>>>>>>>>>>>>>> -19> 2014-08-13 17:52:56.584498 7f8a97fa4700 1 -- >>>>>>>>>>>>>>>>> 10.143.8.182:6827/64670 <== osd.29 10.143.8.181:0/12142 47 >>>>>>>>>>>>>>>>> ==== >>>>>>>>>>>>>>>>> osd_ping(ping e220 stamp 2014-08-13 17:52:56.583010) v2 >>>>>>>>>>>>>>>>> ==== >>>>>>>>>>>>>>>>> 47+0+0 >>>>>>>>>>>>>>>>> (3355887204 0 0) 0xf20e040 con 0xee886e0 >>>>>>>>>>>>>>>>> -18> 2014-08-13 17:52:56.584526 7f8a97fa4700 1 -- >>>>>>>>>>>>>>>>> 10.143.8.182:6827/64670 --> 10.143.8.181:0/12142 -- >>>>>>>>>>>>>>>>> osd_ping(ping_reply >>>>>>>>>>>>>>>>> e220 >>>>>>>>>>>>>>>>> stamp 2014-08-13 17:52:56.583010) v2 -- ?+0 0xf475940 con >>>>>>>>>>>>>>>>> 0xee886e0 >>>>>>>>>>>>>>>>> -17> 2014-08-13 17:52:56.594448 7f8a798c7700 1 -- >>>>>>>>>>>>>>>>> 10.141.8.182:6839/64670 >> :/0 pipe(0xec15f00 sd=74 :6839 >>>>>>>>>>>>>>>>> s=0 >>>>>>>>>>>>>>>>> pgs=0 >>>>>>>>>>>>>>>>> cs=0 >>>>>>>>>>>>>>>>> l=0 >>>>>>>>>>>>>>>>> c=0xee856a0).accept sd=74 10.141.8.180:47641/0 >>>>>>>>>>>>>>>>> -16> 2014-08-13 17:52:56.594921 7f8a798c7700 1 -- >>>>>>>>>>>>>>>>> 10.141.8.182:6839/64670 <== client.7512 >>>>>>>>>>>>>>>>> 10.141.8.180:0/1018433 >>>>>>>>>>>>>>>>> 1 >>>>>>>>>>>>>>>>> ==== >>>>>>>>>>>>>>>>> osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 >>>>>>>>>>>>>>>>> ack+read+known_if_redirected e220) v4 ==== 151+0+39 >>>>>>>>>>>>>>>>> (1972163119 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 4174233976) 0xf3bca40 con 0xee856a0 >>>>>>>>>>>>>>>>> -15> 2014-08-13 17:52:56.594957 7f8a798c7700 5 -- op >>>>>>>>>>>>>>>>> tracker >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> , >>>>>>>>>>>>>>>>> seq: >>>>>>>>>>>>>>>>> 299, time: 2014-08-13 17:52:56.594874, event: header_read, >>>>>>>>>>>>>>>>> op: >>>>>>>>>>>>>>>>> osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 >>>>>>>>>>>>>>>>> ack+read+known_if_redirected e220) >>>>>>>>>>>>>>>>> -14> 2014-08-13 17:52:56.594970 7f8a798c7700 5 -- op >>>>>>>>>>>>>>>>> tracker >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> , >>>>>>>>>>>>>>>>> seq: >>>>>>>>>>>>>>>>> 299, time: 2014-08-13 17:52:56.594880, event: throttled, >>>>>>>>>>>>>>>>> op: >>>>>>>>>>>>>>>>> osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 >>>>>>>>>>>>>>>>> ack+read+known_if_redirected e220) >>>>>>>>>>>>>>>>> -13> 2014-08-13 17:52:56.594978 7f8a798c7700 5 -- op >>>>>>>>>>>>>>>>> tracker >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> , >>>>>>>>>>>>>>>>> seq: >>>>>>>>>>>>>>>>> 299, time: 2014-08-13 17:52:56.594917, event: all_read, op: >>>>>>>>>>>>>>>>> osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 >>>>>>>>>>>>>>>>> ack+read+known_if_redirected e220) >>>>>>>>>>>>>>>>> -12> 2014-08-13 17:52:56.594986 7f8a798c7700 5 -- op >>>>>>>>>>>>>>>>> tracker >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> , >>>>>>>>>>>>>>>>> seq: >>>>>>>>>>>>>>>>> 299, time: 0.000000, event: dispatched, op: >>>>>>>>>>>>>>>>> osd_op(client.7512.0:1 >>>>>>>>>>>>>>>>> [pgls >>>>>>>>>>>>>>>>> start_epoch 0] 3.0 ack+read+known_if_redirected e220) >>>>>>>>>>>>>>>>> -11> 2014-08-13 17:52:56.595127 7f8a90795700 5 -- op >>>>>>>>>>>>>>>>> tracker >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> , >>>>>>>>>>>>>>>>> seq: >>>>>>>>>>>>>>>>> 299, time: 2014-08-13 17:52:56.595104, event: reached_pg, >>>>>>>>>>>>>>>>> op: >>>>>>>>>>>>>>>>> osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 >>>>>>>>>>>>>>>>> ack+read+known_if_redirected e220) >>>>>>>>>>>>>>>>> -10> 2014-08-13 17:52:56.595159 7f8a90795700 5 -- op >>>>>>>>>>>>>>>>> tracker >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> , >>>>>>>>>>>>>>>>> seq: >>>>>>>>>>>>>>>>> 299, time: 2014-08-13 17:52:56.595153, event: started, op: >>>>>>>>>>>>>>>>> 
osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 >>>>>>>>>>>>>>>>> ack+read+known_if_redirected e220) >>>>>>>>>>>>>>>>> -9> 2014-08-13 17:52:56.602179 7f8a90795700 1 -- >>>>>>>>>>>>>>>>> 10.141.8.182:6839/64670 --> 10.141.8.180:0/1018433 -- >>>>>>>>>>>>>>>>> osd_op_reply(1 >>>>>>>>>>>>>>>>> [pgls >>>>>>>>>>>>>>>>> start_epoch 0] v164'30654 uv30654 ondisk = 0) v6 -- ?+0 >>>>>>>>>>>>>>>>> 0xec16180 >>>>>>>>>>>>>>>>> con >>>>>>>>>>>>>>>>> 0xee856a0 >>>>>>>>>>>>>>>>> -8> 2014-08-13 17:52:56.602211 7f8a90795700 5 -- op >>>>>>>>>>>>>>>>> tracker >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> , >>>>>>>>>>>>>>>>> seq: >>>>>>>>>>>>>>>>> 299, time: 2014-08-13 17:52:56.602205, event: done, op: >>>>>>>>>>>>>>>>> osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 >>>>>>>>>>>>>>>>> ack+read+known_if_redirected e220) >>>>>>>>>>>>>>>>> -7> 2014-08-13 17:52:56.614839 7f8a798c7700 1 -- >>>>>>>>>>>>>>>>> 10.141.8.182:6839/64670 <== client.7512 >>>>>>>>>>>>>>>>> 10.141.8.180:0/1018433 >>>>>>>>>>>>>>>>> 2 >>>>>>>>>>>>>>>>> ==== >>>>>>>>>>>>>>>>> osd_op(client.7512.0:2 [pgls start_epoch 220] 3.0 >>>>>>>>>>>>>>>>> ack+read+known_if_redirected e220) v4 ==== 151+0+89 >>>>>>>>>>>>>>>>> (3460833343 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 2600845095) 0xf3bcec0 con 0xee856a0 >>>>>>>>>>>>>>>>> -6> 2014-08-13 17:52:56.614864 7f8a798c7700 5 -- op >>>>>>>>>>>>>>>>> tracker >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> , >>>>>>>>>>>>>>>>> seq: >>>>>>>>>>>>>>>>> 300, time: 2014-08-13 17:52:56.614789, event: header_read, >>>>>>>>>>>>>>>>> op: >>>>>>>>>>>>>>>>> osd_op(client.7512.0:2 [pgls start_epoch 220] 3.0 >>>>>>>>>>>>>>>>> ack+read+known_if_redirected e220) >>>>>>>>>>>>>>>>> -5> 2014-08-13 17:52:56.614874 7f8a798c7700 5 -- op >>>>>>>>>>>>>>>>> tracker >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> , >>>>>>>>>>>>>>>>> seq: >>>>>>>>>>>>>>>>> 300, time: 2014-08-13 17:52:56.614792, event: throttled, >>>>>>>>>>>>>>>>> op: >>>>>>>>>>>>>>>>> osd_op(client.7512.0:2 [pgls start_epoch 220] 3.0 >>>>>>>>>>>>>>>>> ack+read+known_if_redirected e220) >>>>>>>>>>>>>>>>> -4> 2014-08-13 17:52:56.614884 7f8a798c7700 5 -- op >>>>>>>>>>>>>>>>> tracker >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> , >>>>>>>>>>>>>>>>> seq: >>>>>>>>>>>>>>>>> 300, time: 2014-08-13 17:52:56.614835, event: all_read, op: >>>>>>>>>>>>>>>>> osd_op(client.7512.0:2 [pgls start_epoch 220] 3.0 >>>>>>>>>>>>>>>>> ack+read+known_if_redirected e220) >>>>>>>>>>>>>>>>> -3> 2014-08-13 17:52:56.614891 7f8a798c7700 5 -- op >>>>>>>>>>>>>>>>> tracker >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> , >>>>>>>>>>>>>>>>> seq: >>>>>>>>>>>>>>>>> 300, time: 0.000000, event: dispatched, op: >>>>>>>>>>>>>>>>> osd_op(client.7512.0:2 >>>>>>>>>>>>>>>>> [pgls >>>>>>>>>>>>>>>>> start_epoch 220] 3.0 ack+read+known_if_redirected e220) >>>>>>>>>>>>>>>>> -2> 2014-08-13 17:52:56.614972 7f8a92f9a700 5 -- op >>>>>>>>>>>>>>>>> tracker >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> , >>>>>>>>>>>>>>>>> seq: >>>>>>>>>>>>>>>>> 300, time: 2014-08-13 17:52:56.614958, event: reached_pg, >>>>>>>>>>>>>>>>> op: >>>>>>>>>>>>>>>>> osd_op(client.7512.0:2 [pgls start_epoch 220] 3.0 >>>>>>>>>>>>>>>>> ack+read+known_if_redirected e220) >>>>>>>>>>>>>>>>> -1> 2014-08-13 17:52:56.614993 7f8a92f9a700 5 -- op >>>>>>>>>>>>>>>>> tracker >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> , >>>>>>>>>>>>>>>>> seq: >>>>>>>>>>>>>>>>> 300, time: 2014-08-13 17:52:56.614986, event: started, op: >>>>>>>>>>>>>>>>> osd_op(client.7512.0:2 [pgls start_epoch 220] 3.0 >>>>>>>>>>>>>>>>> ack+read+known_if_redirected e220) >>>>>>>>>>>>>>>>> 0> 2014-08-13 17:52:56.617087 7f8a92f9a700 -1 >>>>>>>>>>>>>>>>> 
os/GenericObjectMap.cc: >>>>>>>>>>>>>>>>> In function 'int GenericObjectMap::list_objects(const >>>>>>>>>>>>>>>>> coll_t&, >>>>>>>>>>>>>>>>> ghobject_t, >>>>>>>>>>>>>>>>> int, std::vector<ghobject_t>*, ghobject_t*)' thread >>>>>>>>>>>>>>>>> 7f8a92f9a700 >>>>>>>>>>>>>>>>> time >>>>>>>>>>>>>>>>> 2014-08-13 17:52:56.615073 >>>>>>>>>>>>>>>>> os/GenericObjectMap.cc: 1118: FAILED assert(start <= >>>>>>>>>>>>>>>>> header.oid) >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ceph version 0.83 (78ff1f0a5dfd3c5850805b40217385 >>>>>>>>>>>>>>>>> 64c36c92b8) >>>>>>>>>>>>>>>>> 1: (GenericObjectMap::list_objects(coll_t const&, >>>>>>>>>>>>>>>>> ghobject_t, >>>>>>>>>>>>>>>>> int, >>>>>>>>>>>>>>>>> std::vector<ghobject_t, std::allocator<ghobject_t> >*, >>>>>>>>>>>>>>>>> ghobject_t*)+0x474) >>>>>>>>>>>>>>>>> [0x98f774] >>>>>>>>>>>>>>>>> 2: (KeyValueStore::collection_list_partial(coll_t, >>>>>>>>>>>>>>>>> ghobject_t, >>>>>>>>>>>>>>>>> int, >>>>>>>>>>>>>>>>> int, >>>>>>>>>>>>>>>>> snapid_t, std::vector<ghobject_t, >>>>>>>>>>>>>>>>> std::allocator<ghobject_t> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> *, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ghobject_t*)+0x274) [0x8c5b54] >>>>>>>>>>>>>>>>> 3: (PGBackend::objects_list_partial(hobject_t const&, int, >>>>>>>>>>>>>>>>> int, >>>>>>>>>>>>>>>>> snapid_t, >>>>>>>>>>>>>>>>> std::vector<hobject_t, std::allocator<hobject_t> >*, >>>>>>>>>>>>>>>>> hobject_t*)+0x1c9) >>>>>>>>>>>>>>>>> [0x862de9] >>>>>>>>>>>>>>>>> 4: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> (ReplicatedPG::do_pg_op(std::tr1::shared_ptr<OpRequest>)+ >>>>>>>>>>>>>>>>> 0xea5) >>>>>>>>>>>>>>>>> [0x7f67f5] >>>>>>>>>>>>>>>>> 5: >>>>>>>>>>>>>>>>> (ReplicatedPG::do_op(std::tr1: >>>>>>>>>>>>>>>>> :shared_ptr<OpRequest>)+0x1f3) >>>>>>>>>>>>>>>>> [0x8177b3] >>>>>>>>>>>>>>>>> 6: (ReplicatedPG::do_request(std: >>>>>>>>>>>>>>>>> :tr1::shared_ptr<OpRequest>, >>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x5d5) [0x7b8045] >>>>>>>>>>>>>>>>> 7: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>>>> std::tr1::shared_ptr<OpRequest>, >>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x47d) >>>>>>>>>>>>>>>>> [0x62bf8d] >>>>>>>>>>>>>>>>> 8: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x35c) [0x62c56c] >>>>>>>>>>>>>>>>> 9: (ShardedThreadPool::shardedthreadpool_worker(unsigned >>>>>>>>>>>>>>>>> int)+0x8cd) >>>>>>>>>>>>>>>>> [0xa776fd] >>>>>>>>>>>>>>>>> 10: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) >>>>>>>>>>>>>>>>> [0xa79980] >>>>>>>>>>>>>>>>> 11: (()+0x7df3) [0x7f8aac71fdf3] >>>>>>>>>>>>>>>>> 12: (clone()+0x6d) [0x7f8aab1963dd] >>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS >>>>>>>>>>>>>>>>> <executable>` >>>>>>>>>>>>>>>>> is >>>>>>>>>>>>>>>>> needed >>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>> interpret this. 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ceph version 0.83 (78ff1f0a5dfd3c5850805b40217385 >>>>>>>>>>>>>>>>> 64c36c92b8) >>>>>>>>>>>>>>>>> 1: /usr/bin/ceph-osd() [0x99b466] >>>>>>>>>>>>>>>>> 2: (()+0xf130) [0x7f8aac727130] >>>>>>>>>>>>>>>>> 3: (gsignal()+0x39) [0x7f8aab0d5989] >>>>>>>>>>>>>>>>> 4: (abort()+0x148) [0x7f8aab0d7098] >>>>>>>>>>>>>>>>> 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) >>>>>>>>>>>>>>>>> [0x7f8aab9e89d5] >>>>>>>>>>>>>>>>> 6: (()+0x5e946) [0x7f8aab9e6946] >>>>>>>>>>>>>>>>> 7: (()+0x5e973) [0x7f8aab9e6973] >>>>>>>>>>>>>>>>> 8: (()+0x5eb9f) [0x7f8aab9e6b9f] >>>>>>>>>>>>>>>>> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, >>>>>>>>>>>>>>>>> char >>>>>>>>>>>>>>>>> const*)+0x1ef) [0xa8805f] >>>>>>>>>>>>>>>>> 10: (GenericObjectMap::list_objects(coll_t const&, >>>>>>>>>>>>>>>>> ghobject_t, >>>>>>>>>>>>>>>>> int, >>>>>>>>>>>>>>>>> std::vector<ghobject_t, std::allocator<ghobject_t> >*, >>>>>>>>>>>>>>>>> ghobject_t*)+0x474) >>>>>>>>>>>>>>>>> [0x98f774] >>>>>>>>>>>>>>>>> 11: (KeyValueStore::collection_list_partial(coll_t, >>>>>>>>>>>>>>>>> ghobject_t, >>>>>>>>>>>>>>>>> int, >>>>>>>>>>>>>>>>> int, >>>>>>>>>>>>>>>>> snapid_t, std::vector<ghobject_t, >>>>>>>>>>>>>>>>> std::allocator<ghobject_t> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> *, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ghobject_t*)+0x274) [0x8c5b54] >>>>>>>>>>>>>>>>> 12: (PGBackend::objects_list_partial(hobject_t const&, >>>>>>>>>>>>>>>>> int, >>>>>>>>>>>>>>>>> int, >>>>>>>>>>>>>>>>> snapid_t, >>>>>>>>>>>>>>>>> std::vector<hobject_t, std::allocator<hobject_t> >*, >>>>>>>>>>>>>>>>> hobject_t*)+0x1c9) >>>>>>>>>>>>>>>>> [0x862de9] >>>>>>>>>>>>>>>>> 13: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> (ReplicatedPG::do_pg_op(std::tr1::shared_ptr<OpRequest>)+ >>>>>>>>>>>>>>>>> 0xea5) >>>>>>>>>>>>>>>>> [0x7f67f5] >>>>>>>>>>>>>>>>> 14: >>>>>>>>>>>>>>>>> (ReplicatedPG::do_op(std::tr1: >>>>>>>>>>>>>>>>> :shared_ptr<OpRequest>)+0x1f3) >>>>>>>>>>>>>>>>> [0x8177b3] >>>>>>>>>>>>>>>>> 15: >>>>>>>>>>>>>>>>> (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, >>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x5d5) [0x7b8045] >>>>>>>>>>>>>>>>> 16: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>>>> std::tr1::shared_ptr<OpRequest>, >>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x47d) >>>>>>>>>>>>>>>>> [0x62bf8d] >>>>>>>>>>>>>>>>> 17: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x35c) [0x62c56c] >>>>>>>>>>>>>>>>> 18: (ShardedThreadPool::shardedthreadpool_worker(unsigned >>>>>>>>>>>>>>>>> int)+0x8cd) >>>>>>>>>>>>>>>>> [0xa776fd] >>>>>>>>>>>>>>>>> 19: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) >>>>>>>>>>>>>>>>> [0xa79980] >>>>>>>>>>>>>>>>> 20: (()+0x7df3) [0x7f8aac71fdf3] >>>>>>>>>>>>>>>>> 21: (clone()+0x6d) [0x7f8aab1963dd] >>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS >>>>>>>>>>>>>>>>> <executable>` >>>>>>>>>>>>>>>>> is >>>>>>>>>>>>>>>>> needed >>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>> interpret this. 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> --- begin dump of recent events --- >>>>>>>>>>>>>>>>> 0> 2014-08-13 17:52:56.714214 7f8a92f9a700 -1 *** >>>>>>>>>>>>>>>>> Caught >>>>>>>>>>>>>>>>> signal >>>>>>>>>>>>>>>>> (Aborted) ** >>>>>>>>>>>>>>>>> in thread 7f8a92f9a700 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ceph version 0.83 (78ff1f0a5dfd3c5850805b40217385 >>>>>>>>>>>>>>>>> 64c36c92b8) >>>>>>>>>>>>>>>>> 1: /usr/bin/ceph-osd() [0x99b466] >>>>>>>>>>>>>>>>> 2: (()+0xf130) [0x7f8aac727130] >>>>>>>>>>>>>>>>> 3: (gsignal()+0x39) [0x7f8aab0d5989] >>>>>>>>>>>>>>>>> 4: (abort()+0x148) [0x7f8aab0d7098] >>>>>>>>>>>>>>>>> 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) >>>>>>>>>>>>>>>>> [0x7f8aab9e89d5] >>>>>>>>>>>>>>>>> 6: (()+0x5e946) [0x7f8aab9e6946] >>>>>>>>>>>>>>>>> 7: (()+0x5e973) [0x7f8aab9e6973] >>>>>>>>>>>>>>>>> 8: (()+0x5eb9f) [0x7f8aab9e6b9f] >>>>>>>>>>>>>>>>> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, >>>>>>>>>>>>>>>>> char >>>>>>>>>>>>>>>>> const*)+0x1ef) [0xa8805f] >>>>>>>>>>>>>>>>> 10: (GenericObjectMap::list_objects(coll_t const&, >>>>>>>>>>>>>>>>> ghobject_t, >>>>>>>>>>>>>>>>> int, >>>>>>>>>>>>>>>>> std::vector<ghobject_t, std::allocator<ghobject_t> >*, >>>>>>>>>>>>>>>>> ghobject_t*)+0x474) >>>>>>>>>>>>>>>>> [0x98f774] >>>>>>>>>>>>>>>>> 11: (KeyValueStore::collection_list_partial(coll_t, >>>>>>>>>>>>>>>>> ghobject_t, >>>>>>>>>>>>>>>>> int, >>>>>>>>>>>>>>>>> int, >>>>>>>>>>>>>>>>> snapid_t, std::vector<ghobject_t, >>>>>>>>>>>>>>>>> std::allocator<ghobject_t> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> *, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ghobject_t*)+0x274) [0x8c5b54] >>>>>>>>>>>>>>>>> 12: (PGBackend::objects_list_partial(hobject_t const&, >>>>>>>>>>>>>>>>> int, >>>>>>>>>>>>>>>>> int, >>>>>>>>>>>>>>>>> snapid_t, >>>>>>>>>>>>>>>>> std::vector<hobject_t, std::allocator<hobject_t> >*, >>>>>>>>>>>>>>>>> hobject_t*)+0x1c9) >>>>>>>>>>>>>>>>> [0x862de9] >>>>>>>>>>>>>>>>> 13: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> (ReplicatedPG::do_pg_op(std::tr1::shared_ptr<OpRequest>)+ >>>>>>>>>>>>>>>>> 0xea5) >>>>>>>>>>>>>>>>> [0x7f67f5] >>>>>>>>>>>>>>>>> 14: >>>>>>>>>>>>>>>>> (ReplicatedPG::do_op(std::tr1: >>>>>>>>>>>>>>>>> :shared_ptr<OpRequest>)+0x1f3) >>>>>>>>>>>>>>>>> [0x8177b3] >>>>>>>>>>>>>>>>> 15: >>>>>>>>>>>>>>>>> (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, >>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x5d5) [0x7b8045] >>>>>>>>>>>>>>>>> 16: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>>>> std::tr1::shared_ptr<OpRequest>, >>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x47d) >>>>>>>>>>>>>>>>> [0x62bf8d] >>>>>>>>>>>>>>>>> 17: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x35c) [0x62c56c] >>>>>>>>>>>>>>>>> 18: (ShardedThreadPool::shardedthreadpool_worker(unsigned >>>>>>>>>>>>>>>>> int)+0x8cd) >>>>>>>>>>>>>>>>> [0xa776fd] >>>>>>>>>>>>>>>>> 19: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) >>>>>>>>>>>>>>>>> [0xa79980] >>>>>>>>>>>>>>>>> 20: (()+0x7df3) [0x7f8aac71fdf3] >>>>>>>>>>>>>>>>> 21: (clone()+0x6d) [0x7f8aab1963dd] >>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS >>>>>>>>>>>>>>>>> <executable>` >>>>>>>>>>>>>>>>> is >>>>>>>>>>>>>>>>> needed >>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>> interpret this. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I guess this has something to do with using the dev >>>>>>>>>>>>>>>>> Keyvaluestore? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks! 
>>>>>>>>>>>>>>>>> Kenneth
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>> ceph-users mailing list
>>>>>>>>>>>>>>>>> ceph-users at lists.ceph.com
>>>>>>>>>>>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>> Wheat
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ----- End message from Haomai Wang <haomaiwang at gmail.com> -----
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>>> Kenneth Waegeman
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>> Wheat
>>>>>>>>>>>>>
>>>>>>>>>>>>> ----- End message from Haomai Wang <haomaiwang at gmail.com> -----
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>> Kenneth Waegeman
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>> Wheat
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> ceph-users mailing list
>>>>>>>>>>>> ceph-users at lists.ceph.com
>>>>>>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>>>>>>
>>>>>>>>>> ----- End message from Sage Weil <sweil at redhat.com> -----
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Kind regards,
>>>>>>>>>> Kenneth Waegeman
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Best Regards,
>>>>>>>>> Wheat
>>>>>>>>
>>>>>>>> ----- End message from Haomai Wang <haomaiwang at gmail.com> -----
>>>>>>>>
>>>>>>>> --
>>>>>>>> Kind regards,
>>>>>>>> Kenneth Waegeman
>>>>>>>
>>>>>>> --
>>>>>>> Best Regards,
>>>>>>> Wheat
>>>>>>
>>>>>> ----- End message from Haomai Wang <haomaiwang at gmail.com> -----
>>>>>>
>>>>>> --
>>>>>> Kind regards,
>>>>>> Kenneth Waegeman
>>>>>
>>>>> --
>>>>> Best Regards,
>>>>> Wheat
>>>>
>>>> --
>>>> Best Regards,
>>>> Wheat
>>>
>>> ----- End message from Haomai Wang <haomaiwang at gmail.com> -----
>>>
>>> --
>>> Kind regards,
>>> Kenneth Waegeman
>>
>> --
>> Best Regards,
>> Wheat
>
> ----- End message from Haomai Wang <haomaiwang at gmail.com> -----
>
> --
> Kind regards,
> Kenneth Waegeman

--
Best Regards,
Wheat