Thank you very much! Is this problem then related to the weird sizes I see:

  pgmap v55220: 1216 pgs, 3 pools, 3406 GB data, 852 kobjects
        418 GB used, 88130 GB / 88549 GB avail

A calculation with df indeed shows that about 400 GB is used on the disks, but the tests I ran should have generated 3.5 TB, as also seen in rados df:

  pool name  category  KB          objects  clones  degraded  unfound  rd       rd KB       wr       wr KB
  cache      -         59150443    15466    0       0         0        1388365  5686734850  3665984  4709621763
  ecdata     -         3512807425  857620   0       0         0        1109938  312332288   857621   3512807426

I thought it was related to the inconsistency? Or can this be a sparse-objects thing? (But I can't seem to find anything about that in the docs.)

Thanks again!

Kenneth

----- Message from Haomai Wang <haomaiwang at gmail.com> ---------
Date: Sun, 7 Sep 2014 20:34:39 +0800
From: Haomai Wang <haomaiwang at gmail.com>
Subject: Re: ceph cluster inconsistency keyvaluestore
To: Kenneth Waegeman <Kenneth.Waegeman at ugent.be>
Cc: ceph-users at lists.ceph.com

> I have found the root cause. It's a bug.
>
> When a chunky scrub happens, it iterates over the whole pg's objects,
> and each iteration scans only a few objects.
>
> osd/PG.cc:3758
>     ret = get_pgbackend()->objects_list_partial(
>       start,
>       cct->_conf->osd_scrub_chunk_min,
>       cct->_conf->osd_scrub_chunk_max,
>       0,
>       &objects,
>       &candidate_end);
>
> candidate_end is the end of the object set and is used as the start
> position for the next scrub pass. But it gets truncated:
>
> osd/PG.cc:3777
>     while (!boundary_found && objects.size() > 1) {
>       hobject_t end = objects.back().get_boundary();
>       objects.pop_back();
>
>       if (objects.back().get_filestore_key() != end.get_filestore_key()) {
>         candidate_end = end;
>         boundary_found = true;
>       }
>     }
>
> end, an hobject_t that only contains the "hash" field, is assigned to
> candidate_end. So on the next scrub pass, an hobject_t containing only
> the "hash" field is passed to get_pgbackend()->objects_list_partial.
>
> This causes incorrect results for the KeyValueStore backend, because it
> uses strict key ordering in its "collection_list_partial" method. An
> hobject_t that only contains the "hash" field becomes:
>
> 1%e79s0_head!972F1B5D!!none!!!00000000000000000000!0!0
>
> while the actual object is
>
> 1%e79s0_head!972F1B5D!!1!!!object-name!head
>
> In other words, a key that only contains the "hash" field cannot be used
> to look up an actual object with the same "hash" field.
>
> @sage The simple fix is to modify the obj->key function, which changes
> the storage format. Because it's an experimental backend, I would like
> to provide an external format-conversion program to help users migrate.
> Is that OK?
>
>
> On Wed, Sep 3, 2014 at 9:16 PM, Kenneth Waegeman
> <Kenneth.Waegeman at ugent.be> wrote:
>> I can also reproduce it on a new, slightly different setup (also EC on KV
>> and cache) by running ceph pg scrub on a KV pg: that pg then gets the
>> 'inconsistent' status
>>
>>
>> ----- Message from Kenneth Waegeman <Kenneth.Waegeman at UGent.be> ---------
>> Date: Mon, 01 Sep 2014 16:28:31 +0200
>> From: Kenneth Waegeman <Kenneth.Waegeman at UGent.be>
>> Subject: Re: ceph cluster inconsistency keyvaluestore
>> To: Haomai Wang <haomaiwang at gmail.com>
>> Cc: ceph-users at lists.ceph.com
>>
>>
>>> Hi,
>>>
>>> The cluster got installed with quattor, which uses ceph-deploy for the
>>> installation of the daemons, writes the config file and installs the
>>> crushmap.
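As a standalone illustration of the key-ordering issue Haomai describes above, here is a minimal sketch. It is not Ceph code: it uses plain byte-wise std::string comparison rather than the actual GenericObjectMap/leveldb comparator, and the two key strings are copied from his message. Byte-wise, the truncated, hash-only boundary key sorts past the real object sharing hash 972F1B5D, which is consistent with the incorrect listing results he reports.

// Hypothetical standalone check; key strings taken from the message above.
#include <cassert>
#include <iostream>
#include <string>

int main() {
    // Key derived from an hobject_t carrying only the "hash" field.
    const std::string boundary =
        "1%e79s0_head!972F1B5D!!none!!!00000000000000000000!0!0";
    // Key of an actual object with the same hash (972F1B5D).
    const std::string object =
        "1%e79s0_head!972F1B5D!!1!!!object-name!head";

    // Byte-wise, "none" sorts after "1", so the boundary key compares
    // greater than the real object key ...
    assert(boundary > object);

    // ... meaning a listing that starts strictly at `boundary` would never
    // reach `object`, even though both belong to the same hash bucket.
    std::cout << std::boolalpha << "boundary > object: "
              << (boundary > object) << std::endl;
    return 0;
}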
>>> I have 3 hosts, each 12 disks, having a large KV partition (3.6T) for the >>> ECdata pool and a small cache partition (50G) for the cache >>> >>> I manually did this: >>> >>> ceph osd pool create cache 1024 1024 >>> ceph osd pool set cache size 2 >>> ceph osd pool set cache min_size 1 >>> ceph osd erasure-code-profile set profile11 k=8 m=3 >>> ruleset-failure-domain=osd >>> ceph osd pool create ecdata 128 128 erasure profile11 >>> ceph osd tier add ecdata cache >>> ceph osd tier cache-mode cache writeback >>> ceph osd tier set-overlay ecdata cache >>> ceph osd pool set cache hit_set_type bloom >>> ceph osd pool set cache hit_set_count 1 >>> ceph osd pool set cache hit_set_period 3600 >>> ceph osd pool set cache target_max_bytes $((280*1024*1024*1024)) >>> >>> (But the previous time I had the problem already without the cache part) >>> >>> >>> >>> Cluster live since 2014-08-29 15:34:16 >>> >>> Config file on host ceph001: >>> >>> [global] >>> auth_client_required = cephx >>> auth_cluster_required = cephx >>> auth_service_required = cephx >>> cluster_network = 10.143.8.0/24 >>> filestore_xattr_use_omap = 1 >>> fsid = 82766e04-585b-49a6-a0ac-c13d9ffd0a7d >>> mon_cluster_log_to_syslog = 1 >>> mon_host = ceph001.cubone.os, ceph002.cubone.os, ceph003.cubone.os >>> mon_initial_members = ceph001, ceph002, ceph003 >>> osd_crush_update_on_start = 0 >>> osd_journal_size = 10240 >>> osd_pool_default_min_size = 2 >>> osd_pool_default_pg_num = 512 >>> osd_pool_default_pgp_num = 512 >>> osd_pool_default_size = 3 >>> public_network = 10.141.8.0/24 >>> >>> [osd.11] >>> osd_objectstore = keyvaluestore-dev >>> >>> [osd.13] >>> osd_objectstore = keyvaluestore-dev >>> >>> [osd.15] >>> osd_objectstore = keyvaluestore-dev >>> >>> [osd.17] >>> osd_objectstore = keyvaluestore-dev >>> >>> [osd.19] >>> osd_objectstore = keyvaluestore-dev >>> >>> [osd.21] >>> osd_objectstore = keyvaluestore-dev >>> >>> [osd.23] >>> osd_objectstore = keyvaluestore-dev >>> >>> [osd.25] >>> osd_objectstore = keyvaluestore-dev >>> >>> [osd.3] >>> osd_objectstore = keyvaluestore-dev >>> >>> [osd.5] >>> osd_objectstore = keyvaluestore-dev >>> >>> [osd.7] >>> osd_objectstore = keyvaluestore-dev >>> >>> [osd.9] >>> osd_objectstore = keyvaluestore-dev >>> >>> >>> OSDs: >>> # id weight type name up/down reweight >>> -12 140.6 root default-cache >>> -9 46.87 host ceph001-cache >>> 2 3.906 osd.2 up 1 >>> 4 3.906 osd.4 up 1 >>> 6 3.906 osd.6 up 1 >>> 8 3.906 osd.8 up 1 >>> 10 3.906 osd.10 up 1 >>> 12 3.906 osd.12 up 1 >>> 14 3.906 osd.14 up 1 >>> 16 3.906 osd.16 up 1 >>> 18 3.906 osd.18 up 1 >>> 20 3.906 osd.20 up 1 >>> 22 3.906 osd.22 up 1 >>> 24 3.906 osd.24 up 1 >>> -10 46.87 host ceph002-cache >>> 28 3.906 osd.28 up 1 >>> 30 3.906 osd.30 up 1 >>> 32 3.906 osd.32 up 1 >>> 34 3.906 osd.34 up 1 >>> 36 3.906 osd.36 up 1 >>> 38 3.906 osd.38 up 1 >>> 40 3.906 osd.40 up 1 >>> 42 3.906 osd.42 up 1 >>> 44 3.906 osd.44 up 1 >>> 46 3.906 osd.46 up 1 >>> 48 3.906 osd.48 up 1 >>> 50 3.906 osd.50 up 1 >>> -11 46.87 host ceph003-cache >>> 54 3.906 osd.54 up 1 >>> 56 3.906 osd.56 up 1 >>> 58 3.906 osd.58 up 1 >>> 60 3.906 osd.60 up 1 >>> 62 3.906 osd.62 up 1 >>> 64 3.906 osd.64 up 1 >>> 66 3.906 osd.66 up 1 >>> 68 3.906 osd.68 up 1 >>> 70 3.906 osd.70 up 1 >>> 72 3.906 osd.72 up 1 >>> 74 3.906 osd.74 up 1 >>> 76 3.906 osd.76 up 1 >>> -8 140.6 root default-ec >>> -5 46.87 host ceph001-ec >>> 3 3.906 osd.3 up 1 >>> 5 3.906 osd.5 up 1 >>> 7 3.906 osd.7 up 1 >>> 9 3.906 osd.9 up 1 >>> 11 3.906 osd.11 up 1 >>> 13 3.906 osd.13 up 1 >>> 15 3.906 osd.15 up 1 >>> 
17 3.906 osd.17 up 1 >>> 19 3.906 osd.19 up 1 >>> 21 3.906 osd.21 up 1 >>> 23 3.906 osd.23 up 1 >>> 25 3.906 osd.25 up 1 >>> -6 46.87 host ceph002-ec >>> 29 3.906 osd.29 up 1 >>> 31 3.906 osd.31 up 1 >>> 33 3.906 osd.33 up 1 >>> 35 3.906 osd.35 up 1 >>> 37 3.906 osd.37 up 1 >>> 39 3.906 osd.39 up 1 >>> 41 3.906 osd.41 up 1 >>> 43 3.906 osd.43 up 1 >>> 45 3.906 osd.45 up 1 >>> 47 3.906 osd.47 up 1 >>> 49 3.906 osd.49 up 1 >>> 51 3.906 osd.51 up 1 >>> -7 46.87 host ceph003-ec >>> 55 3.906 osd.55 up 1 >>> 57 3.906 osd.57 up 1 >>> 59 3.906 osd.59 up 1 >>> 61 3.906 osd.61 up 1 >>> 63 3.906 osd.63 up 1 >>> 65 3.906 osd.65 up 1 >>> 67 3.906 osd.67 up 1 >>> 69 3.906 osd.69 up 1 >>> 71 3.906 osd.71 up 1 >>> 73 3.906 osd.73 up 1 >>> 75 3.906 osd.75 up 1 >>> 77 3.906 osd.77 up 1 >>> -4 23.44 root default-ssd >>> -1 7.812 host ceph001-ssd >>> 0 3.906 osd.0 up 1 >>> 1 3.906 osd.1 up 1 >>> -2 7.812 host ceph002-ssd >>> 26 3.906 osd.26 up 1 >>> 27 3.906 osd.27 up 1 >>> -3 7.812 host ceph003-ssd >>> 52 3.906 osd.52 up 1 >>> 53 3.906 osd.53 up 1 >>> >>> Cache OSDs are each 50G, the EC KV OSDS 3.6T, (ssds not used right now) >>> >>> Pools: >>> pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash >>> rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0 >>> pool 1 'cache' replicated size 2 min_size 1 crush_ruleset 0 object_hash >>> rjenkins pg_num 1024 pgp_num 1024 last_change 174 flags >>> hashpspool,incomplete_clones tier_of 2 cache_mode writeback target_bytes >>> 300647710720 hit_set bloom{false_positive_probability: 0.05, >>> target_size: 0, >>> seed: 0} 3600s x1 stripe_width 0 >>> pool 2 'ecdata' erasure size 11 min_size 8 crush_ruleset 2 object_hash >>> rjenkins pg_num 128 pgp_num 128 last_change 170 lfor 170 flags hashpspool >>> tiers 1 read_tier 1 write_tier 1 stripe_width 4096 >>> >>> >>> Crushmap: >>> # begin crush map >>> tunable choose_local_fallback_tries 0 >>> tunable choose_local_tries 0 >>> tunable choose_total_tries 50 >>> tunable chooseleaf_descend_once 1 >>> >>> # devices >>> device 0 osd.0 >>> device 1 osd.1 >>> device 2 osd.2 >>> device 3 osd.3 >>> device 4 osd.4 >>> device 5 osd.5 >>> device 6 osd.6 >>> device 7 osd.7 >>> device 8 osd.8 >>> device 9 osd.9 >>> device 10 osd.10 >>> device 11 osd.11 >>> device 12 osd.12 >>> device 13 osd.13 >>> device 14 osd.14 >>> device 15 osd.15 >>> device 16 osd.16 >>> device 17 osd.17 >>> device 18 osd.18 >>> device 19 osd.19 >>> device 20 osd.20 >>> device 21 osd.21 >>> device 22 osd.22 >>> device 23 osd.23 >>> device 24 osd.24 >>> device 25 osd.25 >>> device 26 osd.26 >>> device 27 osd.27 >>> device 28 osd.28 >>> device 29 osd.29 >>> device 30 osd.30 >>> device 31 osd.31 >>> device 32 osd.32 >>> device 33 osd.33 >>> device 34 osd.34 >>> device 35 osd.35 >>> device 36 osd.36 >>> device 37 osd.37 >>> device 38 osd.38 >>> device 39 osd.39 >>> device 40 osd.40 >>> device 41 osd.41 >>> device 42 osd.42 >>> device 43 osd.43 >>> device 44 osd.44 >>> device 45 osd.45 >>> device 46 osd.46 >>> device 47 osd.47 >>> device 48 osd.48 >>> device 49 osd.49 >>> device 50 osd.50 >>> device 51 osd.51 >>> device 52 osd.52 >>> device 53 osd.53 >>> device 54 osd.54 >>> device 55 osd.55 >>> device 56 osd.56 >>> device 57 osd.57 >>> device 58 osd.58 >>> device 59 osd.59 >>> device 60 osd.60 >>> device 61 osd.61 >>> device 62 osd.62 >>> device 63 osd.63 >>> device 64 osd.64 >>> device 65 osd.65 >>> device 66 osd.66 >>> device 67 osd.67 >>> device 68 osd.68 >>> device 69 osd.69 >>> device 70 osd.70 >>> device 71 osd.71 >>> device 72 
osd.72 >>> device 73 osd.73 >>> device 74 osd.74 >>> device 75 osd.75 >>> device 76 osd.76 >>> device 77 osd.77 >>> >>> # types >>> type 0 osd >>> type 1 host >>> type 2 root >>> >>> # buckets >>> host ceph001-ssd { >>> id -1 # do not change unnecessarily >>> # weight 7.812 >>> alg straw >>> hash 0 # rjenkins1 >>> item osd.0 weight 3.906 >>> item osd.1 weight 3.906 >>> } >>> host ceph002-ssd { >>> id -2 # do not change unnecessarily >>> # weight 7.812 >>> alg straw >>> hash 0 # rjenkins1 >>> item osd.26 weight 3.906 >>> item osd.27 weight 3.906 >>> } >>> host ceph003-ssd { >>> id -3 # do not change unnecessarily >>> # weight 7.812 >>> alg straw >>> hash 0 # rjenkins1 >>> item osd.52 weight 3.906 >>> item osd.53 weight 3.906 >>> } >>> root default-ssd { >>> id -4 # do not change unnecessarily >>> # weight 23.436 >>> alg straw >>> hash 0 # rjenkins1 >>> item ceph001-ssd weight 7.812 >>> item ceph002-ssd weight 7.812 >>> item ceph003-ssd weight 7.812 >>> } >>> host ceph001-ec { >>> id -5 # do not change unnecessarily >>> # weight 46.872 >>> alg straw >>> hash 0 # rjenkins1 >>> item osd.3 weight 3.906 >>> item osd.5 weight 3.906 >>> item osd.7 weight 3.906 >>> item osd.9 weight 3.906 >>> item osd.11 weight 3.906 >>> item osd.13 weight 3.906 >>> item osd.15 weight 3.906 >>> item osd.17 weight 3.906 >>> item osd.19 weight 3.906 >>> item osd.21 weight 3.906 >>> item osd.23 weight 3.906 >>> item osd.25 weight 3.906 >>> } >>> host ceph002-ec { >>> id -6 # do not change unnecessarily >>> # weight 46.872 >>> alg straw >>> hash 0 # rjenkins1 >>> item osd.29 weight 3.906 >>> item osd.31 weight 3.906 >>> item osd.33 weight 3.906 >>> item osd.35 weight 3.906 >>> item osd.37 weight 3.906 >>> item osd.39 weight 3.906 >>> item osd.41 weight 3.906 >>> item osd.43 weight 3.906 >>> item osd.45 weight 3.906 >>> item osd.47 weight 3.906 >>> item osd.49 weight 3.906 >>> item osd.51 weight 3.906 >>> } >>> host ceph003-ec { >>> id -7 # do not change unnecessarily >>> # weight 46.872 >>> alg straw >>> hash 0 # rjenkins1 >>> item osd.55 weight 3.906 >>> item osd.57 weight 3.906 >>> item osd.59 weight 3.906 >>> item osd.61 weight 3.906 >>> item osd.63 weight 3.906 >>> item osd.65 weight 3.906 >>> item osd.67 weight 3.906 >>> item osd.69 weight 3.906 >>> item osd.71 weight 3.906 >>> item osd.73 weight 3.906 >>> item osd.75 weight 3.906 >>> item osd.77 weight 3.906 >>> } >>> root default-ec { >>> id -8 # do not change unnecessarily >>> # weight 140.616 >>> alg straw >>> hash 0 # rjenkins1 >>> item ceph001-ec weight 46.872 >>> item ceph002-ec weight 46.872 >>> item ceph003-ec weight 46.872 >>> } >>> host ceph001-cache { >>> id -9 # do not change unnecessarily >>> # weight 46.872 >>> alg straw >>> hash 0 # rjenkins1 >>> item osd.2 weight 3.906 >>> item osd.4 weight 3.906 >>> item osd.6 weight 3.906 >>> item osd.8 weight 3.906 >>> item osd.10 weight 3.906 >>> item osd.12 weight 3.906 >>> item osd.14 weight 3.906 >>> item osd.16 weight 3.906 >>> item osd.18 weight 3.906 >>> item osd.20 weight 3.906 >>> item osd.22 weight 3.906 >>> item osd.24 weight 3.906 >>> } >>> host ceph002-cache { >>> id -10 # do not change unnecessarily >>> # weight 46.872 >>> alg straw >>> hash 0 # rjenkins1 >>> item osd.28 weight 3.906 >>> item osd.30 weight 3.906 >>> item osd.32 weight 3.906 >>> item osd.34 weight 3.906 >>> item osd.36 weight 3.906 >>> item osd.38 weight 3.906 >>> item osd.40 weight 3.906 >>> item osd.42 weight 3.906 >>> item osd.44 weight 3.906 >>> item osd.46 weight 3.906 >>> item osd.48 weight 3.906 >>> item osd.50 weight 3.906 
>>> } >>> host ceph003-cache { >>> id -11 # do not change unnecessarily >>> # weight 46.872 >>> alg straw >>> hash 0 # rjenkins1 >>> item osd.54 weight 3.906 >>> item osd.56 weight 3.906 >>> item osd.58 weight 3.906 >>> item osd.60 weight 3.906 >>> item osd.62 weight 3.906 >>> item osd.64 weight 3.906 >>> item osd.66 weight 3.906 >>> item osd.68 weight 3.906 >>> item osd.70 weight 3.906 >>> item osd.72 weight 3.906 >>> item osd.74 weight 3.906 >>> item osd.76 weight 3.906 >>> } >>> root default-cache { >>> id -12 # do not change unnecessarily >>> # weight 140.616 >>> alg straw >>> hash 0 # rjenkins1 >>> item ceph001-cache weight 46.872 >>> item ceph002-cache weight 46.872 >>> item ceph003-cache weight 46.872 >>> } >>> >>> # rules >>> rule cache { >>> ruleset 0 >>> type replicated >>> min_size 1 >>> max_size 10 >>> step take default-cache >>> step chooseleaf firstn 0 type host >>> step emit >>> } >>> rule metadata { >>> ruleset 1 >>> type replicated >>> min_size 1 >>> max_size 10 >>> step take default-ssd >>> step chooseleaf firstn 0 type host >>> step emit >>> } >>> rule ecdata { >>> ruleset 2 >>> type erasure >>> min_size 3 >>> max_size 20 >>> step set_chooseleaf_tries 5 >>> step take default-ec >>> step choose indep 0 type osd >>> step emit >>> } >>> >>> # end crush map >>> >>> The benchmarks I then did: >>> >>> ./benchrw 50000 >>> >>> benchrw: >>> /usr/bin/rados -p ecdata bench $1 write --no-cleanup >>> /usr/bin/rados -p ecdata bench $1 seq >>> /usr/bin/rados -p ecdata bench $1 seq & >>> /usr/bin/rados -p ecdata bench $1 write --no-cleanup >>> >>> >>> Srubbing errors started soon after that: 2014-08-31 10:59:14 >>> >>> >>> Please let me know if you need more information, and thanks ! >>> >>> Kenneth >>> >>> ----- Message from Haomai Wang <haomaiwang at gmail.com> --------- >>> Date: Mon, 1 Sep 2014 21:30:16 +0800 >>> From: Haomai Wang <haomaiwang at gmail.com> >>> Subject: Re: ceph cluster inconsistency keyvaluestore >>> To: Kenneth Waegeman <Kenneth.Waegeman at ugent.be> >>> Cc: ceph-users at lists.ceph.com >>> >>> >>>> Hmm, could you please list your instructions including cluster existing >>>> time and all relevant ops? I want to reproduce it. >>>> >>>> >>>> On Mon, Sep 1, 2014 at 4:45 PM, Kenneth Waegeman >>>> <Kenneth.Waegeman at ugent.be> >>>> wrote: >>>> >>>>> Hi, >>>>> >>>>> I reinstalled the cluster with 0.84, and tried again running rados bench >>>>> on a EC coded pool on keyvaluestore. >>>>> Nothing crashed this time, but when I check the status: >>>>> >>>>> health HEALTH_ERR 128 pgs inconsistent; 128 scrub errors; too few >>>>> pgs >>>>> per osd (15 < min 20) >>>>> monmap e1: 3 mons at {ceph001=10.141.8.180:6789/0, >>>>> ceph002=10.141.8.181:6789/0,ceph003=10.141.8.182:6789/0}, election epoch >>>>> 8, quorum 0,1,2 ceph001,ceph002,ceph003 >>>>> osdmap e174: 78 osds: 78 up, 78 in >>>>> pgmap v147680: 1216 pgs, 3 pools, 14758 GB data, 3690 kobjects >>>>> 1753 GB used, 129 TB / 131 TB avail >>>>> 1088 active+clean >>>>> 128 active+clean+inconsistent >>>>> >>>>> the 128 inconsistent pgs are ALL the pgs of the EC KV store ( the others >>>>> are on Filestore) >>>>> >>>>> The only thing I can see in the logs is that after the rados tests, it >>>>> start scrubbing, and for each KV pg I get something like this: >>>>> >>>>> 2014-08-31 11:14:09.050747 osd.11 10.141.8.180:6833/61098 4 : [ERR] >>>>> 2.3s0 >>>>> scrub stat mismatch, got 28164/29291 objects, 0/0 clones, 28164/29291 >>>>> dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, >>>>> 118128377856/122855358464 bytes. 
>>>>> >>>>> What could here be the problem? >>>>> Thanks again!! >>>>> >>>>> Kenneth >>>>> >>>>> >>>>> ----- Message from Haomai Wang <haomaiwang at gmail.com> --------- >>>>> Date: Tue, 26 Aug 2014 17:11:43 +0800 >>>>> From: Haomai Wang <haomaiwang at gmail.com> >>>>> Subject: Re: ceph cluster inconsistency? >>>>> To: Kenneth Waegeman <Kenneth.Waegeman at ugent.be> >>>>> Cc: ceph-users at lists.ceph.com >>>>> >>>>> >>>>> Hmm, it looks like you hit this >>>>> bug(http://tracker.ceph.com/issues/9223). >>>>>> >>>>>> >>>>>> Sorry for the late message, I forget that this fix is merged into 0.84. >>>>>> >>>>>> Thanks for your patient :-) >>>>>> >>>>>> On Tue, Aug 26, 2014 at 4:39 PM, Kenneth Waegeman >>>>>> <Kenneth.Waegeman at ugent.be> wrote: >>>>>> >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> In the meantime I already tried with upgrading the cluster to 0.84, to >>>>>>> see >>>>>>> if that made a difference, and it seems it does. >>>>>>> I can't reproduce the crashing osds by doing a 'rados -p ecdata ls' >>>>>>> anymore. >>>>>>> >>>>>>> But now the cluster detect it is inconsistent: >>>>>>> >>>>>>> cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d >>>>>>> health HEALTH_ERR 40 pgs inconsistent; 40 scrub errors; too few >>>>>>> pgs >>>>>>> per osd (4 < min 20); mon.ceph002 low disk space >>>>>>> monmap e3: 3 mons at >>>>>>> {ceph001=10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0, >>>>>>> ceph003=10.141.8.182:6789/0}, >>>>>>> election epoch 30, quorum 0,1,2 ceph001,ceph002,ceph003 >>>>>>> mdsmap e78951: 1/1/1 up {0=ceph003.cubone.os=up:active}, 3 >>>>>>> up:standby >>>>>>> osdmap e145384: 78 osds: 78 up, 78 in >>>>>>> pgmap v247095: 320 pgs, 4 pools, 15366 GB data, 3841 kobjects >>>>>>> 1502 GB used, 129 TB / 131 TB avail >>>>>>> 279 active+clean >>>>>>> 40 active+clean+inconsistent >>>>>>> 1 active+clean+scrubbing+deep >>>>>>> >>>>>>> >>>>>>> I tried to do ceph pg repair for all the inconsistent pgs: >>>>>>> >>>>>>> cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d >>>>>>> health HEALTH_ERR 40 pgs inconsistent; 1 pgs repair; 40 scrub >>>>>>> errors; >>>>>>> too few pgs per osd (4 < min 20); mon.ceph002 low disk space >>>>>>> monmap e3: 3 mons at >>>>>>> {ceph001=10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0, >>>>>>> ceph003=10.141.8.182:6789/0}, >>>>>>> election epoch 30, quorum 0,1,2 ceph001,ceph002,ceph003 >>>>>>> mdsmap e79486: 1/1/1 up {0=ceph003.cubone.os=up:active}, 3 >>>>>>> up:standby >>>>>>> osdmap e146452: 78 osds: 78 up, 78 in >>>>>>> pgmap v248520: 320 pgs, 4 pools, 15366 GB data, 3841 kobjects >>>>>>> 1503 GB used, 129 TB / 131 TB avail >>>>>>> 279 active+clean >>>>>>> 39 active+clean+inconsistent >>>>>>> 1 active+clean+scrubbing+deep >>>>>>> 1 active+clean+scrubbing+deep+inconsistent+repair >>>>>>> >>>>>>> I let it recovering through the night, but this morning the mons were >>>>>>> all >>>>>>> gone, nothing to see in the log files.. The osds were all still up! 
>>>>>>> >>>>>>> cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d >>>>>>> health HEALTH_ERR 36 pgs inconsistent; 1 pgs repair; 36 scrub >>>>>>> errors; >>>>>>> too few pgs per osd (4 < min 20) >>>>>>> monmap e7: 3 mons at >>>>>>> {ceph001=10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0, >>>>>>> ceph003=10.141.8.182:6789/0}, >>>>>>> election epoch 44, quorum 0,1,2 ceph001,ceph002,ceph003 >>>>>>> mdsmap e109481: 1/1/1 up {0=ceph003.cubone.os=up:active}, 3 >>>>>>> up:standby >>>>>>> osdmap e203410: 78 osds: 78 up, 78 in >>>>>>> pgmap v331747: 320 pgs, 4 pools, 15251 GB data, 3812 kobjects >>>>>>> 1547 GB used, 129 TB / 131 TB avail >>>>>>> 1 active+clean+scrubbing+deep+inconsistent+repair >>>>>>> 284 active+clean >>>>>>> 35 active+clean+inconsistent >>>>>>> >>>>>>> I restarted the monitors now, I will let you know when I see something >>>>>>> more.. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> ----- Message from Haomai Wang <haomaiwang at gmail.com> --------- >>>>>>> Date: Sun, 24 Aug 2014 12:51:41 +0800 >>>>>>> >>>>>>> From: Haomai Wang <haomaiwang at gmail.com> >>>>>>> Subject: Re: ceph cluster inconsistency? >>>>>>> To: Kenneth Waegeman <Kenneth.Waegeman at ugent.be>, >>>>>>> ceph-users at lists.ceph.com >>>>>>> >>>>>>> >>>>>>> It's really strange! I write a test program according the key ordering >>>>>>>> >>>>>>>> you provided and parse the corresponding value. It's true! >>>>>>>> >>>>>>>> I have no idea now. If free, could you add this debug code to >>>>>>>> "src/os/GenericObjectMap.cc" and insert *before* "assert(start <= >>>>>>>> header.oid);": >>>>>>>> >>>>>>>> dout(0) << "start: " << start << "header.oid: " << header.oid << >>>>>>>> dendl; >>>>>>>> >>>>>>>> Then you need to recompile ceph-osd and run it again. The output log >>>>>>>> can help it! >>>>>>>> >>>>>>>> On Tue, Aug 19, 2014 at 10:19 PM, Haomai Wang <haomaiwang at gmail.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> I feel a little embarrassed, 1024 rows still true for me. >>>>>>>>> >>>>>>>>> I was wondering if you could give your all keys via >>>>>>>>> ""ceph-kvstore-tool /var/lib/ceph/osd/ceph-67/current/ list >>>>>>>>> _GHOBJTOSEQ_ > keys.log?. >>>>>>>>> >>>>>>>>> thanks! >>>>>>>>> >>>>>>>>> On Tue, Aug 19, 2014 at 4:58 PM, Kenneth Waegeman >>>>>>>>> <Kenneth.Waegeman at ugent.be> wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> ----- Message from Haomai Wang <haomaiwang at gmail.com> --------- >>>>>>>>>> Date: Tue, 19 Aug 2014 12:28:27 +0800 >>>>>>>>>> >>>>>>>>>> From: Haomai Wang <haomaiwang at gmail.com> >>>>>>>>>> Subject: Re: ceph cluster inconsistency? >>>>>>>>>> To: Kenneth Waegeman <Kenneth.Waegeman at ugent.be> >>>>>>>>>> Cc: Sage Weil <sweil at redhat.com>, ceph-users at lists.ceph.com >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Mon, Aug 18, 2014 at 7:32 PM, Kenneth Waegeman >>>>>>>>>>> >>>>>>>>>>> <Kenneth.Waegeman at ugent.be> wrote: >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> ----- Message from Haomai Wang <haomaiwang at gmail.com> --------- >>>>>>>>>>>> Date: Mon, 18 Aug 2014 18:34:11 +0800 >>>>>>>>>>>> >>>>>>>>>>>> From: Haomai Wang <haomaiwang at gmail.com> >>>>>>>>>>>> Subject: Re: ceph cluster inconsistency? 
>>>>>>>>>>>> To: Kenneth Waegeman <Kenneth.Waegeman at ugent.be> >>>>>>>>>>>> Cc: Sage Weil <sweil at redhat.com>, ceph-users at lists.ceph.com >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Mon, Aug 18, 2014 at 5:38 PM, Kenneth Waegeman >>>>>>>>>>>>> >>>>>>>>>>>>> <Kenneth.Waegeman at ugent.be> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I tried this after restarting the osd, but I guess that was not >>>>>>>>>>>>>> the >>>>>>>>>>>>>> aim >>>>>>>>>>>>>> ( >>>>>>>>>>>>>> # ceph-kvstore-tool /var/lib/ceph/osd/ceph-67/current/ list >>>>>>>>>>>>>> _GHOBJTOSEQ_| >>>>>>>>>>>>>> grep 6adb1100 -A 100 >>>>>>>>>>>>>> IO error: lock /var/lib/ceph/osd/ceph-67/current//LOCK: >>>>>>>>>>>>>> Resource >>>>>>>>>>>>>> temporarily >>>>>>>>>>>>>> unavailable >>>>>>>>>>>>>> tools/ceph_kvstore_tool.cc: In function >>>>>>>>>>>>>> 'StoreTool::StoreTool(const >>>>>>>>>>>>>> string&)' thread 7f8fecf7d780 time 2014-08-18 11:12:29.551780 >>>>>>>>>>>>>> tools/ceph_kvstore_tool.cc: 38: FAILED >>>>>>>>>>>>>> assert(!db_ptr->open(std::cerr)) >>>>>>>>>>>>>> .. >>>>>>>>>>>>>> ) >>>>>>>>>>>>>> >>>>>>>>>>>>>> When I run it after bringing the osd down, it takes a while, >>>>>>>>>>>>>> but >>>>>>>>>>>>>> it >>>>>>>>>>>>>> has >>>>>>>>>>>>>> no >>>>>>>>>>>>>> output.. (When running it without the grep, I'm getting a huge >>>>>>>>>>>>>> list >>>>>>>>>>>>>> ) >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Oh, sorry for it! I made a mistake, the hash value(6adb1100) >>>>>>>>>>>>> will >>>>>>>>>>>>> be >>>>>>>>>>>>> reversed into leveldb. >>>>>>>>>>>>> So grep "benchmark_data_ceph001.cubone.os_5560_object789734" >>>>>>>>>>>>> should >>>>>>>>>>>>> be >>>>>>>>>>>>> help it. >>>>>>>>>>>>> >>>>>>>>>>>>> this gives: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> [root at ceph003 ~]# ceph-kvstore-tool /var/lib/ceph/osd/ceph-67/ >>>>>>>>>>>> current/ >>>>>>>>>>>> list >>>>>>>>>>>> _GHOBJTOSEQ_ | grep 5560_object789734 -A 100 >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011BDA6!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object789734!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011C027!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_31461_object1330170!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011C6FD!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_4919_object227366!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011CB03!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object1363631!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011CDF0!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object1573957!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011D02C!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object1019282!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011E2B5!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_31461_object1283563!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011E511!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_4919_object273736!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011E547!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object1170628!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011EAAB!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_4919_object256335!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> 
_GHOBJTOSEQ_:3%e0s0_head!0011F446!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object1484196!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011FC59!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object884178!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001203F3!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object853746!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001208E3!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object36633!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00120B37!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_31461_object1235337!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001210B6!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object1661351!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001210CB!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object238126!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012184C!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object339943!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00121916!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object1047094!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001219C1!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_31461_object520642!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001222BB!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object639565!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001223AA!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_4919_object231080!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012243C!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object858050!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012289C!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object241796!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00122D28!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_4919_object7462!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00122DFE!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object243798!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00122EFC!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_8961_object109512!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001232D7!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_31461_object653973!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001234A3!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object1378169!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00123714!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object512925!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001237D9!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_4919_object23289!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00123854!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_31461_object1108852!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00123971!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object704026!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00123F75!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_8961_object250441!head 
>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00124083!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_31461_object706178!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001240FA!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object316952!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012447D!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object538734!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001244D9!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_31461_object789215!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001247CD!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_8961_object265993!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00124897!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_31461_object610597!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00124BE4!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_31461_object691723!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00124C9B!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object1306135!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00124E1D!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object520580!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012534C!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object659767!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00125A81!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object184060!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00125E77!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object1292867!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00126562!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_31461_object1201410!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00126B34!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object1657326!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00127383!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object1269787!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00127396!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_31461_object500115!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001277F8!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_31461_object394932!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001279DD!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_4919_object252963!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00127B40!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_31461_object936811!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00127BAC!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_31461_object1481773!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012894E!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object999885!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00128D05!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_31461_object943667!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012908A!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object212990!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00129519!!3!!benchmark_data_ >>>>>>>>>>>> 
ceph001%ecubone%eos_5560_object437596!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00129716!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object1585330!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00129798!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object603505!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001299C9!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_31461_object808800!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00129B7A!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_31461_object23193!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00129B9A!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_31461_object1158397!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012A932!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object542450!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012B77A!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_8961_object195480!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012BE8C!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_4919_object312911!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012BF74!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object1563783!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012C65C!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object1123980!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012C6FE!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_3411_object913!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012CCAD!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_31461_object400863!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012CDBB!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object789667!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012D14B!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_31461_object1020723!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012D95B!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_8961_object106293!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012E3C8!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object1355526!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012E5B3!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object1491348!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012F2BB!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_8961_object338872!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012F374!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_31461_object1337264!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012FBE5!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object1512395!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012FCE3!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_8961_object298610!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012FEB6!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_4919_object120824!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001301CA!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object816326!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> 
_GHOBJTOSEQ_:3%e0s0_head!00130263!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object777163!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00130529!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object1413173!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001317D9!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_31461_object809510!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0013204F!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_31461_object471416!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00132400!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object695087!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00132A19!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_31461_object591945!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00132BF8!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_31461_object302000!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00132F5B!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object1645443!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00133B8B!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object761911!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0013433E!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_31461_object1467727!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00134446!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_31461_object791960!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00134678!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_31461_object677078!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00134A96!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_31461_object254923!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001355D0!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_31461_object321528!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00135690!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_4919_object36935!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00135B62!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object1228272!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00135C72!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_4812_object2180!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00135DEE!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object425705!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00136366!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object141569!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00136371!!3!!benchmark_data_ >>>>>>>>>>>> ceph001%ecubone%eos_5560_object564213!head >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> 100 rows seemed true for me. I found the min list objects is 1024. >>>>>>>>>>> Please could you run >>>>>>>>>>> "ceph-kvstore-tool /var/lib/ceph/osd/ceph-67/current/ list >>>>>>>>>>> _GHOBJTOSEQ_| grep 6adb1100 -A 1024" >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I got the output in attachment >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>> Or should I run this immediately after the osd is crashed, >>>>>>>>>>>>>> (because >>>>>>>>>>>>>> it >>>>>>>>>>>>>> maybe >>>>>>>>>>>>>> rebalanced? 
I did already restarted the cluster) >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> I don't know if it is related, but before I could all do that, >>>>>>>>>>>>>> I >>>>>>>>>>>>>> had >>>>>>>>>>>>>> to >>>>>>>>>>>>>> fix >>>>>>>>>>>>>> something else: A monitor did run out if disk space, using 8GB >>>>>>>>>>>>>> for >>>>>>>>>>>>>> his >>>>>>>>>>>>>> store.db folder (lot of sst files). Other monitors are also >>>>>>>>>>>>>> near >>>>>>>>>>>>>> that >>>>>>>>>>>>>> level. >>>>>>>>>>>>>> Never had that problem on previous setups before. I recreated a >>>>>>>>>>>>>> monitor >>>>>>>>>>>>>> and >>>>>>>>>>>>>> now it uses 3.8GB. >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> It exists some duplicate data which needed to be compacted. >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> Another idea, maybe you can make KeyValueStore's stripe size >>>>>>>>>>>>> align >>>>>>>>>>>>> with EC stripe size. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> How can I do that? Is there some documentation about that? >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> ceph --show-config | grep keyvaluestore >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> debug_keyvaluestore = 0/0 >>>>>>>>>>> keyvaluestore_queue_max_ops = 50 >>>>>>>>>>> keyvaluestore_queue_max_bytes = 104857600 >>>>>>>>>>> keyvaluestore_debug_check_backend = false >>>>>>>>>>> keyvaluestore_op_threads = 2 >>>>>>>>>>> keyvaluestore_op_thread_timeout = 60 >>>>>>>>>>> keyvaluestore_op_thread_suicide_timeout = 180 >>>>>>>>>>> keyvaluestore_default_strip_size = 4096 >>>>>>>>>>> keyvaluestore_max_expected_write_size = 16777216 >>>>>>>>>>> keyvaluestore_header_cache_size = 4096 >>>>>>>>>>> keyvaluestore_backend = leveldb >>>>>>>>>>> >>>>>>>>>>> keyvaluestore_default_strip_size is the wanted >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> I haven't think deeply and maybe I will try it later. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks! >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Kenneth >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> ----- Message from Sage Weil <sweil at redhat.com> --------- >>>>>>>>>>>>>> Date: Fri, 15 Aug 2014 06:10:34 -0700 (PDT) >>>>>>>>>>>>>> From: Sage Weil <sweil at redhat.com> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Subject: Re: ceph cluster inconsistency? >>>>>>>>>>>>>> To: Haomai Wang <haomaiwang at gmail.com> >>>>>>>>>>>>>> Cc: Kenneth Waegeman <Kenneth.Waegeman at ugent.be>, >>>>>>>>>>>>>> ceph-users at lists.ceph.com >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, 15 Aug 2014, Haomai Wang wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi Kenneth, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I don't find valuable info in your logs, it lack of the >>>>>>>>>>>>>>>> necessary >>>>>>>>>>>>>>>> debug output when accessing crash code. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> But I scan the encode/decode implementation in >>>>>>>>>>>>>>>> GenericObjectMap >>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>> find something bad. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> For example, two oid has same hash and their name is: >>>>>>>>>>>>>>>> A: "rb.data.123" >>>>>>>>>>>>>>>> B: "rb-123" >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> In ghobject_t compare level, A < B. But GenericObjectMap >>>>>>>>>>>>>>>> encode >>>>>>>>>>>>>>>> "." 
>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>> "%e", so the key in DB is: >>>>>>>>>>>>>>>> A: _GHOBJTOSEQ_:blah!51615000!!none!!rb%edata%e123!head >>>>>>>>>>>>>>>> B: _GHOBJTOSEQ_:blah!51615000!!none!!rb-123!head >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> A > B >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> And it seemed that the escape function is useless and should >>>>>>>>>>>>>>>> be >>>>>>>>>>>>>>>> disabled. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I'm not sure whether Kenneth's problem is touching this bug. >>>>>>>>>>>>>>>> Because >>>>>>>>>>>>>>>> this scene only occur when the object set is very large and >>>>>>>>>>>>>>>> make >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> two object has same hash value. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Kenneth, could you have time to run "ceph-kv-store >>>>>>>>>>>>>>>> [path-to-osd] >>>>>>>>>>>>>>>> list >>>>>>>>>>>>>>>> _GHOBJTOSEQ_| grep 6adb1100 -A 100". ceph-kv-store is a debug >>>>>>>>>>>>>>>> tool >>>>>>>>>>>>>>>> which can be compiled from source. You can clone ceph repo >>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>> run >>>>>>>>>>>>>>>> "./authongen.sh; ./configure; cd src; make >>>>>>>>>>>>>>>> ceph-kvstore-tool". >>>>>>>>>>>>>>>> "path-to-osd" should be "/var/lib/ceph/osd-[id]/current/". >>>>>>>>>>>>>>>> "6adb1100" >>>>>>>>>>>>>>>> is from your verbose log and the next 100 rows should know >>>>>>>>>>>>>>>> necessary >>>>>>>>>>>>>>>> infos. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> You can also get ceph-kvstore-tool from the 'ceph-tests' >>>>>>>>>>>>>>> package. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi sage, do you think we need to provided with upgrade >>>>>>>>>>>>>>> function >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>> fix >>>>>>>>>>>>>>>> it? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hmm, we might. This only affects the key/value encoding >>>>>>>>>>>>>>> right? >>>>>>>>>>>>>>> The >>>>>>>>>>>>>>> FileStore is using its own function to map these to file >>>>>>>>>>>>>>> names? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Can you open a ticket in the tracker for this? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks! >>>>>>>>>>>>>>> sage >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Thu, Aug 14, 2014 at 7:36 PM, Kenneth Waegeman >>>>>>>>>>>>>>>> <Kenneth.Waegeman at ugent.be> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ----- Message from Haomai Wang <haomaiwang at gmail.com> >>>>>>>>>>>>>>>>> --------- >>>>>>>>>>>>>>>>> Date: Thu, 14 Aug 2014 19:11:55 +0800 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> From: Haomai Wang <haomaiwang at gmail.com> >>>>>>>>>>>>>>>>> Subject: Re: ceph cluster inconsistency? >>>>>>>>>>>>>>>>> To: Kenneth Waegeman <Kenneth.Waegeman at ugent.be> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Could you add config "debug_keyvaluestore = 20/20" to the >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> crashed >>>>>>>>>>>>>>>>>> osd >>>>>>>>>>>>>>>>>> and replay the command causing crash? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I would like to get more debug infos! Thanks. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I included the log in attachment! >>>>>>>>>>>>>>>>> Thanks! 
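On the "." to "%e" escaping in GenericObjectMap discussed a few messages up: a small standalone check (again plain byte-wise std::string comparison, not the real ghobject_t or leveldb comparators) shows that the escaping flips the relative order of the two example names, which is the kind of mismatch between object ordering and DB key ordering being described there.

#include <iostream>
#include <string>

int main() {
    std::cout << std::boolalpha;

    // Raw names from the example: '.' (0x2e) sorts after '-' (0x2d) ...
    const std::string a_raw = "rb.data.123", b_raw = "rb-123";
    std::cout << "raw:     rb.data.123 > rb-123 ?   " << (a_raw > b_raw) << "\n";

    // ... but once '.' is escaped to "%e", '%' (0x25) sorts before '-' (0x2d),
    // so the relative order of the two names is reversed in the encoded keys.
    const std::string a_enc = "rb%edata%e123", b_enc = "rb-123";
    std::cout << "encoded: rb%edata%e123 > rb-123 ? " << (a_enc > b_enc) << "\n";
    return 0;
}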
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Thu, Aug 14, 2014 at 4:41 PM, Kenneth Waegeman >>>>>>>>>>>>>>>>>> <Kenneth.Waegeman at ugent.be> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I have: >>>>>>>>>>>>>>>>>>> osd_objectstore = keyvaluestore-dev >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> in the global section of my ceph.conf >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> [root at ceph002 ~]# ceph osd erasure-code-profile get >>>>>>>>>>>>>>>>>>> profile11 >>>>>>>>>>>>>>>>>>> directory=/usr/lib64/ceph/erasure-code >>>>>>>>>>>>>>>>>>> k=8 >>>>>>>>>>>>>>>>>>> m=3 >>>>>>>>>>>>>>>>>>> plugin=jerasure >>>>>>>>>>>>>>>>>>> ruleset-failure-domain=osd >>>>>>>>>>>>>>>>>>> technique=reed_sol_van >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> the ecdata pool has this as profile >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> pool 3 'ecdata' erasure size 11 min_size 8 crush_ruleset 2 >>>>>>>>>>>>>>>>>>> object_hash >>>>>>>>>>>>>>>>>>> rjenkins pg_num 128 pgp_num 128 last_change 161 flags >>>>>>>>>>>>>>>>>>> hashpspool >>>>>>>>>>>>>>>>>>> stripe_width 4096 >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> ECrule in crushmap >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> rule ecdata { >>>>>>>>>>>>>>>>>>> ruleset 2 >>>>>>>>>>>>>>>>>>> type erasure >>>>>>>>>>>>>>>>>>> min_size 3 >>>>>>>>>>>>>>>>>>> max_size 20 >>>>>>>>>>>>>>>>>>> step set_chooseleaf_tries 5 >>>>>>>>>>>>>>>>>>> step take default-ec >>>>>>>>>>>>>>>>>>> step choose indep 0 type osd >>>>>>>>>>>>>>>>>>> step emit >>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>> root default-ec { >>>>>>>>>>>>>>>>>>> id -8 # do not change unnecessarily >>>>>>>>>>>>>>>>>>> # weight 140.616 >>>>>>>>>>>>>>>>>>> alg straw >>>>>>>>>>>>>>>>>>> hash 0 # rjenkins1 >>>>>>>>>>>>>>>>>>> item ceph001-ec weight 46.872 >>>>>>>>>>>>>>>>>>> item ceph002-ec weight 46.872 >>>>>>>>>>>>>>>>>>> item ceph003-ec weight 46.872 >>>>>>>>>>>>>>>>>>> ... >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Cheers! >>>>>>>>>>>>>>>>>>> Kenneth >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> ----- Message from Haomai Wang <haomaiwang at gmail.com> >>>>>>>>>>>>>>>>>>> --------- >>>>>>>>>>>>>>>>>>> Date: Thu, 14 Aug 2014 10:07:50 +0800 >>>>>>>>>>>>>>>>>>> From: Haomai Wang <haomaiwang at gmail.com> >>>>>>>>>>>>>>>>>>> Subject: Re: ceph cluster inconsistency? >>>>>>>>>>>>>>>>>>> To: Kenneth Waegeman <Kenneth.Waegeman at ugent.be> >>>>>>>>>>>>>>>>>>> Cc: ceph-users <ceph-users at lists.ceph.com> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Hi Kenneth, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Could you give your configuration related to EC and >>>>>>>>>>>>>>>>>>>> KeyValueStore? 
>>>>>>>>>>>>>>>>>>>> Not sure whether it's bug on KeyValueStore >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Thu, Aug 14, 2014 at 12:06 AM, Kenneth Waegeman >>>>>>>>>>>>>>>>>>>> <Kenneth.Waegeman at ugent.be> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I was doing some tests with rados bench on a Erasure >>>>>>>>>>>>>>>>>>>>> Coded >>>>>>>>>>>>>>>>>>>>> pool >>>>>>>>>>>>>>>>>>>>> (using >>>>>>>>>>>>>>>>>>>>> keyvaluestore-dev objectstore) on 0.83, and I see some >>>>>>>>>>>>>>>>>>>>> strangs >>>>>>>>>>>>>>>>>>>>> things: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> [root at ceph001 ~]# ceph status >>>>>>>>>>>>>>>>>>>>> cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d >>>>>>>>>>>>>>>>>>>>> health HEALTH_WARN too few pgs per osd (4 < min 20) >>>>>>>>>>>>>>>>>>>>> monmap e1: 3 mons at >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> {ceph001=10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0, >>>>>>>>>>>>>>>>>>>>> ceph003=10.141.8.182:6789/0}, >>>>>>>>>>>>>>>>>>>>> election epoch 6, quorum 0,1,2 ceph001,ceph002,ceph003 >>>>>>>>>>>>>>>>>>>>> mdsmap e116: 1/1/1 up >>>>>>>>>>>>>>>>>>>>> {0=ceph001.cubone.os=up:active}, >>>>>>>>>>>>>>>>>>>>> 2 >>>>>>>>>>>>>>>>>>>>> up:standby >>>>>>>>>>>>>>>>>>>>> osdmap e292: 78 osds: 78 up, 78 in >>>>>>>>>>>>>>>>>>>>> pgmap v48873: 320 pgs, 4 pools, 15366 GB data, 3841 >>>>>>>>>>>>>>>>>>>>> kobjects >>>>>>>>>>>>>>>>>>>>> 1381 GB used, 129 TB / 131 TB avail >>>>>>>>>>>>>>>>>>>>> 320 active+clean >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> There is around 15T of data, but only 1.3 T usage. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> This is also visible in rados: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> [root at ceph001 ~]# rados df >>>>>>>>>>>>>>>>>>>>> pool name category KB objects >>>>>>>>>>>>>>>>>>>>> clones >>>>>>>>>>>>>>>>>>>>> degraded unfound rd rd KB >>>>>>>>>>>>>>>>>>>>> wr >>>>>>>>>>>>>>>>>>>>> wr >>>>>>>>>>>>>>>>>>>>> KB >>>>>>>>>>>>>>>>>>>>> data - 0 >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 0 0 0 0 0 >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> ecdata - 16113451009 >>>>>>>>>>>>>>>>>>>>> 3933959 >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 0 0 1 1 3935632 >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 16116850711 >>>>>>>>>>>>>>>>>>>>> metadata - 2 >>>>>>>>>>>>>>>>>>>>> 20 >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 0 0 33 36 21 >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 8 >>>>>>>>>>>>>>>>>>>>> rbd - 0 >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 0 0 0 0 0 >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> total used 1448266016 3933979 >>>>>>>>>>>>>>>>>>>>> total avail 139400181016 >>>>>>>>>>>>>>>>>>>>> total space 140848447032 >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Another (related?) thing: if I do rados -p ecdata ls, I >>>>>>>>>>>>>>>>>>>>> trigger >>>>>>>>>>>>>>>>>>>>> osd >>>>>>>>>>>>>>>>>>>>> shutdowns (each time): >>>>>>>>>>>>>>>>>>>>> I get a list followed by an error: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> ... 
>> Another (related?) thing: if I do rados -p ecdata ls, I trigger OSD
>> shutdowns (each time): I get a list followed by an error:
>>
>> ...
>> benchmark_data_ceph001.cubone.os_8961_object243839
>> benchmark_data_ceph001.cubone.os_5560_object801983
>> benchmark_data_ceph001.cubone.os_31461_object856489
>> benchmark_data_ceph001.cubone.os_8961_object202232
>> benchmark_data_ceph001.cubone.os_4919_object33199
>> benchmark_data_ceph001.cubone.os_5560_object807797
>> benchmark_data_ceph001.cubone.os_4919_object74729
>> benchmark_data_ceph001.cubone.os_31461_object1264121
>> benchmark_data_ceph001.cubone.os_5560_object1318513
>> benchmark_data_ceph001.cubone.os_5560_object1202111
>> benchmark_data_ceph001.cubone.os_31461_object939107
>> benchmark_data_ceph001.cubone.os_31461_object729682
>> benchmark_data_ceph001.cubone.os_5560_object122915
>> benchmark_data_ceph001.cubone.os_5560_object76521
>> benchmark_data_ceph001.cubone.os_5560_object113261
>> benchmark_data_ceph001.cubone.os_31461_object575079
>> benchmark_data_ceph001.cubone.os_5560_object671042
>> benchmark_data_ceph001.cubone.os_5560_object381146
>> 2014-08-13 17:57:48.736150 7f65047b5700  0 -- 10.141.8.180:0/1023295 >> 10.141.8.182:6839/4471 pipe(0x7f64fc019b20 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f64fc019db0).fault
>>
>> And I can see this in the log files:
>>
>>    -25> 2014-08-13 17:52:56.323908 7f8a97fa4700  1 -- 10.143.8.182:6827/64670 <== osd.57 10.141.8.182:0/15796 51 ==== osd_ping(ping e220 stamp 2014-08-13 17:52:56.323092) v2 ==== 47+0+0 (3227325175 0 0) 0xf475940 con 0xee89fa0
>>    -24> 2014-08-13 17:52:56.323938 7f8a97fa4700  1 -- 10.143.8.182:6827/64670 --> 10.141.8.182:0/15796 -- osd_ping(ping_reply e220 stamp 2014-08-13 17:52:56.323092) v2 -- ?+0 0xf815b00 con 0xee89fa0
>>    -23> 2014-08-13 17:52:56.324078 7f8a997a7700  1 -- 10.141.8.182:6840/64670 <== osd.57 10.141.8.182:0/15796 51 ==== osd_ping(ping e220 stamp 2014-08-13 17:52:56.323092) v2 ==== 47+0+0 (3227325175 0 0) 0xf132bc0 con 0xee8a680
>>    -22> 2014-08-13 17:52:56.324111 7f8a997a7700  1 -- 10.141.8.182:6840/64670 --> 10.141.8.182:0/15796 -- osd_ping(ping_reply e220 stamp 2014-08-13 17:52:56.323092) v2 -- ?+0 0xf811a40 con 0xee8a680
>>    -21> 2014-08-13 17:52:56.584461 7f8a997a7700  1 -- 10.141.8.182:6840/64670 <== osd.29 10.143.8.181:0/12142 47 ==== osd_ping(ping e220 stamp 2014-08-13 17:52:56.583010) v2 ==== 47+0+0 (3355887204 0 0) 0xf655940 con 0xee88b00
>>    -20> 2014-08-13 17:52:56.584486 7f8a997a7700  1 -- 10.141.8.182:6840/64670 --> 10.143.8.181:0/12142 -- osd_ping(ping_reply e220 stamp 2014-08-13 17:52:56.583010) v2 -- ?+0 0xf132bc0 con 0xee88b00
>>    -19> 2014-08-13 17:52:56.584498 7f8a97fa4700  1 -- 10.143.8.182:6827/64670 <== osd.29 10.143.8.181:0/12142 47 ==== osd_ping(ping e220 stamp 2014-08-13 17:52:56.583010) v2 ==== 47+0+0 (3355887204 0 0) 0xf20e040 con 0xee886e0
>>    -18> 2014-08-13 17:52:56.584526 7f8a97fa4700  1 -- 10.143.8.182:6827/64670 --> 10.143.8.181:0/12142 -- osd_ping(ping_reply e220 stamp 2014-08-13 17:52:56.583010) v2 -- ?+0 0xf475940 con 0xee886e0
>>    -17> 2014-08-13 17:52:56.594448 7f8a798c7700  1 -- 10.141.8.182:6839/64670 >> :/0 pipe(0xec15f00 sd=74 :6839 s=0 pgs=0 cs=0 l=0 c=0xee856a0).accept sd=74 10.141.8.180:47641/0
>>    -16> 2014-08-13 17:52:56.594921 7f8a798c7700  1 -- 10.141.8.182:6839/64670 <== client.7512 10.141.8.180:0/1018433 1 ==== osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 ack+read+known_if_redirected e220) v4 ==== 151+0+39 (1972163119 0 4174233976) 0xf3bca40 con 0xee856a0
>>    -15> 2014-08-13 17:52:56.594957 7f8a798c7700  5 -- op tracker -- , seq: 299, time: 2014-08-13 17:52:56.594874, event: header_read, op: osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 ack+read+known_if_redirected e220)
>>    -14> 2014-08-13 17:52:56.594970 7f8a798c7700  5 -- op tracker -- , seq: 299, time: 2014-08-13 17:52:56.594880, event: throttled, op: osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 ack+read+known_if_redirected e220)
>>    -13> 2014-08-13 17:52:56.594978 7f8a798c7700  5 -- op tracker -- , seq: 299, time: 2014-08-13 17:52:56.594917, event: all_read, op: osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 ack+read+known_if_redirected e220)
>>    -12> 2014-08-13 17:52:56.594986 7f8a798c7700  5 -- op tracker -- , seq: 299, time: 0.000000, event: dispatched, op: osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 ack+read+known_if_redirected e220)
>>    -11> 2014-08-13 17:52:56.595127 7f8a90795700  5 -- op tracker -- , seq: 299, time: 2014-08-13 17:52:56.595104, event: reached_pg, op: osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 ack+read+known_if_redirected e220)
>>    -10> 2014-08-13 17:52:56.595159 7f8a90795700  5 -- op tracker -- , seq: 299, time: 2014-08-13 17:52:56.595153, event: started, op: osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 ack+read+known_if_redirected e220)
>>     -9> 2014-08-13 17:52:56.602179 7f8a90795700  1 -- 10.141.8.182:6839/64670 --> 10.141.8.180:0/1018433 -- osd_op_reply(1 [pgls start_epoch 0] v164'30654 uv30654 ondisk = 0) v6 -- ?+0 0xec16180 con 0xee856a0
>>     -8> 2014-08-13 17:52:56.602211 7f8a90795700  5 -- op tracker -- , seq: 299, time: 2014-08-13 17:52:56.602205, event: done, op: osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 ack+read+known_if_redirected e220)
>>     -7> 2014-08-13 17:52:56.614839 7f8a798c7700  1 -- 10.141.8.182:6839/64670 <== client.7512 10.141.8.180:0/1018433 2 ==== osd_op(client.7512.0:2 [pgls start_epoch 220] 3.0 ack+read+known_if_redirected e220) v4 ==== 151+0+89 (3460833343 0 2600845095) 0xf3bcec0 con 0xee856a0
>>     -6> 2014-08-13 17:52:56.614864 7f8a798c7700  5 -- op tracker -- , seq: 300, time: 2014-08-13 17:52:56.614789, event: header_read, op: osd_op(client.7512.0:2 [pgls start_epoch 220] 3.0 ack+read+known_if_redirected e220)
>>     -5> 2014-08-13 17:52:56.614874 7f8a798c7700  5 -- op tracker -- , seq: 300, time: 2014-08-13 17:52:56.614792, event: throttled, op: osd_op(client.7512.0:2 [pgls start_epoch 220] 3.0 ack+read+known_if_redirected e220)
>>     -4> 2014-08-13 17:52:56.614884 7f8a798c7700  5 -- op tracker -- , seq: 300, time: 2014-08-13 17:52:56.614835, event: all_read, op: osd_op(client.7512.0:2 [pgls start_epoch 220] 3.0 ack+read+known_if_redirected e220)
>>     -3> 2014-08-13 17:52:56.614891 7f8a798c7700  5 -- op tracker -- , seq: 300, time: 0.000000, event: dispatched, op: osd_op(client.7512.0:2 [pgls start_epoch 220] 3.0 ack+read+known_if_redirected e220)
>>     -2> 2014-08-13 17:52:56.614972 7f8a92f9a700  5 -- op tracker -- , seq: 300, time: 2014-08-13 17:52:56.614958, event: reached_pg, op: osd_op(client.7512.0:2 [pgls start_epoch 220] 3.0 ack+read+known_if_redirected e220)
>>     -1> 2014-08-13 17:52:56.614993 7f8a92f9a700  5 -- op tracker -- , seq: 300, time: 2014-08-13 17:52:56.614986, event: started, op: osd_op(client.7512.0:2 [pgls start_epoch 220] 3.0 ack+read+known_if_redirected e220)
>>      0> 2014-08-13 17:52:56.617087 7f8a92f9a700 -1 os/GenericObjectMap.cc: In function 'int GenericObjectMap::list_objects(const coll_t&, ghobject_t, int, std::vector<ghobject_t>*, ghobject_t*)' thread 7f8a92f9a700 time 2014-08-13 17:52:56.615073
>> os/GenericObjectMap.cc: 1118: FAILED assert(start <= header.oid)
>>
>>  ceph version 0.83 (78ff1f0a5dfd3c5850805b4021738564c36c92b8)
>>  1: (GenericObjectMap::list_objects(coll_t const&, ghobject_t, int, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x474) [0x98f774]
>>  2: (KeyValueStore::collection_list_partial(coll_t, ghobject_t, int, int, snapid_t, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x274) [0x8c5b54]
>>  3: (PGBackend::objects_list_partial(hobject_t const&, int, int, snapid_t, std::vector<hobject_t, std::allocator<hobject_t> >*, hobject_t*)+0x1c9) [0x862de9]
>>  4: (ReplicatedPG::do_pg_op(std::tr1::shared_ptr<OpRequest>)+0xea5) [0x7f67f5]
>>  5: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x1f3) [0x8177b3]
>>  6: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x5d5) [0x7b8045]
>>  7: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x47d) [0x62bf8d]
>>  8: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x35c) [0x62c56c]
>>  9: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8cd) [0xa776fd]
>>  10: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xa79980]
>>  11: (()+0x7df3) [0x7f8aac71fdf3]
>>  12: (clone()+0x6d) [0x7f8aab1963dd]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>
>>  ceph version 0.83 (78ff1f0a5dfd3c5850805b4021738564c36c92b8)
>>  1: /usr/bin/ceph-osd() [0x99b466]
>>  2: (()+0xf130) [0x7f8aac727130]
>>  3: (gsignal()+0x39) [0x7f8aab0d5989]
>>  4: (abort()+0x148) [0x7f8aab0d7098]
>>  5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f8aab9e89d5]
>>  6: (()+0x5e946) [0x7f8aab9e6946]
>>  7: (()+0x5e973) [0x7f8aab9e6973]
>>  8: (()+0x5eb9f) [0x7f8aab9e6b9f]
>>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1ef) [0xa8805f]
>>  10: (GenericObjectMap::list_objects(coll_t const&, ghobject_t, int, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x474) [0x98f774]
>>  11: (KeyValueStore::collection_list_partial(coll_t, ghobject_t, int, int, snapid_t, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x274) [0x8c5b54]
>>  12: (PGBackend::objects_list_partial(hobject_t const&, int, int, snapid_t, std::vector<hobject_t, std::allocator<hobject_t> >*, hobject_t*)+0x1c9) [0x862de9]
>>  13: (ReplicatedPG::do_pg_op(std::tr1::shared_ptr<OpRequest>)+0xea5) [0x7f67f5]
>>  14: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x1f3) [0x8177b3]
>>  15: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x5d5) [0x7b8045]
>>  16: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x47d) [0x62bf8d]
>>  17: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x35c) [0x62c56c]
>>  18: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8cd) [0xa776fd]
>>  19: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xa79980]
>>  20: (()+0x7df3) [0x7f8aac71fdf3]
>>  21: (clone()+0x6d) [0x7f8aab1963dd]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>
>> --- begin dump of recent events ---
>>      0> 2014-08-13 17:52:56.714214 7f8a92f9a700 -1 *** Caught signal (Aborted) **
>>  in thread 7f8a92f9a700
>>
>>  ceph version 0.83 (78ff1f0a5dfd3c5850805b4021738564c36c92b8)
>>  1: /usr/bin/ceph-osd() [0x99b466]
>>  2: (()+0xf130) [0x7f8aac727130]
>>  3: (gsignal()+0x39) [0x7f8aab0d5989]
>>  4: (abort()+0x148) [0x7f8aab0d7098]
>>  5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f8aab9e89d5]
>>  6: (()+0x5e946) [0x7f8aab9e6946]
>>  7: (()+0x5e973) [0x7f8aab9e6973]
>>  8: (()+0x5eb9f) [0x7f8aab9e6b9f]
>>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1ef) [0xa8805f]
>>  10: (GenericObjectMap::list_objects(coll_t const&, ghobject_t, int, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x474) [0x98f774]
>>  11: (KeyValueStore::collection_list_partial(coll_t, ghobject_t, int, int, snapid_t, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x274) [0x8c5b54]
>>  12: (PGBackend::objects_list_partial(hobject_t const&, int, int, snapid_t, std::vector<hobject_t, std::allocator<hobject_t> >*, hobject_t*)+0x1c9) [0x862de9]
>>  13: (ReplicatedPG::do_pg_op(std::tr1::shared_ptr<OpRequest>)+0xea5) [0x7f67f5]
>>  14: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x1f3) [0x8177b3]
>>  15: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x5d5) [0x7b8045]
>>  16: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x47d) [0x62bf8d]
>>  17: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x35c) [0x62c56c]
>>  18: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8cd) [0xa776fd]
>>  19: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xa79980]
>>  20: (()+0x7df3) [0x7f8aac71fdf3]
>>  21: (clone()+0x6d) [0x7f8aab1963dd]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>
>> I guess this has something to do with using the dev Keyvaluestore?
>>
>> Thanks!
>>
>> Kenneth
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users at lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
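In case it helps with the "NOTE" lines in the dump above: a rough sketch of how the frame addresses can be mapped back to symbols and source lines (the ceph-osd binary path is assumed, and addr2line only gives file:line if the matching ceph debuginfo is installed):

  # Disassemble the OSD binary once, as the backtrace NOTE suggests, then look
  # up a frame address from the dump (e.g. 0x98f774, GenericObjectMap::list_objects).
  objdump -rdS /usr/bin/ceph-osd > /tmp/ceph-osd.dump
  grep -n '98f774:' /tmp/ceph-osd.dump

  # With debuginfo available, addr2line resolves the same address directly.
  addr2line -Cfe /usr/bin/ceph-osd 0x98f774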
> --
> Best Regards,
>
> Wheat

----- End message from Haomai Wang <haomaiwang at gmail.com> -----

--

Met vriendelijke groeten,
Kenneth Waegeman