An issue is already registered at http://tracker.ceph.com/issues/8589

On Sun, Sep 7, 2014 at 8:00 PM, Haomai Wang <haomaiwang at gmail.com> wrote:
> I have found the root cause. It's a bug.
>
> When a chunky scrub happens, it iterates over the whole PG's objects, and
> each iteration scans only a few objects.
>
> osd/PG.cc:3758
> ret = get_pgbackend()->objects_list_partial(
>   start,
>   cct->_conf->osd_scrub_chunk_min,
>   cct->_conf->osd_scrub_chunk_max,
>   0,
>   &objects,
>   &candidate_end);
>
> candidate_end is the end of the object set and is used to indicate the
> start position of the next scrub pass. But it gets truncated:
>
> osd/PG.cc:3777
> while (!boundary_found && objects.size() > 1) {
>   hobject_t end = objects.back().get_boundary();
>   objects.pop_back();
>
>   if (objects.back().get_filestore_key() != end.get_filestore_key()) {
>     candidate_end = end;
>     boundary_found = true;
>   }
> }
>
> end, an hobject_t that contains only the "hash" field, is assigned to
> candidate_end. So on the next scrub pass an hobject_t containing only the
> "hash" field is passed in to get_pgbackend()->objects_list_partial.
>
> This causes incorrect results for the KeyValueStore backend, because it
> uses strict key ordering for the "collection_list_partial" method. An
> hobject_t that contains only the "hash" field becomes:
>
> 1%e79s0_head!972F1B5D!!none!!!00000000000000000000!0!0
>
> and the actual object is
>
> 1%e79s0_head!972F1B5D!!1!!!object-name!head
>
> In other words, an hobject_t that contains only the "hash" field cannot be
> used to look up an actual object that has the same "hash" field.
>
> @sage, I briefly scanned the usages of "get_boundary" and can't find the
> reason for it. Could we simply remove it, so the result would be:
>
> while (!boundary_found && objects.size() > 1) {
>   hobject_t end = objects.back();
>   objects.pop_back();
>
>   if (objects.back().get_filestore_key() != end.get_filestore_key()) {
>     candidate_end = end;
>     boundary_found = true;
>   }
> }
>
>
> On Sat, Sep 6, 2014 at 10:44 PM, Haomai Wang <haomaiwang at gmail.com> wrote:
>
>> Sorry for the late message, I'm back from a short vacation. I would
>> like to try it this weekend. Thanks for your patience :-)
>>
>> On Wed, Sep 3, 2014 at 9:16 PM, Kenneth Waegeman
>> <Kenneth.Waegeman at ugent.be> wrote:
>> > I can also reproduce it on a new, slightly different setup (also EC on
>> > KV and Cache) by running ceph pg scrub on a KV pg: this pg will then
>> > get the 'inconsistent' status
>> >
>> > ----- Message from Kenneth Waegeman <Kenneth.Waegeman at UGent.be> ---------
>> > Date: Mon, 01 Sep 2014 16:28:31 +0200
>> > From: Kenneth Waegeman <Kenneth.Waegeman at UGent.be>
>> > Subject: Re: ceph cluster inconsistency keyvaluestore
>> > To: Haomai Wang <haomaiwang at gmail.com>
>> > Cc: ceph-users at lists.ceph.com
>> >
>> >> Hi,
>> >>
>> >> The cluster got installed with quattor, which uses ceph-deploy for
>> >> installation of daemons, writes the config file and installs the crushmap.
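[To make the ordering problem in the scrub analysis above concrete, here is a minimal, self-contained sketch -- an illustration only, not code from the Ceph tree -- comparing the two keys quoted in that analysis. Under the strict lexicographic ordering a KeyValueStore-style backend uses, the key derived from a hash-only hobject_t sorts after the key of the real object with the same hash, so a listing that starts at that boundary would miss the real object, which would be consistent with the scrub stat mismatches reported later in this thread.]

    #include <cassert>
    #include <string>

    int main() {
      // The two keys quoted above: a boundary hobject_t carrying only the
      // "hash" field, and a real object that shares the same hash.
      const std::string boundary =
          "1%e79s0_head!972F1B5D!!none!!!00000000000000000000!0!0";
      const std::string actual =
          "1%e79s0_head!972F1B5D!!1!!!object-name!head";

      // Strict (lexicographic) comparison: '1' < 'n', so the real object's
      // key sorts *before* the hash-only boundary key.
      assert(actual < boundary);

      // A backend that begins the next listing at `boundary` would therefore
      // never return `actual`, even though both encode the hash 972F1B5D.
      return 0;
    }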
>> >> I have 3 hosts, each 12 disks, having a large KV partition (3.6T) for >> the >> >> ECdata pool and a small cache partition (50G) for the cache >> >> >> >> I manually did this: >> >> >> >> ceph osd pool create cache 1024 1024 >> >> ceph osd pool set cache size 2 >> >> ceph osd pool set cache min_size 1 >> >> ceph osd erasure-code-profile set profile11 k=8 m=3 >> >> ruleset-failure-domain=osd >> >> ceph osd pool create ecdata 128 128 erasure profile11 >> >> ceph osd tier add ecdata cache >> >> ceph osd tier cache-mode cache writeback >> >> ceph osd tier set-overlay ecdata cache >> >> ceph osd pool set cache hit_set_type bloom >> >> ceph osd pool set cache hit_set_count 1 >> >> ceph osd pool set cache hit_set_period 3600 >> >> ceph osd pool set cache target_max_bytes $((280*1024*1024*1024)) >> >> >> >> (But the previous time I had the problem already without the cache >> part) >> >> >> >> >> >> >> >> Cluster live since 2014-08-29 15:34:16 >> >> >> >> Config file on host ceph001: >> >> >> >> [global] >> >> auth_client_required = cephx >> >> auth_cluster_required = cephx >> >> auth_service_required = cephx >> >> cluster_network = 10.143.8.0/24 >> >> filestore_xattr_use_omap = 1 >> >> fsid = 82766e04-585b-49a6-a0ac-c13d9ffd0a7d >> >> mon_cluster_log_to_syslog = 1 >> >> mon_host = ceph001.cubone.os, ceph002.cubone.os, ceph003.cubone.os >> >> mon_initial_members = ceph001, ceph002, ceph003 >> >> osd_crush_update_on_start = 0 >> >> osd_journal_size = 10240 >> >> osd_pool_default_min_size = 2 >> >> osd_pool_default_pg_num = 512 >> >> osd_pool_default_pgp_num = 512 >> >> osd_pool_default_size = 3 >> >> public_network = 10.141.8.0/24 >> >> >> >> [osd.11] >> >> osd_objectstore = keyvaluestore-dev >> >> >> >> [osd.13] >> >> osd_objectstore = keyvaluestore-dev >> >> >> >> [osd.15] >> >> osd_objectstore = keyvaluestore-dev >> >> >> >> [osd.17] >> >> osd_objectstore = keyvaluestore-dev >> >> >> >> [osd.19] >> >> osd_objectstore = keyvaluestore-dev >> >> >> >> [osd.21] >> >> osd_objectstore = keyvaluestore-dev >> >> >> >> [osd.23] >> >> osd_objectstore = keyvaluestore-dev >> >> >> >> [osd.25] >> >> osd_objectstore = keyvaluestore-dev >> >> >> >> [osd.3] >> >> osd_objectstore = keyvaluestore-dev >> >> >> >> [osd.5] >> >> osd_objectstore = keyvaluestore-dev >> >> >> >> [osd.7] >> >> osd_objectstore = keyvaluestore-dev >> >> >> >> [osd.9] >> >> osd_objectstore = keyvaluestore-dev >> >> >> >> >> >> OSDs: >> >> # id weight type name up/down reweight >> >> -12 140.6 root default-cache >> >> -9 46.87 host ceph001-cache >> >> 2 3.906 osd.2 up 1 >> >> 4 3.906 osd.4 up 1 >> >> 6 3.906 osd.6 up 1 >> >> 8 3.906 osd.8 up 1 >> >> 10 3.906 osd.10 up 1 >> >> 12 3.906 osd.12 up 1 >> >> 14 3.906 osd.14 up 1 >> >> 16 3.906 osd.16 up 1 >> >> 18 3.906 osd.18 up 1 >> >> 20 3.906 osd.20 up 1 >> >> 22 3.906 osd.22 up 1 >> >> 24 3.906 osd.24 up 1 >> >> -10 46.87 host ceph002-cache >> >> 28 3.906 osd.28 up 1 >> >> 30 3.906 osd.30 up 1 >> >> 32 3.906 osd.32 up 1 >> >> 34 3.906 osd.34 up 1 >> >> 36 3.906 osd.36 up 1 >> >> 38 3.906 osd.38 up 1 >> >> 40 3.906 osd.40 up 1 >> >> 42 3.906 osd.42 up 1 >> >> 44 3.906 osd.44 up 1 >> >> 46 3.906 osd.46 up 1 >> >> 48 3.906 osd.48 up 1 >> >> 50 3.906 osd.50 up 1 >> >> -11 46.87 host ceph003-cache >> >> 54 3.906 osd.54 up 1 >> >> 56 3.906 osd.56 up 1 >> >> 58 3.906 osd.58 up 1 >> >> 60 3.906 osd.60 up 1 >> >> 62 3.906 osd.62 up 1 >> >> 64 3.906 osd.64 up 1 >> >> 66 3.906 osd.66 up 1 >> >> 68 3.906 osd.68 up 1 >> >> 70 3.906 osd.70 up 1 >> >> 72 3.906 osd.72 up 1 >> >> 74 3.906 osd.74 up 1 
>> >> 76 3.906 osd.76 up 1 >> >> -8 140.6 root default-ec >> >> -5 46.87 host ceph001-ec >> >> 3 3.906 osd.3 up 1 >> >> 5 3.906 osd.5 up 1 >> >> 7 3.906 osd.7 up 1 >> >> 9 3.906 osd.9 up 1 >> >> 11 3.906 osd.11 up 1 >> >> 13 3.906 osd.13 up 1 >> >> 15 3.906 osd.15 up 1 >> >> 17 3.906 osd.17 up 1 >> >> 19 3.906 osd.19 up 1 >> >> 21 3.906 osd.21 up 1 >> >> 23 3.906 osd.23 up 1 >> >> 25 3.906 osd.25 up 1 >> >> -6 46.87 host ceph002-ec >> >> 29 3.906 osd.29 up 1 >> >> 31 3.906 osd.31 up 1 >> >> 33 3.906 osd.33 up 1 >> >> 35 3.906 osd.35 up 1 >> >> 37 3.906 osd.37 up 1 >> >> 39 3.906 osd.39 up 1 >> >> 41 3.906 osd.41 up 1 >> >> 43 3.906 osd.43 up 1 >> >> 45 3.906 osd.45 up 1 >> >> 47 3.906 osd.47 up 1 >> >> 49 3.906 osd.49 up 1 >> >> 51 3.906 osd.51 up 1 >> >> -7 46.87 host ceph003-ec >> >> 55 3.906 osd.55 up 1 >> >> 57 3.906 osd.57 up 1 >> >> 59 3.906 osd.59 up 1 >> >> 61 3.906 osd.61 up 1 >> >> 63 3.906 osd.63 up 1 >> >> 65 3.906 osd.65 up 1 >> >> 67 3.906 osd.67 up 1 >> >> 69 3.906 osd.69 up 1 >> >> 71 3.906 osd.71 up 1 >> >> 73 3.906 osd.73 up 1 >> >> 75 3.906 osd.75 up 1 >> >> 77 3.906 osd.77 up 1 >> >> -4 23.44 root default-ssd >> >> -1 7.812 host ceph001-ssd >> >> 0 3.906 osd.0 up 1 >> >> 1 3.906 osd.1 up 1 >> >> -2 7.812 host ceph002-ssd >> >> 26 3.906 osd.26 up 1 >> >> 27 3.906 osd.27 up 1 >> >> -3 7.812 host ceph003-ssd >> >> 52 3.906 osd.52 up 1 >> >> 53 3.906 osd.53 up 1 >> >> >> >> Cache OSDs are each 50G, the EC KV OSDS 3.6T, (ssds not used right now) >> >> >> >> Pools: >> >> pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash >> >> rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool >> stripe_width 0 >> >> pool 1 'cache' replicated size 2 min_size 1 crush_ruleset 0 object_hash >> >> rjenkins pg_num 1024 pgp_num 1024 last_change 174 flags >> >> hashpspool,incomplete_clones tier_of 2 cache_mode writeback >> target_bytes >> >> 300647710720 hit_set bloom{false_positive_probability: 0.05, >> target_size: 0, >> >> seed: 0} 3600s x1 stripe_width 0 >> >> pool 2 'ecdata' erasure size 11 min_size 8 crush_ruleset 2 object_hash >> >> rjenkins pg_num 128 pgp_num 128 last_change 170 lfor 170 flags >> hashpspool >> >> tiers 1 read_tier 1 write_tier 1 stripe_width 4096 >> >> >> >> >> >> Crushmap: >> >> # begin crush map >> >> tunable choose_local_fallback_tries 0 >> >> tunable choose_local_tries 0 >> >> tunable choose_total_tries 50 >> >> tunable chooseleaf_descend_once 1 >> >> >> >> # devices >> >> device 0 osd.0 >> >> device 1 osd.1 >> >> device 2 osd.2 >> >> device 3 osd.3 >> >> device 4 osd.4 >> >> device 5 osd.5 >> >> device 6 osd.6 >> >> device 7 osd.7 >> >> device 8 osd.8 >> >> device 9 osd.9 >> >> device 10 osd.10 >> >> device 11 osd.11 >> >> device 12 osd.12 >> >> device 13 osd.13 >> >> device 14 osd.14 >> >> device 15 osd.15 >> >> device 16 osd.16 >> >> device 17 osd.17 >> >> device 18 osd.18 >> >> device 19 osd.19 >> >> device 20 osd.20 >> >> device 21 osd.21 >> >> device 22 osd.22 >> >> device 23 osd.23 >> >> device 24 osd.24 >> >> device 25 osd.25 >> >> device 26 osd.26 >> >> device 27 osd.27 >> >> device 28 osd.28 >> >> device 29 osd.29 >> >> device 30 osd.30 >> >> device 31 osd.31 >> >> device 32 osd.32 >> >> device 33 osd.33 >> >> device 34 osd.34 >> >> device 35 osd.35 >> >> device 36 osd.36 >> >> device 37 osd.37 >> >> device 38 osd.38 >> >> device 39 osd.39 >> >> device 40 osd.40 >> >> device 41 osd.41 >> >> device 42 osd.42 >> >> device 43 osd.43 >> >> device 44 osd.44 >> >> device 45 osd.45 >> >> device 46 osd.46 >> >> device 47 osd.47 >> >> device 
48 osd.48 >> >> device 49 osd.49 >> >> device 50 osd.50 >> >> device 51 osd.51 >> >> device 52 osd.52 >> >> device 53 osd.53 >> >> device 54 osd.54 >> >> device 55 osd.55 >> >> device 56 osd.56 >> >> device 57 osd.57 >> >> device 58 osd.58 >> >> device 59 osd.59 >> >> device 60 osd.60 >> >> device 61 osd.61 >> >> device 62 osd.62 >> >> device 63 osd.63 >> >> device 64 osd.64 >> >> device 65 osd.65 >> >> device 66 osd.66 >> >> device 67 osd.67 >> >> device 68 osd.68 >> >> device 69 osd.69 >> >> device 70 osd.70 >> >> device 71 osd.71 >> >> device 72 osd.72 >> >> device 73 osd.73 >> >> device 74 osd.74 >> >> device 75 osd.75 >> >> device 76 osd.76 >> >> device 77 osd.77 >> >> >> >> # types >> >> type 0 osd >> >> type 1 host >> >> type 2 root >> >> >> >> # buckets >> >> host ceph001-ssd { >> >> id -1 # do not change unnecessarily >> >> # weight 7.812 >> >> alg straw >> >> hash 0 # rjenkins1 >> >> item osd.0 weight 3.906 >> >> item osd.1 weight 3.906 >> >> } >> >> host ceph002-ssd { >> >> id -2 # do not change unnecessarily >> >> # weight 7.812 >> >> alg straw >> >> hash 0 # rjenkins1 >> >> item osd.26 weight 3.906 >> >> item osd.27 weight 3.906 >> >> } >> >> host ceph003-ssd { >> >> id -3 # do not change unnecessarily >> >> # weight 7.812 >> >> alg straw >> >> hash 0 # rjenkins1 >> >> item osd.52 weight 3.906 >> >> item osd.53 weight 3.906 >> >> } >> >> root default-ssd { >> >> id -4 # do not change unnecessarily >> >> # weight 23.436 >> >> alg straw >> >> hash 0 # rjenkins1 >> >> item ceph001-ssd weight 7.812 >> >> item ceph002-ssd weight 7.812 >> >> item ceph003-ssd weight 7.812 >> >> } >> >> host ceph001-ec { >> >> id -5 # do not change unnecessarily >> >> # weight 46.872 >> >> alg straw >> >> hash 0 # rjenkins1 >> >> item osd.3 weight 3.906 >> >> item osd.5 weight 3.906 >> >> item osd.7 weight 3.906 >> >> item osd.9 weight 3.906 >> >> item osd.11 weight 3.906 >> >> item osd.13 weight 3.906 >> >> item osd.15 weight 3.906 >> >> item osd.17 weight 3.906 >> >> item osd.19 weight 3.906 >> >> item osd.21 weight 3.906 >> >> item osd.23 weight 3.906 >> >> item osd.25 weight 3.906 >> >> } >> >> host ceph002-ec { >> >> id -6 # do not change unnecessarily >> >> # weight 46.872 >> >> alg straw >> >> hash 0 # rjenkins1 >> >> item osd.29 weight 3.906 >> >> item osd.31 weight 3.906 >> >> item osd.33 weight 3.906 >> >> item osd.35 weight 3.906 >> >> item osd.37 weight 3.906 >> >> item osd.39 weight 3.906 >> >> item osd.41 weight 3.906 >> >> item osd.43 weight 3.906 >> >> item osd.45 weight 3.906 >> >> item osd.47 weight 3.906 >> >> item osd.49 weight 3.906 >> >> item osd.51 weight 3.906 >> >> } >> >> host ceph003-ec { >> >> id -7 # do not change unnecessarily >> >> # weight 46.872 >> >> alg straw >> >> hash 0 # rjenkins1 >> >> item osd.55 weight 3.906 >> >> item osd.57 weight 3.906 >> >> item osd.59 weight 3.906 >> >> item osd.61 weight 3.906 >> >> item osd.63 weight 3.906 >> >> item osd.65 weight 3.906 >> >> item osd.67 weight 3.906 >> >> item osd.69 weight 3.906 >> >> item osd.71 weight 3.906 >> >> item osd.73 weight 3.906 >> >> item osd.75 weight 3.906 >> >> item osd.77 weight 3.906 >> >> } >> >> root default-ec { >> >> id -8 # do not change unnecessarily >> >> # weight 140.616 >> >> alg straw >> >> hash 0 # rjenkins1 >> >> item ceph001-ec weight 46.872 >> >> item ceph002-ec weight 46.872 >> >> item ceph003-ec weight 46.872 >> >> } >> >> host ceph001-cache { >> >> id -9 # do not change unnecessarily >> >> # weight 46.872 >> >> alg straw >> >> hash 0 # rjenkins1 >> >> item osd.2 weight 3.906 >> >> item 
osd.4 weight 3.906 >> >> item osd.6 weight 3.906 >> >> item osd.8 weight 3.906 >> >> item osd.10 weight 3.906 >> >> item osd.12 weight 3.906 >> >> item osd.14 weight 3.906 >> >> item osd.16 weight 3.906 >> >> item osd.18 weight 3.906 >> >> item osd.20 weight 3.906 >> >> item osd.22 weight 3.906 >> >> item osd.24 weight 3.906 >> >> } >> >> host ceph002-cache { >> >> id -10 # do not change unnecessarily >> >> # weight 46.872 >> >> alg straw >> >> hash 0 # rjenkins1 >> >> item osd.28 weight 3.906 >> >> item osd.30 weight 3.906 >> >> item osd.32 weight 3.906 >> >> item osd.34 weight 3.906 >> >> item osd.36 weight 3.906 >> >> item osd.38 weight 3.906 >> >> item osd.40 weight 3.906 >> >> item osd.42 weight 3.906 >> >> item osd.44 weight 3.906 >> >> item osd.46 weight 3.906 >> >> item osd.48 weight 3.906 >> >> item osd.50 weight 3.906 >> >> } >> >> host ceph003-cache { >> >> id -11 # do not change unnecessarily >> >> # weight 46.872 >> >> alg straw >> >> hash 0 # rjenkins1 >> >> item osd.54 weight 3.906 >> >> item osd.56 weight 3.906 >> >> item osd.58 weight 3.906 >> >> item osd.60 weight 3.906 >> >> item osd.62 weight 3.906 >> >> item osd.64 weight 3.906 >> >> item osd.66 weight 3.906 >> >> item osd.68 weight 3.906 >> >> item osd.70 weight 3.906 >> >> item osd.72 weight 3.906 >> >> item osd.74 weight 3.906 >> >> item osd.76 weight 3.906 >> >> } >> >> root default-cache { >> >> id -12 # do not change unnecessarily >> >> # weight 140.616 >> >> alg straw >> >> hash 0 # rjenkins1 >> >> item ceph001-cache weight 46.872 >> >> item ceph002-cache weight 46.872 >> >> item ceph003-cache weight 46.872 >> >> } >> >> >> >> # rules >> >> rule cache { >> >> ruleset 0 >> >> type replicated >> >> min_size 1 >> >> max_size 10 >> >> step take default-cache >> >> step chooseleaf firstn 0 type host >> >> step emit >> >> } >> >> rule metadata { >> >> ruleset 1 >> >> type replicated >> >> min_size 1 >> >> max_size 10 >> >> step take default-ssd >> >> step chooseleaf firstn 0 type host >> >> step emit >> >> } >> >> rule ecdata { >> >> ruleset 2 >> >> type erasure >> >> min_size 3 >> >> max_size 20 >> >> step set_chooseleaf_tries 5 >> >> step take default-ec >> >> step choose indep 0 type osd >> >> step emit >> >> } >> >> >> >> # end crush map >> >> >> >> The benchmarks I then did: >> >> >> >> ./benchrw 50000 >> >> >> >> benchrw: >> >> /usr/bin/rados -p ecdata bench $1 write --no-cleanup >> >> /usr/bin/rados -p ecdata bench $1 seq >> >> /usr/bin/rados -p ecdata bench $1 seq & >> >> /usr/bin/rados -p ecdata bench $1 write --no-cleanup >> >> >> >> >> >> Srubbing errors started soon after that: 2014-08-31 10:59:14 >> >> >> >> >> >> Please let me know if you need more information, and thanks ! >> >> >> >> Kenneth >> >> >> >> ----- Message from Haomai Wang <haomaiwang at gmail.com> --------- >> >> Date: Mon, 1 Sep 2014 21:30:16 +0800 >> >> From: Haomai Wang <haomaiwang at gmail.com> >> >> Subject: Re: ceph cluster inconsistency keyvaluestore >> >> To: Kenneth Waegeman <Kenneth.Waegeman at ugent.be> >> >> Cc: ceph-users at lists.ceph.com >> >> >> >> >> >>> Hmm, could you please list your instructions including cluster >> existing >> >>> time and all relevant ops? I want to reproduce it. >> >>> >> >>> >> >>> On Mon, Sep 1, 2014 at 4:45 PM, Kenneth Waegeman >> >>> <Kenneth.Waegeman at ugent.be> >> >>> wrote: >> >>> >> >>>> Hi, >> >>>> >> >>>> I reinstalled the cluster with 0.84, and tried again running rados >> bench >> >>>> on a EC coded pool on keyvaluestore. 
>> >>>> Nothing crashed this time, but when I check the status: >> >>>> >> >>>> health HEALTH_ERR 128 pgs inconsistent; 128 scrub errors; too few >> >>>> pgs >> >>>> per osd (15 < min 20) >> >>>> monmap e1: 3 mons at {ceph001=10.141.8.180:6789/0, >> >>>> ceph002=10.141.8.181:6789/0,ceph003=10.141.8.182:6789/0}, election >> epoch >> >>>> 8, quorum 0,1,2 ceph001,ceph002,ceph003 >> >>>> osdmap e174: 78 osds: 78 up, 78 in >> >>>> pgmap v147680: 1216 pgs, 3 pools, 14758 GB data, 3690 kobjects >> >>>> 1753 GB used, 129 TB / 131 TB avail >> >>>> 1088 active+clean >> >>>> 128 active+clean+inconsistent >> >>>> >> >>>> the 128 inconsistent pgs are ALL the pgs of the EC KV store ( the >> others >> >>>> are on Filestore) >> >>>> >> >>>> The only thing I can see in the logs is that after the rados tests, >> it >> >>>> start scrubbing, and for each KV pg I get something like this: >> >>>> >> >>>> 2014-08-31 11:14:09.050747 osd.11 10.141.8.180:6833/61098 4 : [ERR] >> >>>> 2.3s0 >> >>>> scrub stat mismatch, got 28164/29291 objects, 0/0 clones, >> 28164/29291 >> >>>> dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, >> >>>> 118128377856/122855358464 bytes. >> >>>> >> >>>> What could here be the problem? >> >>>> Thanks again!! >> >>>> >> >>>> Kenneth >> >>>> >> >>>> >> >>>> ----- Message from Haomai Wang <haomaiwang at gmail.com> --------- >> >>>> Date: Tue, 26 Aug 2014 17:11:43 +0800 >> >>>> From: Haomai Wang <haomaiwang at gmail.com> >> >>>> Subject: Re: ceph cluster inconsistency? >> >>>> To: Kenneth Waegeman <Kenneth.Waegeman at ugent.be> >> >>>> Cc: ceph-users at lists.ceph.com >> >>>> >> >>>> >> >>>> Hmm, it looks like you hit this >> >>>> bug(http://tracker.ceph.com/issues/9223). >> >>>>> >> >>>>> >> >>>>> Sorry for the late message, I forget that this fix is merged into >> 0.84. >> >>>>> >> >>>>> Thanks for your patient :-) >> >>>>> >> >>>>> On Tue, Aug 26, 2014 at 4:39 PM, Kenneth Waegeman >> >>>>> <Kenneth.Waegeman at ugent.be> wrote: >> >>>>> >> >>>>>> >> >>>>>> Hi, >> >>>>>> >> >>>>>> In the meantime I already tried with upgrading the cluster to >> 0.84, to >> >>>>>> see >> >>>>>> if that made a difference, and it seems it does. >> >>>>>> I can't reproduce the crashing osds by doing a 'rados -p ecdata ls' >> >>>>>> anymore. 
>> >>>>>> >> >>>>>> But now the cluster detect it is inconsistent: >> >>>>>> >> >>>>>> cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d >> >>>>>> health HEALTH_ERR 40 pgs inconsistent; 40 scrub errors; too >> few >> >>>>>> pgs >> >>>>>> per osd (4 < min 20); mon.ceph002 low disk space >> >>>>>> monmap e3: 3 mons at >> >>>>>> {ceph001=10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0, >> >>>>>> ceph003=10.141.8.182:6789/0}, >> >>>>>> election epoch 30, quorum 0,1,2 ceph001,ceph002,ceph003 >> >>>>>> mdsmap e78951: 1/1/1 up {0=ceph003.cubone.os=up:active}, 3 >> >>>>>> up:standby >> >>>>>> osdmap e145384: 78 osds: 78 up, 78 in >> >>>>>> pgmap v247095: 320 pgs, 4 pools, 15366 GB data, 3841 >> kobjects >> >>>>>> 1502 GB used, 129 TB / 131 TB avail >> >>>>>> 279 active+clean >> >>>>>> 40 active+clean+inconsistent >> >>>>>> 1 active+clean+scrubbing+deep >> >>>>>> >> >>>>>> >> >>>>>> I tried to do ceph pg repair for all the inconsistent pgs: >> >>>>>> >> >>>>>> cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d >> >>>>>> health HEALTH_ERR 40 pgs inconsistent; 1 pgs repair; 40 scrub >> >>>>>> errors; >> >>>>>> too few pgs per osd (4 < min 20); mon.ceph002 low disk space >> >>>>>> monmap e3: 3 mons at >> >>>>>> {ceph001=10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0, >> >>>>>> ceph003=10.141.8.182:6789/0}, >> >>>>>> election epoch 30, quorum 0,1,2 ceph001,ceph002,ceph003 >> >>>>>> mdsmap e79486: 1/1/1 up {0=ceph003.cubone.os=up:active}, 3 >> >>>>>> up:standby >> >>>>>> osdmap e146452: 78 osds: 78 up, 78 in >> >>>>>> pgmap v248520: 320 pgs, 4 pools, 15366 GB data, 3841 >> kobjects >> >>>>>> 1503 GB used, 129 TB / 131 TB avail >> >>>>>> 279 active+clean >> >>>>>> 39 active+clean+inconsistent >> >>>>>> 1 active+clean+scrubbing+deep >> >>>>>> 1 >> active+clean+scrubbing+deep+inconsistent+repair >> >>>>>> >> >>>>>> I let it recovering through the night, but this morning the mons >> were >> >>>>>> all >> >>>>>> gone, nothing to see in the log files.. The osds were all still up! >> >>>>>> >> >>>>>> cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d >> >>>>>> health HEALTH_ERR 36 pgs inconsistent; 1 pgs repair; 36 scrub >> >>>>>> errors; >> >>>>>> too few pgs per osd (4 < min 20) >> >>>>>> monmap e7: 3 mons at >> >>>>>> {ceph001=10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0, >> >>>>>> ceph003=10.141.8.182:6789/0}, >> >>>>>> election epoch 44, quorum 0,1,2 ceph001,ceph002,ceph003 >> >>>>>> mdsmap e109481: 1/1/1 up {0=ceph003.cubone.os=up:active}, 3 >> >>>>>> up:standby >> >>>>>> osdmap e203410: 78 osds: 78 up, 78 in >> >>>>>> pgmap v331747: 320 pgs, 4 pools, 15251 GB data, 3812 kobjects >> >>>>>> 1547 GB used, 129 TB / 131 TB avail >> >>>>>> 1 active+clean+scrubbing+deep+inconsistent+repair >> >>>>>> 284 active+clean >> >>>>>> 35 active+clean+inconsistent >> >>>>>> >> >>>>>> I restarted the monitors now, I will let you know when I see >> something >> >>>>>> more.. >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> ----- Message from Haomai Wang <haomaiwang at gmail.com> --------- >> >>>>>> Date: Sun, 24 Aug 2014 12:51:41 +0800 >> >>>>>> >> >>>>>> From: Haomai Wang <haomaiwang at gmail.com> >> >>>>>> Subject: Re: ceph cluster inconsistency? >> >>>>>> To: Kenneth Waegeman <Kenneth.Waegeman at ugent.be>, >> >>>>>> ceph-users at lists.ceph.com >> >>>>>> >> >>>>>> >> >>>>>> It's really strange! I write a test program according the key >> ordering >> >>>>>>> >> >>>>>>> you provided and parse the corresponding value. It's true! >> >>>>>>> >> >>>>>>> I have no idea now. 
If free, could you add this debug code to >> >>>>>>> "src/os/GenericObjectMap.cc" and insert *before* "assert(start <= >> >>>>>>> header.oid);": >> >>>>>>> >> >>>>>>> dout(0) << "start: " << start << "header.oid: " << header.oid << >> >>>>>>> dendl; >> >>>>>>> >> >>>>>>> Then you need to recompile ceph-osd and run it again. The output >> log >> >>>>>>> can help it! >> >>>>>>> >> >>>>>>> On Tue, Aug 19, 2014 at 10:19 PM, Haomai Wang < >> haomaiwang at gmail.com> >> >>>>>>> wrote: >> >>>>>>> >> >>>>>>>> >> >>>>>>>> I feel a little embarrassed, 1024 rows still true for me. >> >>>>>>>> >> >>>>>>>> I was wondering if you could give your all keys via >> >>>>>>>> ""ceph-kvstore-tool /var/lib/ceph/osd/ceph-67/current/ list >> >>>>>>>> _GHOBJTOSEQ_ > keys.log?. >> >>>>>>>> >> >>>>>>>> thanks! >> >>>>>>>> >> >>>>>>>> On Tue, Aug 19, 2014 at 4:58 PM, Kenneth Waegeman >> >>>>>>>> <Kenneth.Waegeman at ugent.be> wrote: >> >>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> ----- Message from Haomai Wang <haomaiwang at gmail.com> --------- >> >>>>>>>>> Date: Tue, 19 Aug 2014 12:28:27 +0800 >> >>>>>>>>> >> >>>>>>>>> From: Haomai Wang <haomaiwang at gmail.com> >> >>>>>>>>> Subject: Re: ceph cluster inconsistency? >> >>>>>>>>> To: Kenneth Waegeman <Kenneth.Waegeman at ugent.be> >> >>>>>>>>> Cc: Sage Weil <sweil at redhat.com>, ceph-users at lists.ceph.com >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> On Mon, Aug 18, 2014 at 7:32 PM, Kenneth Waegeman >> >>>>>>>>>> >> >>>>>>>>>> <Kenneth.Waegeman at ugent.be> wrote: >> >>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> ----- Message from Haomai Wang <haomaiwang at gmail.com> >> --------- >> >>>>>>>>>>> Date: Mon, 18 Aug 2014 18:34:11 +0800 >> >>>>>>>>>>> >> >>>>>>>>>>> From: Haomai Wang <haomaiwang at gmail.com> >> >>>>>>>>>>> Subject: Re: ceph cluster inconsistency? >> >>>>>>>>>>> To: Kenneth Waegeman <Kenneth.Waegeman at ugent.be> >> >>>>>>>>>>> Cc: Sage Weil <sweil at redhat.com>, ceph-users at lists.ceph.com >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> On Mon, Aug 18, 2014 at 5:38 PM, Kenneth Waegeman >> >>>>>>>>>>>> >> >>>>>>>>>>>> <Kenneth.Waegeman at ugent.be> wrote: >> >>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> Hi, >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> I tried this after restarting the osd, but I guess that was >> not >> >>>>>>>>>>>>> the >> >>>>>>>>>>>>> aim >> >>>>>>>>>>>>> ( >> >>>>>>>>>>>>> # ceph-kvstore-tool /var/lib/ceph/osd/ceph-67/current/ list >> >>>>>>>>>>>>> _GHOBJTOSEQ_| >> >>>>>>>>>>>>> grep 6adb1100 -A 100 >> >>>>>>>>>>>>> IO error: lock /var/lib/ceph/osd/ceph-67/current//LOCK: >> >>>>>>>>>>>>> Resource >> >>>>>>>>>>>>> temporarily >> >>>>>>>>>>>>> unavailable >> >>>>>>>>>>>>> tools/ceph_kvstore_tool.cc: In function >> >>>>>>>>>>>>> 'StoreTool::StoreTool(const >> >>>>>>>>>>>>> string&)' thread 7f8fecf7d780 time 2014-08-18 >> 11:12:29.551780 >> >>>>>>>>>>>>> tools/ceph_kvstore_tool.cc: 38: FAILED >> >>>>>>>>>>>>> assert(!db_ptr->open(std::cerr)) >> >>>>>>>>>>>>> .. >> >>>>>>>>>>>>> ) >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> When I run it after bringing the osd down, it takes a while, >> >>>>>>>>>>>>> but >> >>>>>>>>>>>>> it >> >>>>>>>>>>>>> has >> >>>>>>>>>>>>> no >> >>>>>>>>>>>>> output.. (When running it without the grep, I'm getting a >> huge >> >>>>>>>>>>>>> list >> >>>>>>>>>>>>> ) >> >>>>>>>>>>>>> >> >>>>>>>>>>>> >> >>>>>>>>>>>> >> >>>>>>>>>>>> >> >>>>>>>>>>>> >> >>>>>>>>>>>> Oh, sorry for it! 
I made a mistake, the hash value(6adb1100) >> >>>>>>>>>>>> will >> >>>>>>>>>>>> be >> >>>>>>>>>>>> reversed into leveldb. >> >>>>>>>>>>>> So grep "benchmark_data_ceph001.cubone.os_5560_object789734" >> >>>>>>>>>>>> should >> >>>>>>>>>>>> be >> >>>>>>>>>>>> help it. >> >>>>>>>>>>>> >> >>>>>>>>>>>> this gives: >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> [root at ceph003 ~]# ceph-kvstore-tool >> /var/lib/ceph/osd/ceph-67/ >> >>>>>>>>>>> current/ >> >>>>>>>>>>> list >> >>>>>>>>>>> _GHOBJTOSEQ_ | grep 5560_object789734 -A 100 >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011BDA6!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object789734!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011C027!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_31461_object1330170!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011C6FD!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_4919_object227366!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011CB03!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object1363631!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011CDF0!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object1573957!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011D02C!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object1019282!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011E2B5!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_31461_object1283563!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011E511!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_4919_object273736!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011E547!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object1170628!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011EAAB!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_4919_object256335!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011F446!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object1484196!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011FC59!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object884178!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001203F3!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object853746!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001208E3!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object36633!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00120B37!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_31461_object1235337!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001210B6!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object1661351!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001210CB!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object238126!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012184C!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object339943!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00121916!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object1047094!head >> >>>>>>>>>>> >> >>>>>>>>>>> 
>> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001219C1!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_31461_object520642!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001222BB!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object639565!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001223AA!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_4919_object231080!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012243C!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object858050!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012289C!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object241796!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00122D28!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_4919_object7462!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00122DFE!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object243798!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00122EFC!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_8961_object109512!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001232D7!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_31461_object653973!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001234A3!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object1378169!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00123714!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object512925!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001237D9!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_4919_object23289!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00123854!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_31461_object1108852!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00123971!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object704026!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00123F75!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_8961_object250441!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00124083!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_31461_object706178!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001240FA!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object316952!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012447D!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object538734!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001244D9!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_31461_object789215!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001247CD!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_8961_object265993!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00124897!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_31461_object610597!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00124BE4!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_31461_object691723!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00124C9B!!3!!benchmark_data_ >> >>>>>>>>>>> 
ceph001%ecubone%eos_5560_object1306135!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00124E1D!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object520580!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012534C!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object659767!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00125A81!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object184060!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00125E77!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object1292867!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00126562!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_31461_object1201410!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00126B34!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object1657326!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00127383!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object1269787!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00127396!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_31461_object500115!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001277F8!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_31461_object394932!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001279DD!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_4919_object252963!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00127B40!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_31461_object936811!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00127BAC!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_31461_object1481773!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012894E!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object999885!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00128D05!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_31461_object943667!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012908A!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object212990!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00129519!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object437596!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00129716!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object1585330!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00129798!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object603505!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001299C9!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_31461_object808800!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00129B7A!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_31461_object23193!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00129B9A!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_31461_object1158397!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012A932!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object542450!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> 
>>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012B77A!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_8961_object195480!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012BE8C!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_4919_object312911!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012BF74!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object1563783!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012C65C!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object1123980!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012C6FE!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_3411_object913!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012CCAD!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_31461_object400863!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012CDBB!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object789667!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012D14B!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_31461_object1020723!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012D95B!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_8961_object106293!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012E3C8!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object1355526!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012E5B3!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object1491348!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012F2BB!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_8961_object338872!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012F374!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_31461_object1337264!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012FBE5!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object1512395!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012FCE3!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_8961_object298610!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012FEB6!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_4919_object120824!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001301CA!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object816326!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00130263!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object777163!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00130529!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object1413173!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001317D9!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_31461_object809510!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0013204F!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_31461_object471416!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00132400!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object695087!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00132A19!!3!!benchmark_data_ >> >>>>>>>>>>> 
ceph001%ecubone%eos_31461_object591945!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00132BF8!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_31461_object302000!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00132F5B!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object1645443!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00133B8B!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object761911!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0013433E!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_31461_object1467727!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00134446!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_31461_object791960!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00134678!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_31461_object677078!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00134A96!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_31461_object254923!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001355D0!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_31461_object321528!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00135690!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_4919_object36935!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00135B62!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object1228272!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00135C72!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_4812_object2180!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00135DEE!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object425705!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00136366!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object141569!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00136371!!3!!benchmark_data_ >> >>>>>>>>>>> ceph001%ecubone%eos_5560_object564213!head >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>> 100 rows seemed true for me. I found the min list objects is >> 1024. >> >>>>>>>>>> Please could you run >> >>>>>>>>>> "ceph-kvstore-tool /var/lib/ceph/osd/ceph-67/current/ list >> >>>>>>>>>> _GHOBJTOSEQ_| grep 6adb1100 -A 1024" >> >>>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> I got the output in attachment >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>>>>> Or should I run this immediately after the osd is crashed, >> >>>>>>>>>>>>> (because >> >>>>>>>>>>>>> it >> >>>>>>>>>>>>> maybe >> >>>>>>>>>>>>> rebalanced? I did already restarted the cluster) >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> I don't know if it is related, but before I could all do >> that, >> >>>>>>>>>>>>> I >> >>>>>>>>>>>>> had >> >>>>>>>>>>>>> to >> >>>>>>>>>>>>> fix >> >>>>>>>>>>>>> something else: A monitor did run out if disk space, using >> 8GB >> >>>>>>>>>>>>> for >> >>>>>>>>>>>>> his >> >>>>>>>>>>>>> store.db folder (lot of sst files). Other monitors are also >> >>>>>>>>>>>>> near >> >>>>>>>>>>>>> that >> >>>>>>>>>>>>> level. >> >>>>>>>>>>>>> Never had that problem on previous setups before. I >> recreated a >> >>>>>>>>>>>>> monitor >> >>>>>>>>>>>>> and >> >>>>>>>>>>>>> now it uses 3.8GB. 
>> >>>>>>>>>>>>> >> >>>>>>>>>>>> >> >>>>>>>>>>>> >> >>>>>>>>>>>> >> >>>>>>>>>>>> >> >>>>>>>>>>>> It exists some duplicate data which needed to be compacted. >> >>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>>> Another idea, maybe you can make KeyValueStore's stripe size >> >>>>>>>>>>>> align >> >>>>>>>>>>>> with EC stripe size. >> >>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> How can I do that? Is there some documentation about that? >> >>>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> ceph --show-config | grep keyvaluestore >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> debug_keyvaluestore = 0/0 >> >>>>>>>>>> keyvaluestore_queue_max_ops = 50 >> >>>>>>>>>> keyvaluestore_queue_max_bytes = 104857600 >> >>>>>>>>>> keyvaluestore_debug_check_backend = false >> >>>>>>>>>> keyvaluestore_op_threads = 2 >> >>>>>>>>>> keyvaluestore_op_thread_timeout = 60 >> >>>>>>>>>> keyvaluestore_op_thread_suicide_timeout = 180 >> >>>>>>>>>> keyvaluestore_default_strip_size = 4096 >> >>>>>>>>>> keyvaluestore_max_expected_write_size = 16777216 >> >>>>>>>>>> keyvaluestore_header_cache_size = 4096 >> >>>>>>>>>> keyvaluestore_backend = leveldb >> >>>>>>>>>> >> >>>>>>>>>> keyvaluestore_default_strip_size is the wanted >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> I haven't think deeply and maybe I will try it later. >> >>>>>>>>>>>> >> >>>>>>>>>>>> >> >>>>>>>>>>>> Thanks! >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> Kenneth >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> ----- Message from Sage Weil <sweil at redhat.com> --------- >> >>>>>>>>>>>>> Date: Fri, 15 Aug 2014 06:10:34 -0700 (PDT) >> >>>>>>>>>>>>> From: Sage Weil <sweil at redhat.com> >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> Subject: Re: ceph cluster inconsistency? >> >>>>>>>>>>>>> To: Haomai Wang <haomaiwang at gmail.com> >> >>>>>>>>>>>>> Cc: Kenneth Waegeman <Kenneth.Waegeman at ugent.be>, >> >>>>>>>>>>>>> ceph-users at lists.ceph.com >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> On Fri, 15 Aug 2014, Haomai Wang wrote: >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> Hi Kenneth, >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> I don't find valuable info in your logs, it lack of the >> >>>>>>>>>>>>>>> necessary >> >>>>>>>>>>>>>>> debug output when accessing crash code. >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> But I scan the encode/decode implementation in >> >>>>>>>>>>>>>>> GenericObjectMap >> >>>>>>>>>>>>>>> and >> >>>>>>>>>>>>>>> find something bad. >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> For example, two oid has same hash and their name is: >> >>>>>>>>>>>>>>> A: "rb.data.123" >> >>>>>>>>>>>>>>> B: "rb-123" >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> In ghobject_t compare level, A < B. But GenericObjectMap >> >>>>>>>>>>>>>>> encode >> >>>>>>>>>>>>>>> "." >> >>>>>>>>>>>>>>> to >> >>>>>>>>>>>>>>> "%e", so the key in DB is: >> >>>>>>>>>>>>>>> A: _GHOBJTOSEQ_:blah!51615000!!none!!rb%edata%e123!head >> >>>>>>>>>>>>>>> B: _GHOBJTOSEQ_:blah!51615000!!none!!rb-123!head >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> A > B >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> And it seemed that the escape function is useless and >> should >> >>>>>>>>>>>>>>> be >> >>>>>>>>>>>>>>> disabled. >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> I'm not sure whether Kenneth's problem is touching this >> bug. 
>> >>>>>>>>>>>>>>> Because >> >>>>>>>>>>>>>>> this scene only occur when the object set is very large >> and >> >>>>>>>>>>>>>>> make >> >>>>>>>>>>>>>>> the >> >>>>>>>>>>>>>>> two object has same hash value. >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> Kenneth, could you have time to run "ceph-kv-store >> >>>>>>>>>>>>>>> [path-to-osd] >> >>>>>>>>>>>>>>> list >> >>>>>>>>>>>>>>> _GHOBJTOSEQ_| grep 6adb1100 -A 100". ceph-kv-store is a >> debug >> >>>>>>>>>>>>>>> tool >> >>>>>>>>>>>>>>> which can be compiled from source. You can clone ceph repo >> >>>>>>>>>>>>>>> and >> >>>>>>>>>>>>>>> run >> >>>>>>>>>>>>>>> "./authongen.sh; ./configure; cd src; make >> >>>>>>>>>>>>>>> ceph-kvstore-tool". >> >>>>>>>>>>>>>>> "path-to-osd" should be "/var/lib/ceph/osd-[id]/current/". >> >>>>>>>>>>>>>>> "6adb1100" >> >>>>>>>>>>>>>>> is from your verbose log and the next 100 rows should know >> >>>>>>>>>>>>>>> necessary >> >>>>>>>>>>>>>>> infos. >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> You can also get ceph-kvstore-tool from the 'ceph-tests' >> >>>>>>>>>>>>>> package. >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> Hi sage, do you think we need to provided with upgrade >> >>>>>>>>>>>>>> function >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> to >> >>>>>>>>>>>>>>> fix >> >>>>>>>>>>>>>>> it? >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> Hmm, we might. This only affects the key/value encoding >> >>>>>>>>>>>>>> right? >> >>>>>>>>>>>>>> The >> >>>>>>>>>>>>>> FileStore is using its own function to map these to file >> >>>>>>>>>>>>>> names? >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> Can you open a ticket in the tracker for this? >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> Thanks! >> >>>>>>>>>>>>>> sage >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> On Thu, Aug 14, 2014 at 7:36 PM, Kenneth Waegeman >> >>>>>>>>>>>>>>> <Kenneth.Waegeman at ugent.be> wrote: >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> ----- Message from Haomai Wang <haomaiwang at gmail.com> >> >>>>>>>>>>>>>>>> --------- >> >>>>>>>>>>>>>>>> Date: Thu, 14 Aug 2014 19:11:55 +0800 >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> From: Haomai Wang <haomaiwang at gmail.com> >> >>>>>>>>>>>>>>>> Subject: Re: ceph cluster inconsistency? >> >>>>>>>>>>>>>>>> To: Kenneth Waegeman <Kenneth.Waegeman at ugent.be> >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> Could you add config "debug_keyvaluestore = 20/20" to the >> >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> crashed >> >>>>>>>>>>>>>>>>> osd >> >>>>>>>>>>>>>>>>> and replay the command causing crash? >> >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> I would like to get more debug infos! Thanks. >> >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> I included the log in attachment! >> >>>>>>>>>>>>>>>> Thanks! 
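[A side note on the GenericObjectMap escaping issue described earlier in the quoted thread (the "rb.data.123" vs "rb-123" example): the following is a minimal sketch with a toy escape function, not the real GenericObjectMap code, showing that mapping "." to "%e" flips the relative string order of the two names, because '%' sorts before '-' while '.' sorts after it. That kind of inversion between ghobject_t comparison and the DB key ordering is what the message describes.]

    #include <cassert>
    #include <string>

    // Toy stand-in for the escaping applied to object names ("." -> "%e");
    // illustration only, not the real GenericObjectMap implementation.
    static std::string escape_dots(const std::string& in) {
      std::string out;
      for (char c : in) {
        if (c == '.')
          out += "%e";
        else
          out += c;
      }
      return out;
    }

    int main() {
      const std::string a = "rb.data.123";
      const std::string b = "rb-123";

      // Raw names: '.' (0x2e) sorts after '-' (0x2d), so a > b.
      assert(a > b);

      // Escaped names: '%' (0x25) sorts before '-' (0x2d), so the relative
      // order of the two names flips.
      assert(escape_dots(a) < escape_dots(b));

      // Two objects with the same hash can thus end up in a different order
      // in the DB keys than ghobject_t comparison would suggest.
      return 0;
    }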
>> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> On Thu, Aug 14, 2014 at 4:41 PM, Kenneth Waegeman >> >>>>>>>>>>>>>>>>> <Kenneth.Waegeman at ugent.be> wrote: >> >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> I have: >> >>>>>>>>>>>>>>>>>> osd_objectstore = keyvaluestore-dev >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> in the global section of my ceph.conf >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> [root at ceph002 ~]# ceph osd erasure-code-profile get >> >>>>>>>>>>>>>>>>>> profile11 >> >>>>>>>>>>>>>>>>>> directory=/usr/lib64/ceph/erasure-code >> >>>>>>>>>>>>>>>>>> k=8 >> >>>>>>>>>>>>>>>>>> m=3 >> >>>>>>>>>>>>>>>>>> plugin=jerasure >> >>>>>>>>>>>>>>>>>> ruleset-failure-domain=osd >> >>>>>>>>>>>>>>>>>> technique=reed_sol_van >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> the ecdata pool has this as profile >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> pool 3 'ecdata' erasure size 11 min_size 8 >> crush_ruleset 2 >> >>>>>>>>>>>>>>>>>> object_hash >> >>>>>>>>>>>>>>>>>> rjenkins pg_num 128 pgp_num 128 last_change 161 flags >> >>>>>>>>>>>>>>>>>> hashpspool >> >>>>>>>>>>>>>>>>>> stripe_width 4096 >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> ECrule in crushmap >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> rule ecdata { >> >>>>>>>>>>>>>>>>>> ruleset 2 >> >>>>>>>>>>>>>>>>>> type erasure >> >>>>>>>>>>>>>>>>>> min_size 3 >> >>>>>>>>>>>>>>>>>> max_size 20 >> >>>>>>>>>>>>>>>>>> step set_chooseleaf_tries 5 >> >>>>>>>>>>>>>>>>>> step take default-ec >> >>>>>>>>>>>>>>>>>> step choose indep 0 type osd >> >>>>>>>>>>>>>>>>>> step emit >> >>>>>>>>>>>>>>>>>> } >> >>>>>>>>>>>>>>>>>> root default-ec { >> >>>>>>>>>>>>>>>>>> id -8 # do not change unnecessarily >> >>>>>>>>>>>>>>>>>> # weight 140.616 >> >>>>>>>>>>>>>>>>>> alg straw >> >>>>>>>>>>>>>>>>>> hash 0 # rjenkins1 >> >>>>>>>>>>>>>>>>>> item ceph001-ec weight 46.872 >> >>>>>>>>>>>>>>>>>> item ceph002-ec weight 46.872 >> >>>>>>>>>>>>>>>>>> item ceph003-ec weight 46.872 >> >>>>>>>>>>>>>>>>>> ... >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> Cheers! >> >>>>>>>>>>>>>>>>>> Kenneth >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> ----- Message from Haomai Wang <haomaiwang at gmail.com> >> >>>>>>>>>>>>>>>>>> --------- >> >>>>>>>>>>>>>>>>>> Date: Thu, 14 Aug 2014 10:07:50 +0800 >> >>>>>>>>>>>>>>>>>> From: Haomai Wang <haomaiwang at gmail.com> >> >>>>>>>>>>>>>>>>>> Subject: Re: ceph cluster inconsistency? >> >>>>>>>>>>>>>>>>>> To: Kenneth Waegeman <Kenneth.Waegeman at ugent.be> >> >>>>>>>>>>>>>>>>>> Cc: ceph-users <ceph-users at lists.ceph.com> >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> Hi Kenneth, >> >>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>> Could you give your configuration related to EC and >> >>>>>>>>>>>>>>>>>>> KeyValueStore? 
>> >>>>>>>>>>>>>>>>>>> Not sure whether it's bug on KeyValueStore >> >>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>> On Thu, Aug 14, 2014 at 12:06 AM, Kenneth Waegeman >> >>>>>>>>>>>>>>>>>>> <Kenneth.Waegeman at ugent.be> wrote: >> >>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> Hi, >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> I was doing some tests with rados bench on a Erasure >> >>>>>>>>>>>>>>>>>>>> Coded >> >>>>>>>>>>>>>>>>>>>> pool >> >>>>>>>>>>>>>>>>>>>> (using >> >>>>>>>>>>>>>>>>>>>> keyvaluestore-dev objectstore) on 0.83, and I see >> some >> >>>>>>>>>>>>>>>>>>>> strangs >> >>>>>>>>>>>>>>>>>>>> things: >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> [root at ceph001 ~]# ceph status >> >>>>>>>>>>>>>>>>>>>> cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d >> >>>>>>>>>>>>>>>>>>>> health HEALTH_WARN too few pgs per osd (4 < min >> 20) >> >>>>>>>>>>>>>>>>>>>> monmap e1: 3 mons at >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> {ceph001= >> 10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0, >> >>>>>>>>>>>>>>>>>>>> ceph003=10.141.8.182:6789/0}, >> >>>>>>>>>>>>>>>>>>>> election epoch 6, quorum 0,1,2 >> ceph001,ceph002,ceph003 >> >>>>>>>>>>>>>>>>>>>> mdsmap e116: 1/1/1 up >> >>>>>>>>>>>>>>>>>>>> {0=ceph001.cubone.os=up:active}, >> >>>>>>>>>>>>>>>>>>>> 2 >> >>>>>>>>>>>>>>>>>>>> up:standby >> >>>>>>>>>>>>>>>>>>>> osdmap e292: 78 osds: 78 up, 78 in >> >>>>>>>>>>>>>>>>>>>> pgmap v48873: 320 pgs, 4 pools, 15366 GB data, >> 3841 >> >>>>>>>>>>>>>>>>>>>> kobjects >> >>>>>>>>>>>>>>>>>>>> 1381 GB used, 129 TB / 131 TB avail >> >>>>>>>>>>>>>>>>>>>> 320 active+clean >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> There is around 15T of data, but only 1.3 T usage. >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> This is also visible in rados: >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> [root at ceph001 ~]# rados df >> >>>>>>>>>>>>>>>>>>>> pool name category KB >> objects >> >>>>>>>>>>>>>>>>>>>> clones >> >>>>>>>>>>>>>>>>>>>> degraded unfound rd rd KB >> >>>>>>>>>>>>>>>>>>>> wr >> >>>>>>>>>>>>>>>>>>>> wr >> >>>>>>>>>>>>>>>>>>>> KB >> >>>>>>>>>>>>>>>>>>>> data - 0 >> >>>>>>>>>>>>>>>>>>>> 0 >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> 0 0 0 0 0 >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> ecdata - 16113451009 >> >>>>>>>>>>>>>>>>>>>> 3933959 >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> 0 0 1 1 3935632 >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> 16116850711 >> >>>>>>>>>>>>>>>>>>>> metadata - 2 >> >>>>>>>>>>>>>>>>>>>> 20 >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> 0 0 33 36 21 >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> 8 >> >>>>>>>>>>>>>>>>>>>> rbd - 0 >> >>>>>>>>>>>>>>>>>>>> 0 >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> 0 0 0 0 0 >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> total used 1448266016 3933979 >> >>>>>>>>>>>>>>>>>>>> total avail 139400181016 >> >>>>>>>>>>>>>>>>>>>> total space 140848447032 >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> Another (related?) thing: if I do rados -p ecdata >> ls, I >> >>>>>>>>>>>>>>>>>>>> trigger >> >>>>>>>>>>>>>>>>>>>> osd >> >>>>>>>>>>>>>>>>>>>> shutdowns (each time): >> >>>>>>>>>>>>>>>>>>>> I get a list followed by an error: >> >>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>> ... 
>> benchmark_data_ceph001.cubone.os_8961_object243839
>> benchmark_data_ceph001.cubone.os_5560_object801983
>> benchmark_data_ceph001.cubone.os_31461_object856489
>> benchmark_data_ceph001.cubone.os_8961_object202232
>> benchmark_data_ceph001.cubone.os_4919_object33199
>> benchmark_data_ceph001.cubone.os_5560_object807797
>> benchmark_data_ceph001.cubone.os_4919_object74729
>> benchmark_data_ceph001.cubone.os_31461_object1264121
>> benchmark_data_ceph001.cubone.os_5560_object1318513
>> benchmark_data_ceph001.cubone.os_5560_object1202111
>> benchmark_data_ceph001.cubone.os_31461_object939107
>> benchmark_data_ceph001.cubone.os_31461_object729682
>> benchmark_data_ceph001.cubone.os_5560_object122915
>> benchmark_data_ceph001.cubone.os_5560_object76521
>> benchmark_data_ceph001.cubone.os_5560_object113261
>> benchmark_data_ceph001.cubone.os_31461_object575079
>> benchmark_data_ceph001.cubone.os_5560_object671042
>> benchmark_data_ceph001.cubone.os_5560_object381146
>> 2014-08-13 17:57:48.736150 7f65047b5700 0 -- 10.141.8.180:0/1023295 >> 10.141.8.182:6839/4471 pipe(0x7f64fc019b20 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f64fc019db0).fault
>>
>> And I can see this in the log files:
>>
>> -25> 2014-08-13 17:52:56.323908 7f8a97fa4700 1 -- 10.143.8.182:6827/64670 <== osd.57 10.141.8.182:0/15796 51 ==== osd_ping(ping e220 stamp 2014-08-13 17:52:56.323092) v2 ==== 47+0+0 (3227325175 0 0) 0xf475940 con 0xee89fa0
>> -24> 2014-08-13 17:52:56.323938 7f8a97fa4700 1 -- 10.143.8.182:6827/64670 --> 10.141.8.182:0/15796 -- osd_ping(ping_reply e220 stamp 2014-08-13 17:52:56.323092) v2 -- ?+0 0xf815b00 con 0xee89fa0
>> -23> 2014-08-13 17:52:56.324078 7f8a997a7700 1 -- 10.141.8.182:6840/64670 <== osd.57 10.141.8.182:0/15796 51 ==== osd_ping(ping e220 stamp 2014-08-13 17:52:56.323092) v2 ==== 47+0+0 (3227325175 0 0) 0xf132bc0 con 0xee8a680
>> -22> 2014-08-13 17:52:56.324111 7f8a997a7700 1 -- 10.141.8.182:6840/64670 --> 10.141.8.182:0/15796 -- osd_ping(ping_reply e220 stamp 2014-08-13 17:52:56.323092) v2 -- ?+0 0xf811a40 con 0xee8a680
>> -21> 2014-08-13 17:52:56.584461 7f8a997a7700 1 -- 10.141.8.182:6840/64670 <== osd.29 10.143.8.181:0/12142 47 ==== osd_ping(ping e220 stamp 2014-08-13 17:52:56.583010) v2 ==== 47+0+0 (3355887204 0 0) 0xf655940 con 0xee88b00
>> -20> 2014-08-13 17:52:56.584486 7f8a997a7700 1 -- 10.141.8.182:6840/64670 --> 10.143.8.181:0/12142 -- osd_ping(ping_reply e220 stamp 2014-08-13 17:52:56.583010) v2 -- ?+0 0xf132bc0 con 0xee88b00
>> -19> 2014-08-13 17:52:56.584498 7f8a97fa4700 1 -- 10.143.8.182:6827/64670 <== osd.29 10.143.8.181:0/12142 47 ==== osd_ping(ping e220 stamp 2014-08-13 17:52:56.583010) v2 ==== 47+0+0 (3355887204 0 0) 0xf20e040 con 0xee886e0
>> -18> 2014-08-13 17:52:56.584526 7f8a97fa4700 1 -- 10.143.8.182:6827/64670 --> 10.143.8.181:0/12142 -- osd_ping(ping_reply e220 stamp 2014-08-13 17:52:56.583010) v2 -- ?+0 0xf475940 con 0xee886e0
>> -17> 2014-08-13 17:52:56.594448 7f8a798c7700 1 -- 10.141.8.182:6839/64670 >> :/0 pipe(0xec15f00 sd=74 :6839 s=0 pgs=0 cs=0 l=0 c=0xee856a0).accept sd=74 10.141.8.180:47641/0
>> -16> 2014-08-13 17:52:56.594921 7f8a798c7700 1 -- 10.141.8.182:6839/64670 <== client.7512 10.141.8.180:0/1018433 1 ==== osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 ack+read+known_if_redirected e220) v4 ==== 151+0+39 (1972163119 0 4174233976) 0xf3bca40 con 0xee856a0
>> -15> 2014-08-13 17:52:56.594957 7f8a798c7700 5 -- op tracker -- , seq: 299, time: 2014-08-13 17:52:56.594874, event: header_read, op: osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 ack+read+known_if_redirected e220)
>> -14> 2014-08-13 17:52:56.594970 7f8a798c7700 5 -- op tracker -- , seq: 299, time: 2014-08-13 17:52:56.594880, event: throttled, op: osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 ack+read+known_if_redirected e220)
>> -13> 2014-08-13 17:52:56.594978 7f8a798c7700 5 -- op tracker -- , seq: 299, time: 2014-08-13 17:52:56.594917, event: all_read, op: osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 ack+read+known_if_redirected e220)
>> -12> 2014-08-13 17:52:56.594986 7f8a798c7700 5 -- op tracker -- , seq: 299, time: 0.000000, event: dispatched, op: osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 ack+read+known_if_redirected e220)
>> -11> 2014-08-13 17:52:56.595127 7f8a90795700 5 -- op tracker -- , seq: 299, time: 2014-08-13 17:52:56.595104, event: reached_pg, op: osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 ack+read+known_if_redirected e220)
>> -10> 2014-08-13 17:52:56.595159 7f8a90795700 5 -- op tracker -- , seq: 299, time: 2014-08-13 17:52:56.595153, event: started, op: osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 ack+read+known_if_redirected e220)
>> -9> 2014-08-13 17:52:56.602179 7f8a90795700 1 -- 10.141.8.182:6839/64670 --> 10.141.8.180:0/1018433 -- osd_op_reply(1 [pgls start_epoch 0] v164'30654 uv30654 ondisk = 0) v6 -- ?+0 0xec16180 con 0xee856a0
>> -8> 2014-08-13 17:52:56.602211 7f8a90795700 5 -- op tracker -- , seq: 299, time: 2014-08-13 17:52:56.602205, event: done, op: osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 ack+read+known_if_redirected e220)
>> -7> 2014-08-13 17:52:56.614839 7f8a798c7700 1 -- 10.141.8.182:6839/64670 <== client.7512 10.141.8.180:0/1018433 2 ==== osd_op(client.7512.0:2 [pgls start_epoch 220] 3.0 ack+read+known_if_redirected e220) v4 ==== 151+0+89 (3460833343 0 2600845095) 0xf3bcec0 con 0xee856a0
>> -6> 2014-08-13 17:52:56.614864 7f8a798c7700 5 -- op tracker -- , seq: 300, time: 2014-08-13 17:52:56.614789, event: header_read, op: osd_op(client.7512.0:2 [pgls start_epoch 220] 3.0 ack+read+known_if_redirected e220)
>> -5> 2014-08-13 17:52:56.614874 7f8a798c7700 5 -- op tracker -- , seq: 300, time: 2014-08-13 17:52:56.614792, event: throttled, op: osd_op(client.7512.0:2 [pgls start_epoch 220] 3.0 ack+read+known_if_redirected e220)
>> -4> 2014-08-13 17:52:56.614884 7f8a798c7700 5 -- op tracker -- , seq: 300, time: 2014-08-13 17:52:56.614835, event: all_read, op: osd_op(client.7512.0:2 [pgls start_epoch 220] 3.0 ack+read+known_if_redirected e220)
>> -3> 2014-08-13 17:52:56.614891 7f8a798c7700 5 -- op tracker -- , seq: 300, time: 0.000000, event: dispatched, op: osd_op(client.7512.0:2 [pgls start_epoch 220] 3.0 ack+read+known_if_redirected e220)
>> -2> 2014-08-13 17:52:56.614972 7f8a92f9a700 5 -- op tracker -- , seq: 300, time: 2014-08-13 17:52:56.614958, event: reached_pg, op: osd_op(client.7512.0:2 [pgls start_epoch 220] 3.0 ack+read+known_if_redirected e220)
>> -1> 2014-08-13 17:52:56.614993 7f8a92f9a700 5 -- op tracker -- , seq: 300, time: 2014-08-13 17:52:56.614986, event: started, op: osd_op(client.7512.0:2 [pgls start_epoch 220] 3.0 ack+read+known_if_redirected e220)
>> 0> 2014-08-13 17:52:56.617087 7f8a92f9a700 -1 os/GenericObjectMap.cc: In function 'int GenericObjectMap::list_objects(const coll_t&, ghobject_t, int, std::vector<ghobject_t>*, ghobject_t*)' thread 7f8a92f9a700 time 2014-08-13 17:52:56.615073
>> os/GenericObjectMap.cc: 1118: FAILED assert(start <= header.oid)
>>
>> ceph version 0.83 (78ff1f0a5dfd3c5850805b4021738564c36c92b8)
>> 1: (GenericObjectMap::list_objects(coll_t const&, ghobject_t, int, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x474) [0x98f774]
>> 2: (KeyValueStore::collection_list_partial(coll_t, ghobject_t, int, int, snapid_t, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x274) [0x8c5b54]
>> 3: (PGBackend::objects_list_partial(hobject_t const&, int, int, snapid_t, std::vector<hobject_t, std::allocator<hobject_t> >*, hobject_t*)+0x1c9) [0x862de9]
>> 4: (ReplicatedPG::do_pg_op(std::tr1::shared_ptr<OpRequest>)+0xea5) [0x7f67f5]
>> 5: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x1f3) [0x8177b3]
>> 6: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x5d5) [0x7b8045]
>> 7: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x47d) [0x62bf8d]
>> 8: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x35c) [0x62c56c]
>> 9: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8cd) [0xa776fd]
>> 10: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xa79980]
>> 11: (()+0x7df3) [0x7f8aac71fdf3]
>> 12: (clone()+0x6d) [0x7f8aab1963dd]
>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>
>> ceph version 0.83 (78ff1f0a5dfd3c5850805b4021738564c36c92b8)
>> 1: /usr/bin/ceph-osd() [0x99b466]
>> 2: (()+0xf130) [0x7f8aac727130]
>> 3: (gsignal()+0x39) [0x7f8aab0d5989]
>> 4: (abort()+0x148) [0x7f8aab0d7098]
>> 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f8aab9e89d5]
>> 6: (()+0x5e946) [0x7f8aab9e6946]
>> 7: (()+0x5e973) [0x7f8aab9e6973]
>> 8: (()+0x5eb9f) [0x7f8aab9e6b9f]
>> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1ef) [0xa8805f]
>> 10: (GenericObjectMap::list_objects(coll_t const&, ghobject_t, int, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x474) [0x98f774]
>> 11: (KeyValueStore::collection_list_partial(coll_t, ghobject_t, int, int, snapid_t, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x274) [0x8c5b54]
>> 12: (PGBackend::objects_list_partial(hobject_t const&, int, int, snapid_t, std::vector<hobject_t, std::allocator<hobject_t> >*, hobject_t*)+0x1c9) [0x862de9]
>> 13: (ReplicatedPG::do_pg_op(std::tr1::shared_ptr<OpRequest>)+0xea5) [0x7f67f5]
>> 14: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x1f3) [0x8177b3]
>> 15: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x5d5) [0x7b8045]
>> 16: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x47d) [0x62bf8d]
>> 17: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x35c) [0x62c56c]
>> 18: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8cd) [0xa776fd]
>> 19: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xa79980]
>> 20: (()+0x7df3) [0x7f8aac71fdf3]
>> 21: (clone()+0x6d) [0x7f8aab1963dd]
>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>
>> --- begin dump of recent events ---
>> 0> 2014-08-13 17:52:56.714214 7f8a92f9a700 -1 *** Caught signal (Aborted) ** in thread 7f8a92f9a700
>>
>> ceph version 0.83 (78ff1f0a5dfd3c5850805b4021738564c36c92b8)
>> 1: /usr/bin/ceph-osd() [0x99b466]
>> 2: (()+0xf130) [0x7f8aac727130]
>> 3: (gsignal()+0x39) [0x7f8aab0d5989]
>> 4: (abort()+0x148) [0x7f8aab0d7098]
>> 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f8aab9e89d5]
>> 6: (()+0x5e946) [0x7f8aab9e6946]
>> 7: (()+0x5e973) [0x7f8aab9e6973]
>> 8: (()+0x5eb9f) [0x7f8aab9e6b9f]
>> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1ef) [0xa8805f]
>> 10: (GenericObjectMap::list_objects(coll_t const&, ghobject_t, int, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x474) [0x98f774]
>> 11: (KeyValueStore::collection_list_partial(coll_t, ghobject_t, int, int, snapid_t, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x274) [0x8c5b54]
>> 12: (PGBackend::objects_list_partial(hobject_t const&, int, int, snapid_t, std::vector<hobject_t, std::allocator<hobject_t> >*, hobject_t*)+0x1c9) [0x862de9]
>> 13: (ReplicatedPG::do_pg_op(std::tr1::shared_ptr<OpRequest>)+0xea5) [0x7f67f5]
>> 14: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x1f3) [0x8177b3]
>> 15: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x5d5) [0x7b8045]
>> 16: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x47d) [0x62bf8d]
>> 17: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x35c) [0x62c56c]
>> 18: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8cd) [0xa776fd]
>> 19: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xa79980]
>> 20: (()+0x7df3) [0x7f8aac71fdf3]
>> 21: (clone()+0x6d) [0x7f8aab1963dd]
>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>
>> I guess this has something to do with using the dev Keyvaluestore?
>>
>> Thanks!
>> Kenneth

--
Best Regards,

Wheat
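
A note on the assertion in the trace above (os/GenericObjectMap.cc: 1118: FAILED assert(start <= header.oid)): the listing path expects the resume cursor handed down through collection_list_partial to sort at or before the first object header the iterator returns. The sketch below is not Ceph code; it is a minimal, hypothetical C++ illustration of that ordering precondition, using simplified stand-in types (ObjectKey, a std::set store, a list_from helper) in place of the real ghobject_t/GenericObjectMap interfaces.

    #include <cassert>
    #include <cstddef>
    #include <iostream>
    #include <set>
    #include <string>
    #include <vector>

    // Stand-in for an object key; the store sorts keys with plain string ordering.
    using ObjectKey = std::string;

    // List up to `max` keys starting at `start` and report the resume cursor.
    // The assert mirrors the invariant checked above: the first key returned
    // must not sort before the caller's start cursor. In this sketch it always
    // holds, because lower_bound and the assert use the same comparison.
    std::vector<ObjectKey> list_from(const std::set<ObjectKey>& store,
                                     const ObjectKey& start, std::size_t max,
                                     ObjectKey* next) {
      std::vector<ObjectKey> out;
      auto it = store.lower_bound(start);   // first key >= start
      if (it != store.end())
        assert(start <= *it);               // analogous to assert(start <= header.oid)
      for (; it != store.end() && out.size() < max; ++it)
        out.push_back(*it);
      *next = (it == store.end()) ? ObjectKey() : *it;  // empty cursor == end
      return out;
    }

    int main() {
      const std::set<ObjectKey> store = {"obj-aaa", "obj-bbb", "obj-ccc"};
      ObjectKey next;
      for (const ObjectKey& key : list_from(store, "obj-a", 2, &next))
        std::cout << key << "\n";
      std::cout << "next cursor: " << (next.empty() ? "<end>" : next) << "\n";
      return 0;
    }

Compiled stand-alone (for example with g++ -std=c++11), this prints the first two keys and the resume cursor; it only illustrates the precondition the assert enforces, not the on-disk key format or the real listing code.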