I have found the root cause; it's a bug. When a chunky scrub happens, it iterates over the whole PG's objects, and each iteration only scans a few of them:

osd/PG.cc:3758

    ret = get_pgbackend()->objects_list_partial(
      start,
      cct->_conf->osd_scrub_chunk_min,
      cct->_conf->osd_scrub_chunk_max,
      0,
      &objects,
      &candidate_end);

candidate_end is the end of that object set, and it is used as the start position of the next scrub chunk. But it gets truncated:

osd/PG.cc:3777

    while (!boundary_found && objects.size() > 1) {
      hobject_t end = objects.back().get_boundary();
      objects.pop_back();
      if (objects.back().get_filestore_key() != end.get_filestore_key()) {
        candidate_end = end;
        boundary_found = true;
      }
    }

"end", an hobject_t that contains only the "hash" field, is assigned to candidate_end. So for the next scrub chunk, an hobject_t containing only the "hash" field is passed into get_pgbackend()->objects_list_partial(). That produces incorrect results for the KeyValueStore backend, because it uses strict key ordering in its "collection_list_partial" method. An hobject_t that contains only the "hash" field encodes to:

    1%e79s0_head!972F1B5D!!none!!!00000000000000000000!0!0

while the actual object's key is:

    1%e79s0_head!972F1B5D!!1!!!object-name!head

In other words, an hobject_t that contains only the "hash" field cannot be used to look up the actual object that has the same "hash" field. (A small standalone sketch comparing these two keys is appended at the very end of this mail, after the quoted thread.)

@sage, I scanned the usages of "get_boundary" and can't find a reason for it here. Could we simply remove it, so the code becomes:

    while (!boundary_found && objects.size() > 1) {
      hobject_t end = objects.back();
      objects.pop_back();
      if (objects.back().get_filestore_key() != end.get_filestore_key()) {
        candidate_end = end;
        boundary_found = true;
      }
    }

On Sat, Sep 6, 2014 at 10:44 PM, Haomai Wang <haomaiwang at gmail.com> wrote: > Sorry for the late message, I'm back from a short vacation. I would > like to try it this weekends. Thanks for your patient :-) > > On Wed, Sep 3, 2014 at 9:16 PM, Kenneth Waegeman > <Kenneth.Waegeman at ugent.be> wrote: > > I also can reproduce it on a new slightly different set up (also EC on KV > > and Cache) by running ceph pg scrub on a KV pg: this pg will then get the > > 'inconsistent' status > > > > > > > > ----- Message from Kenneth Waegeman <Kenneth.Waegeman at UGent.be> > --------- > > Date: Mon, 01 Sep 2014 16:28:31 +0200 > > From: Kenneth Waegeman <Kenneth.Waegeman at UGent.be> > > Subject: Re: ceph cluster inconsistency keyvaluestore > > To: Haomai Wang <haomaiwang at gmail.com> > > Cc: ceph-users at lists.ceph.com > > > > > >> Hi, > >> > >> > >> The cluster got installed with quattor, which uses ceph-deploy for > >> installation of daemons, writes the config file and installs the crushmap.
> >> I have 3 hosts, each 12 disks, having a large KV partition (3.6T) for > the > >> ECdata pool and a small cache partition (50G) for the cache > >> > >> I manually did this: > >> > >> ceph osd pool create cache 1024 1024 > >> ceph osd pool set cache size 2 > >> ceph osd pool set cache min_size 1 > >> ceph osd erasure-code-profile set profile11 k=8 m=3 > >> ruleset-failure-domain=osd > >> ceph osd pool create ecdata 128 128 erasure profile11 > >> ceph osd tier add ecdata cache > >> ceph osd tier cache-mode cache writeback > >> ceph osd tier set-overlay ecdata cache > >> ceph osd pool set cache hit_set_type bloom > >> ceph osd pool set cache hit_set_count 1 > >> ceph osd pool set cache hit_set_period 3600 > >> ceph osd pool set cache target_max_bytes $((280*1024*1024*1024)) > >> > >> (But the previous time I had the problem already without the cache part) > >> > >> > >> > >> Cluster live since 2014-08-29 15:34:16 > >> > >> Config file on host ceph001: > >> > >> [global] > >> auth_client_required = cephx > >> auth_cluster_required = cephx > >> auth_service_required = cephx > >> cluster_network = 10.143.8.0/24 > >> filestore_xattr_use_omap = 1 > >> fsid = 82766e04-585b-49a6-a0ac-c13d9ffd0a7d > >> mon_cluster_log_to_syslog = 1 > >> mon_host = ceph001.cubone.os, ceph002.cubone.os, ceph003.cubone.os > >> mon_initial_members = ceph001, ceph002, ceph003 > >> osd_crush_update_on_start = 0 > >> osd_journal_size = 10240 > >> osd_pool_default_min_size = 2 > >> osd_pool_default_pg_num = 512 > >> osd_pool_default_pgp_num = 512 > >> osd_pool_default_size = 3 > >> public_network = 10.141.8.0/24 > >> > >> [osd.11] > >> osd_objectstore = keyvaluestore-dev > >> > >> [osd.13] > >> osd_objectstore = keyvaluestore-dev > >> > >> [osd.15] > >> osd_objectstore = keyvaluestore-dev > >> > >> [osd.17] > >> osd_objectstore = keyvaluestore-dev > >> > >> [osd.19] > >> osd_objectstore = keyvaluestore-dev > >> > >> [osd.21] > >> osd_objectstore = keyvaluestore-dev > >> > >> [osd.23] > >> osd_objectstore = keyvaluestore-dev > >> > >> [osd.25] > >> osd_objectstore = keyvaluestore-dev > >> > >> [osd.3] > >> osd_objectstore = keyvaluestore-dev > >> > >> [osd.5] > >> osd_objectstore = keyvaluestore-dev > >> > >> [osd.7] > >> osd_objectstore = keyvaluestore-dev > >> > >> [osd.9] > >> osd_objectstore = keyvaluestore-dev > >> > >> > >> OSDs: > >> # id weight type name up/down reweight > >> -12 140.6 root default-cache > >> -9 46.87 host ceph001-cache > >> 2 3.906 osd.2 up 1 > >> 4 3.906 osd.4 up 1 > >> 6 3.906 osd.6 up 1 > >> 8 3.906 osd.8 up 1 > >> 10 3.906 osd.10 up 1 > >> 12 3.906 osd.12 up 1 > >> 14 3.906 osd.14 up 1 > >> 16 3.906 osd.16 up 1 > >> 18 3.906 osd.18 up 1 > >> 20 3.906 osd.20 up 1 > >> 22 3.906 osd.22 up 1 > >> 24 3.906 osd.24 up 1 > >> -10 46.87 host ceph002-cache > >> 28 3.906 osd.28 up 1 > >> 30 3.906 osd.30 up 1 > >> 32 3.906 osd.32 up 1 > >> 34 3.906 osd.34 up 1 > >> 36 3.906 osd.36 up 1 > >> 38 3.906 osd.38 up 1 > >> 40 3.906 osd.40 up 1 > >> 42 3.906 osd.42 up 1 > >> 44 3.906 osd.44 up 1 > >> 46 3.906 osd.46 up 1 > >> 48 3.906 osd.48 up 1 > >> 50 3.906 osd.50 up 1 > >> -11 46.87 host ceph003-cache > >> 54 3.906 osd.54 up 1 > >> 56 3.906 osd.56 up 1 > >> 58 3.906 osd.58 up 1 > >> 60 3.906 osd.60 up 1 > >> 62 3.906 osd.62 up 1 > >> 64 3.906 osd.64 up 1 > >> 66 3.906 osd.66 up 1 > >> 68 3.906 osd.68 up 1 > >> 70 3.906 osd.70 up 1 > >> 72 3.906 osd.72 up 1 > >> 74 3.906 osd.74 up 1 > >> 76 3.906 osd.76 up 1 > >> -8 140.6 root default-ec > >> -5 46.87 host ceph001-ec > >> 3 3.906 osd.3 up 1 > >> 5 3.906 osd.5 
up 1 > >> 7 3.906 osd.7 up 1 > >> 9 3.906 osd.9 up 1 > >> 11 3.906 osd.11 up 1 > >> 13 3.906 osd.13 up 1 > >> 15 3.906 osd.15 up 1 > >> 17 3.906 osd.17 up 1 > >> 19 3.906 osd.19 up 1 > >> 21 3.906 osd.21 up 1 > >> 23 3.906 osd.23 up 1 > >> 25 3.906 osd.25 up 1 > >> -6 46.87 host ceph002-ec > >> 29 3.906 osd.29 up 1 > >> 31 3.906 osd.31 up 1 > >> 33 3.906 osd.33 up 1 > >> 35 3.906 osd.35 up 1 > >> 37 3.906 osd.37 up 1 > >> 39 3.906 osd.39 up 1 > >> 41 3.906 osd.41 up 1 > >> 43 3.906 osd.43 up 1 > >> 45 3.906 osd.45 up 1 > >> 47 3.906 osd.47 up 1 > >> 49 3.906 osd.49 up 1 > >> 51 3.906 osd.51 up 1 > >> -7 46.87 host ceph003-ec > >> 55 3.906 osd.55 up 1 > >> 57 3.906 osd.57 up 1 > >> 59 3.906 osd.59 up 1 > >> 61 3.906 osd.61 up 1 > >> 63 3.906 osd.63 up 1 > >> 65 3.906 osd.65 up 1 > >> 67 3.906 osd.67 up 1 > >> 69 3.906 osd.69 up 1 > >> 71 3.906 osd.71 up 1 > >> 73 3.906 osd.73 up 1 > >> 75 3.906 osd.75 up 1 > >> 77 3.906 osd.77 up 1 > >> -4 23.44 root default-ssd > >> -1 7.812 host ceph001-ssd > >> 0 3.906 osd.0 up 1 > >> 1 3.906 osd.1 up 1 > >> -2 7.812 host ceph002-ssd > >> 26 3.906 osd.26 up 1 > >> 27 3.906 osd.27 up 1 > >> -3 7.812 host ceph003-ssd > >> 52 3.906 osd.52 up 1 > >> 53 3.906 osd.53 up 1 > >> > >> Cache OSDs are each 50G, the EC KV OSDS 3.6T, (ssds not used right now) > >> > >> Pools: > >> pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash > >> rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool > stripe_width 0 > >> pool 1 'cache' replicated size 2 min_size 1 crush_ruleset 0 object_hash > >> rjenkins pg_num 1024 pgp_num 1024 last_change 174 flags > >> hashpspool,incomplete_clones tier_of 2 cache_mode writeback target_bytes > >> 300647710720 hit_set bloom{false_positive_probability: 0.05, > target_size: 0, > >> seed: 0} 3600s x1 stripe_width 0 > >> pool 2 'ecdata' erasure size 11 min_size 8 crush_ruleset 2 object_hash > >> rjenkins pg_num 128 pgp_num 128 last_change 170 lfor 170 flags > hashpspool > >> tiers 1 read_tier 1 write_tier 1 stripe_width 4096 > >> > >> > >> Crushmap: > >> # begin crush map > >> tunable choose_local_fallback_tries 0 > >> tunable choose_local_tries 0 > >> tunable choose_total_tries 50 > >> tunable chooseleaf_descend_once 1 > >> > >> # devices > >> device 0 osd.0 > >> device 1 osd.1 > >> device 2 osd.2 > >> device 3 osd.3 > >> device 4 osd.4 > >> device 5 osd.5 > >> device 6 osd.6 > >> device 7 osd.7 > >> device 8 osd.8 > >> device 9 osd.9 > >> device 10 osd.10 > >> device 11 osd.11 > >> device 12 osd.12 > >> device 13 osd.13 > >> device 14 osd.14 > >> device 15 osd.15 > >> device 16 osd.16 > >> device 17 osd.17 > >> device 18 osd.18 > >> device 19 osd.19 > >> device 20 osd.20 > >> device 21 osd.21 > >> device 22 osd.22 > >> device 23 osd.23 > >> device 24 osd.24 > >> device 25 osd.25 > >> device 26 osd.26 > >> device 27 osd.27 > >> device 28 osd.28 > >> device 29 osd.29 > >> device 30 osd.30 > >> device 31 osd.31 > >> device 32 osd.32 > >> device 33 osd.33 > >> device 34 osd.34 > >> device 35 osd.35 > >> device 36 osd.36 > >> device 37 osd.37 > >> device 38 osd.38 > >> device 39 osd.39 > >> device 40 osd.40 > >> device 41 osd.41 > >> device 42 osd.42 > >> device 43 osd.43 > >> device 44 osd.44 > >> device 45 osd.45 > >> device 46 osd.46 > >> device 47 osd.47 > >> device 48 osd.48 > >> device 49 osd.49 > >> device 50 osd.50 > >> device 51 osd.51 > >> device 52 osd.52 > >> device 53 osd.53 > >> device 54 osd.54 > >> device 55 osd.55 > >> device 56 osd.56 > >> device 57 osd.57 > >> device 58 osd.58 > >> device 59 osd.59 > >> 
device 60 osd.60 > >> device 61 osd.61 > >> device 62 osd.62 > >> device 63 osd.63 > >> device 64 osd.64 > >> device 65 osd.65 > >> device 66 osd.66 > >> device 67 osd.67 > >> device 68 osd.68 > >> device 69 osd.69 > >> device 70 osd.70 > >> device 71 osd.71 > >> device 72 osd.72 > >> device 73 osd.73 > >> device 74 osd.74 > >> device 75 osd.75 > >> device 76 osd.76 > >> device 77 osd.77 > >> > >> # types > >> type 0 osd > >> type 1 host > >> type 2 root > >> > >> # buckets > >> host ceph001-ssd { > >> id -1 # do not change unnecessarily > >> # weight 7.812 > >> alg straw > >> hash 0 # rjenkins1 > >> item osd.0 weight 3.906 > >> item osd.1 weight 3.906 > >> } > >> host ceph002-ssd { > >> id -2 # do not change unnecessarily > >> # weight 7.812 > >> alg straw > >> hash 0 # rjenkins1 > >> item osd.26 weight 3.906 > >> item osd.27 weight 3.906 > >> } > >> host ceph003-ssd { > >> id -3 # do not change unnecessarily > >> # weight 7.812 > >> alg straw > >> hash 0 # rjenkins1 > >> item osd.52 weight 3.906 > >> item osd.53 weight 3.906 > >> } > >> root default-ssd { > >> id -4 # do not change unnecessarily > >> # weight 23.436 > >> alg straw > >> hash 0 # rjenkins1 > >> item ceph001-ssd weight 7.812 > >> item ceph002-ssd weight 7.812 > >> item ceph003-ssd weight 7.812 > >> } > >> host ceph001-ec { > >> id -5 # do not change unnecessarily > >> # weight 46.872 > >> alg straw > >> hash 0 # rjenkins1 > >> item osd.3 weight 3.906 > >> item osd.5 weight 3.906 > >> item osd.7 weight 3.906 > >> item osd.9 weight 3.906 > >> item osd.11 weight 3.906 > >> item osd.13 weight 3.906 > >> item osd.15 weight 3.906 > >> item osd.17 weight 3.906 > >> item osd.19 weight 3.906 > >> item osd.21 weight 3.906 > >> item osd.23 weight 3.906 > >> item osd.25 weight 3.906 > >> } > >> host ceph002-ec { > >> id -6 # do not change unnecessarily > >> # weight 46.872 > >> alg straw > >> hash 0 # rjenkins1 > >> item osd.29 weight 3.906 > >> item osd.31 weight 3.906 > >> item osd.33 weight 3.906 > >> item osd.35 weight 3.906 > >> item osd.37 weight 3.906 > >> item osd.39 weight 3.906 > >> item osd.41 weight 3.906 > >> item osd.43 weight 3.906 > >> item osd.45 weight 3.906 > >> item osd.47 weight 3.906 > >> item osd.49 weight 3.906 > >> item osd.51 weight 3.906 > >> } > >> host ceph003-ec { > >> id -7 # do not change unnecessarily > >> # weight 46.872 > >> alg straw > >> hash 0 # rjenkins1 > >> item osd.55 weight 3.906 > >> item osd.57 weight 3.906 > >> item osd.59 weight 3.906 > >> item osd.61 weight 3.906 > >> item osd.63 weight 3.906 > >> item osd.65 weight 3.906 > >> item osd.67 weight 3.906 > >> item osd.69 weight 3.906 > >> item osd.71 weight 3.906 > >> item osd.73 weight 3.906 > >> item osd.75 weight 3.906 > >> item osd.77 weight 3.906 > >> } > >> root default-ec { > >> id -8 # do not change unnecessarily > >> # weight 140.616 > >> alg straw > >> hash 0 # rjenkins1 > >> item ceph001-ec weight 46.872 > >> item ceph002-ec weight 46.872 > >> item ceph003-ec weight 46.872 > >> } > >> host ceph001-cache { > >> id -9 # do not change unnecessarily > >> # weight 46.872 > >> alg straw > >> hash 0 # rjenkins1 > >> item osd.2 weight 3.906 > >> item osd.4 weight 3.906 > >> item osd.6 weight 3.906 > >> item osd.8 weight 3.906 > >> item osd.10 weight 3.906 > >> item osd.12 weight 3.906 > >> item osd.14 weight 3.906 > >> item osd.16 weight 3.906 > >> item osd.18 weight 3.906 > >> item osd.20 weight 3.906 > >> item osd.22 weight 3.906 > >> item osd.24 weight 3.906 > >> } > >> host ceph002-cache { > >> id -10 # do not change unnecessarily > 
>> # weight 46.872 > >> alg straw > >> hash 0 # rjenkins1 > >> item osd.28 weight 3.906 > >> item osd.30 weight 3.906 > >> item osd.32 weight 3.906 > >> item osd.34 weight 3.906 > >> item osd.36 weight 3.906 > >> item osd.38 weight 3.906 > >> item osd.40 weight 3.906 > >> item osd.42 weight 3.906 > >> item osd.44 weight 3.906 > >> item osd.46 weight 3.906 > >> item osd.48 weight 3.906 > >> item osd.50 weight 3.906 > >> } > >> host ceph003-cache { > >> id -11 # do not change unnecessarily > >> # weight 46.872 > >> alg straw > >> hash 0 # rjenkins1 > >> item osd.54 weight 3.906 > >> item osd.56 weight 3.906 > >> item osd.58 weight 3.906 > >> item osd.60 weight 3.906 > >> item osd.62 weight 3.906 > >> item osd.64 weight 3.906 > >> item osd.66 weight 3.906 > >> item osd.68 weight 3.906 > >> item osd.70 weight 3.906 > >> item osd.72 weight 3.906 > >> item osd.74 weight 3.906 > >> item osd.76 weight 3.906 > >> } > >> root default-cache { > >> id -12 # do not change unnecessarily > >> # weight 140.616 > >> alg straw > >> hash 0 # rjenkins1 > >> item ceph001-cache weight 46.872 > >> item ceph002-cache weight 46.872 > >> item ceph003-cache weight 46.872 > >> } > >> > >> # rules > >> rule cache { > >> ruleset 0 > >> type replicated > >> min_size 1 > >> max_size 10 > >> step take default-cache > >> step chooseleaf firstn 0 type host > >> step emit > >> } > >> rule metadata { > >> ruleset 1 > >> type replicated > >> min_size 1 > >> max_size 10 > >> step take default-ssd > >> step chooseleaf firstn 0 type host > >> step emit > >> } > >> rule ecdata { > >> ruleset 2 > >> type erasure > >> min_size 3 > >> max_size 20 > >> step set_chooseleaf_tries 5 > >> step take default-ec > >> step choose indep 0 type osd > >> step emit > >> } > >> > >> # end crush map > >> > >> The benchmarks I then did: > >> > >> ./benchrw 50000 > >> > >> benchrw: > >> /usr/bin/rados -p ecdata bench $1 write --no-cleanup > >> /usr/bin/rados -p ecdata bench $1 seq > >> /usr/bin/rados -p ecdata bench $1 seq & > >> /usr/bin/rados -p ecdata bench $1 write --no-cleanup > >> > >> > >> Srubbing errors started soon after that: 2014-08-31 10:59:14 > >> > >> > >> Please let me know if you need more information, and thanks ! > >> > >> Kenneth > >> > >> ----- Message from Haomai Wang <haomaiwang at gmail.com> --------- > >> Date: Mon, 1 Sep 2014 21:30:16 +0800 > >> From: Haomai Wang <haomaiwang at gmail.com> > >> Subject: Re: ceph cluster inconsistency keyvaluestore > >> To: Kenneth Waegeman <Kenneth.Waegeman at ugent.be> > >> Cc: ceph-users at lists.ceph.com > >> > >> > >>> Hmm, could you please list your instructions including cluster existing > >>> time and all relevant ops? I want to reproduce it. > >>> > >>> > >>> On Mon, Sep 1, 2014 at 4:45 PM, Kenneth Waegeman > >>> <Kenneth.Waegeman at ugent.be> > >>> wrote: > >>> > >>>> Hi, > >>>> > >>>> I reinstalled the cluster with 0.84, and tried again running rados > bench > >>>> on a EC coded pool on keyvaluestore. 
> >>>> Nothing crashed this time, but when I check the status: > >>>> > >>>> health HEALTH_ERR 128 pgs inconsistent; 128 scrub errors; too few > >>>> pgs > >>>> per osd (15 < min 20) > >>>> monmap e1: 3 mons at {ceph001=10.141.8.180:6789/0, > >>>> ceph002=10.141.8.181:6789/0,ceph003=10.141.8.182:6789/0}, election > epoch > >>>> 8, quorum 0,1,2 ceph001,ceph002,ceph003 > >>>> osdmap e174: 78 osds: 78 up, 78 in > >>>> pgmap v147680: 1216 pgs, 3 pools, 14758 GB data, 3690 kobjects > >>>> 1753 GB used, 129 TB / 131 TB avail > >>>> 1088 active+clean > >>>> 128 active+clean+inconsistent > >>>> > >>>> the 128 inconsistent pgs are ALL the pgs of the EC KV store ( the > others > >>>> are on Filestore) > >>>> > >>>> The only thing I can see in the logs is that after the rados tests, it > >>>> start scrubbing, and for each KV pg I get something like this: > >>>> > >>>> 2014-08-31 11:14:09.050747 osd.11 10.141.8.180:6833/61098 4 : [ERR] > >>>> 2.3s0 > >>>> scrub stat mismatch, got 28164/29291 objects, 0/0 clones, 28164/29291 > >>>> dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, > >>>> 118128377856/122855358464 bytes. > >>>> > >>>> What could here be the problem? > >>>> Thanks again!! > >>>> > >>>> Kenneth > >>>> > >>>> > >>>> ----- Message from Haomai Wang <haomaiwang at gmail.com> --------- > >>>> Date: Tue, 26 Aug 2014 17:11:43 +0800 > >>>> From: Haomai Wang <haomaiwang at gmail.com> > >>>> Subject: Re: ceph cluster inconsistency? > >>>> To: Kenneth Waegeman <Kenneth.Waegeman at ugent.be> > >>>> Cc: ceph-users at lists.ceph.com > >>>> > >>>> > >>>> Hmm, it looks like you hit this > >>>> bug(http://tracker.ceph.com/issues/9223). > >>>>> > >>>>> > >>>>> Sorry for the late message, I forget that this fix is merged into > 0.84. > >>>>> > >>>>> Thanks for your patient :-) > >>>>> > >>>>> On Tue, Aug 26, 2014 at 4:39 PM, Kenneth Waegeman > >>>>> <Kenneth.Waegeman at ugent.be> wrote: > >>>>> > >>>>>> > >>>>>> Hi, > >>>>>> > >>>>>> In the meantime I already tried with upgrading the cluster to 0.84, > to > >>>>>> see > >>>>>> if that made a difference, and it seems it does. > >>>>>> I can't reproduce the crashing osds by doing a 'rados -p ecdata ls' > >>>>>> anymore. 
> >>>>>> > >>>>>> But now the cluster detect it is inconsistent: > >>>>>> > >>>>>> cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d > >>>>>> health HEALTH_ERR 40 pgs inconsistent; 40 scrub errors; too > few > >>>>>> pgs > >>>>>> per osd (4 < min 20); mon.ceph002 low disk space > >>>>>> monmap e3: 3 mons at > >>>>>> {ceph001=10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0, > >>>>>> ceph003=10.141.8.182:6789/0}, > >>>>>> election epoch 30, quorum 0,1,2 ceph001,ceph002,ceph003 > >>>>>> mdsmap e78951: 1/1/1 up {0=ceph003.cubone.os=up:active}, 3 > >>>>>> up:standby > >>>>>> osdmap e145384: 78 osds: 78 up, 78 in > >>>>>> pgmap v247095: 320 pgs, 4 pools, 15366 GB data, 3841 kobjects > >>>>>> 1502 GB used, 129 TB / 131 TB avail > >>>>>> 279 active+clean > >>>>>> 40 active+clean+inconsistent > >>>>>> 1 active+clean+scrubbing+deep > >>>>>> > >>>>>> > >>>>>> I tried to do ceph pg repair for all the inconsistent pgs: > >>>>>> > >>>>>> cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d > >>>>>> health HEALTH_ERR 40 pgs inconsistent; 1 pgs repair; 40 scrub > >>>>>> errors; > >>>>>> too few pgs per osd (4 < min 20); mon.ceph002 low disk space > >>>>>> monmap e3: 3 mons at > >>>>>> {ceph001=10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0, > >>>>>> ceph003=10.141.8.182:6789/0}, > >>>>>> election epoch 30, quorum 0,1,2 ceph001,ceph002,ceph003 > >>>>>> mdsmap e79486: 1/1/1 up {0=ceph003.cubone.os=up:active}, 3 > >>>>>> up:standby > >>>>>> osdmap e146452: 78 osds: 78 up, 78 in > >>>>>> pgmap v248520: 320 pgs, 4 pools, 15366 GB data, 3841 kobjects > >>>>>> 1503 GB used, 129 TB / 131 TB avail > >>>>>> 279 active+clean > >>>>>> 39 active+clean+inconsistent > >>>>>> 1 active+clean+scrubbing+deep > >>>>>> 1 > active+clean+scrubbing+deep+inconsistent+repair > >>>>>> > >>>>>> I let it recovering through the night, but this morning the mons > were > >>>>>> all > >>>>>> gone, nothing to see in the log files.. The osds were all still up! > >>>>>> > >>>>>> cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d > >>>>>> health HEALTH_ERR 36 pgs inconsistent; 1 pgs repair; 36 scrub > >>>>>> errors; > >>>>>> too few pgs per osd (4 < min 20) > >>>>>> monmap e7: 3 mons at > >>>>>> {ceph001=10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0, > >>>>>> ceph003=10.141.8.182:6789/0}, > >>>>>> election epoch 44, quorum 0,1,2 ceph001,ceph002,ceph003 > >>>>>> mdsmap e109481: 1/1/1 up {0=ceph003.cubone.os=up:active}, 3 > >>>>>> up:standby > >>>>>> osdmap e203410: 78 osds: 78 up, 78 in > >>>>>> pgmap v331747: 320 pgs, 4 pools, 15251 GB data, 3812 kobjects > >>>>>> 1547 GB used, 129 TB / 131 TB avail > >>>>>> 1 active+clean+scrubbing+deep+inconsistent+repair > >>>>>> 284 active+clean > >>>>>> 35 active+clean+inconsistent > >>>>>> > >>>>>> I restarted the monitors now, I will let you know when I see > something > >>>>>> more.. > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> ----- Message from Haomai Wang <haomaiwang at gmail.com> --------- > >>>>>> Date: Sun, 24 Aug 2014 12:51:41 +0800 > >>>>>> > >>>>>> From: Haomai Wang <haomaiwang at gmail.com> > >>>>>> Subject: Re: ceph cluster inconsistency? > >>>>>> To: Kenneth Waegeman <Kenneth.Waegeman at ugent.be>, > >>>>>> ceph-users at lists.ceph.com > >>>>>> > >>>>>> > >>>>>> It's really strange! I write a test program according the key > ordering > >>>>>>> > >>>>>>> you provided and parse the corresponding value. It's true! > >>>>>>> > >>>>>>> I have no idea now. 
If free, could you add this debug code to > >>>>>>> "src/os/GenericObjectMap.cc" and insert *before* "assert(start <= > >>>>>>> header.oid);": > >>>>>>> > >>>>>>> dout(0) << "start: " << start << "header.oid: " << header.oid << > >>>>>>> dendl; > >>>>>>> > >>>>>>> Then you need to recompile ceph-osd and run it again. The output > log > >>>>>>> can help it! > >>>>>>> > >>>>>>> On Tue, Aug 19, 2014 at 10:19 PM, Haomai Wang < > haomaiwang at gmail.com> > >>>>>>> wrote: > >>>>>>> > >>>>>>>> > >>>>>>>> I feel a little embarrassed, 1024 rows still true for me. > >>>>>>>> > >>>>>>>> I was wondering if you could give your all keys via > >>>>>>>> ""ceph-kvstore-tool /var/lib/ceph/osd/ceph-67/current/ list > >>>>>>>> _GHOBJTOSEQ_ > keys.log?. > >>>>>>>> > >>>>>>>> thanks! > >>>>>>>> > >>>>>>>> On Tue, Aug 19, 2014 at 4:58 PM, Kenneth Waegeman > >>>>>>>> <Kenneth.Waegeman at ugent.be> wrote: > >>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> ----- Message from Haomai Wang <haomaiwang at gmail.com> --------- > >>>>>>>>> Date: Tue, 19 Aug 2014 12:28:27 +0800 > >>>>>>>>> > >>>>>>>>> From: Haomai Wang <haomaiwang at gmail.com> > >>>>>>>>> Subject: Re: ceph cluster inconsistency? > >>>>>>>>> To: Kenneth Waegeman <Kenneth.Waegeman at ugent.be> > >>>>>>>>> Cc: Sage Weil <sweil at redhat.com>, ceph-users at lists.ceph.com > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> On Mon, Aug 18, 2014 at 7:32 PM, Kenneth Waegeman > >>>>>>>>>> > >>>>>>>>>> <Kenneth.Waegeman at ugent.be> wrote: > >>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> ----- Message from Haomai Wang <haomaiwang at gmail.com> > --------- > >>>>>>>>>>> Date: Mon, 18 Aug 2014 18:34:11 +0800 > >>>>>>>>>>> > >>>>>>>>>>> From: Haomai Wang <haomaiwang at gmail.com> > >>>>>>>>>>> Subject: Re: ceph cluster inconsistency? > >>>>>>>>>>> To: Kenneth Waegeman <Kenneth.Waegeman at ugent.be> > >>>>>>>>>>> Cc: Sage Weil <sweil at redhat.com>, ceph-users at lists.ceph.com > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> On Mon, Aug 18, 2014 at 5:38 PM, Kenneth Waegeman > >>>>>>>>>>>> > >>>>>>>>>>>> <Kenneth.Waegeman at ugent.be> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> Hi, > >>>>>>>>>>>>> > >>>>>>>>>>>>> I tried this after restarting the osd, but I guess that was > not > >>>>>>>>>>>>> the > >>>>>>>>>>>>> aim > >>>>>>>>>>>>> ( > >>>>>>>>>>>>> # ceph-kvstore-tool /var/lib/ceph/osd/ceph-67/current/ list > >>>>>>>>>>>>> _GHOBJTOSEQ_| > >>>>>>>>>>>>> grep 6adb1100 -A 100 > >>>>>>>>>>>>> IO error: lock /var/lib/ceph/osd/ceph-67/current//LOCK: > >>>>>>>>>>>>> Resource > >>>>>>>>>>>>> temporarily > >>>>>>>>>>>>> unavailable > >>>>>>>>>>>>> tools/ceph_kvstore_tool.cc: In function > >>>>>>>>>>>>> 'StoreTool::StoreTool(const > >>>>>>>>>>>>> string&)' thread 7f8fecf7d780 time 2014-08-18 11:12:29.551780 > >>>>>>>>>>>>> tools/ceph_kvstore_tool.cc: 38: FAILED > >>>>>>>>>>>>> assert(!db_ptr->open(std::cerr)) > >>>>>>>>>>>>> .. > >>>>>>>>>>>>> ) > >>>>>>>>>>>>> > >>>>>>>>>>>>> When I run it after bringing the osd down, it takes a while, > >>>>>>>>>>>>> but > >>>>>>>>>>>>> it > >>>>>>>>>>>>> has > >>>>>>>>>>>>> no > >>>>>>>>>>>>> output.. (When running it without the grep, I'm getting a > huge > >>>>>>>>>>>>> list > >>>>>>>>>>>>> ) > >>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> Oh, sorry for it! I made a mistake, the hash value(6adb1100) > >>>>>>>>>>>> will > >>>>>>>>>>>> be > >>>>>>>>>>>> reversed into leveldb. 
> >>>>>>>>>>>> So grep "benchmark_data_ceph001.cubone.os_5560_object789734" > >>>>>>>>>>>> should > >>>>>>>>>>>> be > >>>>>>>>>>>> help it. > >>>>>>>>>>>> > >>>>>>>>>>>> this gives: > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> [root at ceph003 ~]# ceph-kvstore-tool /var/lib/ceph/osd/ceph-67/ > >>>>>>>>>>> current/ > >>>>>>>>>>> list > >>>>>>>>>>> _GHOBJTOSEQ_ | grep 5560_object789734 -A 100 > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011BDA6!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object789734!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011C027!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_31461_object1330170!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011C6FD!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_4919_object227366!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011CB03!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object1363631!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011CDF0!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object1573957!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011D02C!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object1019282!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011E2B5!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_31461_object1283563!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011E511!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_4919_object273736!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011E547!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object1170628!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011EAAB!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_4919_object256335!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011F446!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object1484196!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0011FC59!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object884178!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001203F3!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object853746!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001208E3!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object36633!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00120B37!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_31461_object1235337!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001210B6!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object1661351!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001210CB!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object238126!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012184C!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object339943!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00121916!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object1047094!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001219C1!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_31461_object520642!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> 
_GHOBJTOSEQ_:3%e0s0_head!001222BB!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object639565!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001223AA!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_4919_object231080!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012243C!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object858050!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012289C!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object241796!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00122D28!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_4919_object7462!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00122DFE!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object243798!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00122EFC!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_8961_object109512!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001232D7!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_31461_object653973!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001234A3!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object1378169!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00123714!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object512925!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001237D9!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_4919_object23289!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00123854!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_31461_object1108852!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00123971!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object704026!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00123F75!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_8961_object250441!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00124083!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_31461_object706178!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001240FA!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object316952!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012447D!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object538734!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001244D9!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_31461_object789215!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001247CD!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_8961_object265993!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00124897!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_31461_object610597!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00124BE4!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_31461_object691723!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00124C9B!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object1306135!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00124E1D!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object520580!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> 
_GHOBJTOSEQ_:3%e0s0_head!0012534C!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object659767!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00125A81!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object184060!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00125E77!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object1292867!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00126562!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_31461_object1201410!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00126B34!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object1657326!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00127383!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object1269787!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00127396!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_31461_object500115!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001277F8!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_31461_object394932!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001279DD!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_4919_object252963!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00127B40!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_31461_object936811!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00127BAC!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_31461_object1481773!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012894E!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object999885!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00128D05!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_31461_object943667!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012908A!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object212990!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00129519!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object437596!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00129716!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object1585330!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00129798!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object603505!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001299C9!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_31461_object808800!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00129B7A!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_31461_object23193!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00129B9A!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_31461_object1158397!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012A932!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object542450!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012B77A!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_8961_object195480!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012BE8C!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_4919_object312911!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> 
_GHOBJTOSEQ_:3%e0s0_head!0012BF74!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object1563783!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012C65C!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object1123980!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012C6FE!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_3411_object913!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012CCAD!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_31461_object400863!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012CDBB!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object789667!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012D14B!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_31461_object1020723!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012D95B!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_8961_object106293!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012E3C8!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object1355526!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012E5B3!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object1491348!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012F2BB!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_8961_object338872!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012F374!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_31461_object1337264!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012FBE5!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object1512395!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012FCE3!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_8961_object298610!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0012FEB6!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_4919_object120824!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001301CA!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object816326!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00130263!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object777163!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00130529!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object1413173!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001317D9!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_31461_object809510!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0013204F!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_31461_object471416!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00132400!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object695087!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00132A19!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_31461_object591945!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00132BF8!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_31461_object302000!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00132F5B!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object1645443!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> 
_GHOBJTOSEQ_:3%e0s0_head!00133B8B!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object761911!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!0013433E!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_31461_object1467727!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00134446!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_31461_object791960!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00134678!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_31461_object677078!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00134A96!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_31461_object254923!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!001355D0!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_31461_object321528!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00135690!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_4919_object36935!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00135B62!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object1228272!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00135C72!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_4812_object2180!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00135DEE!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object425705!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00136366!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object141569!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _GHOBJTOSEQ_:3%e0s0_head!00136371!!3!!benchmark_data_ > >>>>>>>>>>> ceph001%ecubone%eos_5560_object564213!head > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> 100 rows seemed true for me. I found the min list objects is > 1024. > >>>>>>>>>> Please could you run > >>>>>>>>>> "ceph-kvstore-tool /var/lib/ceph/osd/ceph-67/current/ list > >>>>>>>>>> _GHOBJTOSEQ_| grep 6adb1100 -A 1024" > >>>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> I got the output in attachment > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>> > >>>>>>>>>>>>> Or should I run this immediately after the osd is crashed, > >>>>>>>>>>>>> (because > >>>>>>>>>>>>> it > >>>>>>>>>>>>> maybe > >>>>>>>>>>>>> rebalanced? I did already restarted the cluster) > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> I don't know if it is related, but before I could all do > that, > >>>>>>>>>>>>> I > >>>>>>>>>>>>> had > >>>>>>>>>>>>> to > >>>>>>>>>>>>> fix > >>>>>>>>>>>>> something else: A monitor did run out if disk space, using > 8GB > >>>>>>>>>>>>> for > >>>>>>>>>>>>> his > >>>>>>>>>>>>> store.db folder (lot of sst files). Other monitors are also > >>>>>>>>>>>>> near > >>>>>>>>>>>>> that > >>>>>>>>>>>>> level. > >>>>>>>>>>>>> Never had that problem on previous setups before. I > recreated a > >>>>>>>>>>>>> monitor > >>>>>>>>>>>>> and > >>>>>>>>>>>>> now it uses 3.8GB. > >>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> It exists some duplicate data which needed to be compacted. > >>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>> Another idea, maybe you can make KeyValueStore's stripe size > >>>>>>>>>>>> align > >>>>>>>>>>>> with EC stripe size. > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> How can I do that? 
Is there some documentation about that? > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> ceph --show-config | grep keyvaluestore > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> debug_keyvaluestore = 0/0 > >>>>>>>>>> keyvaluestore_queue_max_ops = 50 > >>>>>>>>>> keyvaluestore_queue_max_bytes = 104857600 > >>>>>>>>>> keyvaluestore_debug_check_backend = false > >>>>>>>>>> keyvaluestore_op_threads = 2 > >>>>>>>>>> keyvaluestore_op_thread_timeout = 60 > >>>>>>>>>> keyvaluestore_op_thread_suicide_timeout = 180 > >>>>>>>>>> keyvaluestore_default_strip_size = 4096 > >>>>>>>>>> keyvaluestore_max_expected_write_size = 16777216 > >>>>>>>>>> keyvaluestore_header_cache_size = 4096 > >>>>>>>>>> keyvaluestore_backend = leveldb > >>>>>>>>>> > >>>>>>>>>> keyvaluestore_default_strip_size is the wanted > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> I haven't think deeply and maybe I will try it later. > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> Thanks! > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> Kenneth > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> ----- Message from Sage Weil <sweil at redhat.com> --------- > >>>>>>>>>>>>> Date: Fri, 15 Aug 2014 06:10:34 -0700 (PDT) > >>>>>>>>>>>>> From: Sage Weil <sweil at redhat.com> > >>>>>>>>>>>>> > >>>>>>>>>>>>> Subject: Re: ceph cluster inconsistency? > >>>>>>>>>>>>> To: Haomai Wang <haomaiwang at gmail.com> > >>>>>>>>>>>>> Cc: Kenneth Waegeman <Kenneth.Waegeman at ugent.be>, > >>>>>>>>>>>>> ceph-users at lists.ceph.com > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> On Fri, 15 Aug 2014, Haomai Wang wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Hi Kenneth, > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> I don't find valuable info in your logs, it lack of the > >>>>>>>>>>>>>>> necessary > >>>>>>>>>>>>>>> debug output when accessing crash code. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> But I scan the encode/decode implementation in > >>>>>>>>>>>>>>> GenericObjectMap > >>>>>>>>>>>>>>> and > >>>>>>>>>>>>>>> find something bad. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> For example, two oid has same hash and their name is: > >>>>>>>>>>>>>>> A: "rb.data.123" > >>>>>>>>>>>>>>> B: "rb-123" > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> In ghobject_t compare level, A < B. But GenericObjectMap > >>>>>>>>>>>>>>> encode > >>>>>>>>>>>>>>> "." > >>>>>>>>>>>>>>> to > >>>>>>>>>>>>>>> "%e", so the key in DB is: > >>>>>>>>>>>>>>> A: _GHOBJTOSEQ_:blah!51615000!!none!!rb%edata%e123!head > >>>>>>>>>>>>>>> B: _GHOBJTOSEQ_:blah!51615000!!none!!rb-123!head > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> A > B > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> And it seemed that the escape function is useless and > should > >>>>>>>>>>>>>>> be > >>>>>>>>>>>>>>> disabled. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> I'm not sure whether Kenneth's problem is touching this > bug. > >>>>>>>>>>>>>>> Because > >>>>>>>>>>>>>>> this scene only occur when the object set is very large and > >>>>>>>>>>>>>>> make > >>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>> two object has same hash value. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Kenneth, could you have time to run "ceph-kv-store > >>>>>>>>>>>>>>> [path-to-osd] > >>>>>>>>>>>>>>> list > >>>>>>>>>>>>>>> _GHOBJTOSEQ_| grep 6adb1100 -A 100". ceph-kv-store is a > debug > >>>>>>>>>>>>>>> tool > >>>>>>>>>>>>>>> which can be compiled from source. 
You can clone ceph repo > >>>>>>>>>>>>>>> and > >>>>>>>>>>>>>>> run > >>>>>>>>>>>>>>> "./authongen.sh; ./configure; cd src; make > >>>>>>>>>>>>>>> ceph-kvstore-tool". > >>>>>>>>>>>>>>> "path-to-osd" should be "/var/lib/ceph/osd-[id]/current/". > >>>>>>>>>>>>>>> "6adb1100" > >>>>>>>>>>>>>>> is from your verbose log and the next 100 rows should know > >>>>>>>>>>>>>>> necessary > >>>>>>>>>>>>>>> infos. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> You can also get ceph-kvstore-tool from the 'ceph-tests' > >>>>>>>>>>>>>> package. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Hi sage, do you think we need to provided with upgrade > >>>>>>>>>>>>>> function > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> to > >>>>>>>>>>>>>>> fix > >>>>>>>>>>>>>>> it? > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Hmm, we might. This only affects the key/value encoding > >>>>>>>>>>>>>> right? > >>>>>>>>>>>>>> The > >>>>>>>>>>>>>> FileStore is using its own function to map these to file > >>>>>>>>>>>>>> names? > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Can you open a ticket in the tracker for this? > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Thanks! > >>>>>>>>>>>>>> sage > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> On Thu, Aug 14, 2014 at 7:36 PM, Kenneth Waegeman > >>>>>>>>>>>>>>> <Kenneth.Waegeman at ugent.be> wrote: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> ----- Message from Haomai Wang <haomaiwang at gmail.com> > >>>>>>>>>>>>>>>> --------- > >>>>>>>>>>>>>>>> Date: Thu, 14 Aug 2014 19:11:55 +0800 > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> From: Haomai Wang <haomaiwang at gmail.com> > >>>>>>>>>>>>>>>> Subject: Re: ceph cluster inconsistency? > >>>>>>>>>>>>>>>> To: Kenneth Waegeman <Kenneth.Waegeman at ugent.be> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Could you add config "debug_keyvaluestore = 20/20" to the > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> crashed > >>>>>>>>>>>>>>>>> osd > >>>>>>>>>>>>>>>>> and replay the command causing crash? > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> I would like to get more debug infos! Thanks. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> I included the log in attachment! > >>>>>>>>>>>>>>>> Thanks! 
> >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> On Thu, Aug 14, 2014 at 4:41 PM, Kenneth Waegeman > >>>>>>>>>>>>>>>>> <Kenneth.Waegeman at ugent.be> wrote: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> I have: > >>>>>>>>>>>>>>>>>> osd_objectstore = keyvaluestore-dev > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> in the global section of my ceph.conf > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> [root at ceph002 ~]# ceph osd erasure-code-profile get > >>>>>>>>>>>>>>>>>> profile11 > >>>>>>>>>>>>>>>>>> directory=/usr/lib64/ceph/erasure-code > >>>>>>>>>>>>>>>>>> k=8 > >>>>>>>>>>>>>>>>>> m=3 > >>>>>>>>>>>>>>>>>> plugin=jerasure > >>>>>>>>>>>>>>>>>> ruleset-failure-domain=osd > >>>>>>>>>>>>>>>>>> technique=reed_sol_van > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> the ecdata pool has this as profile > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> pool 3 'ecdata' erasure size 11 min_size 8 > crush_ruleset 2 > >>>>>>>>>>>>>>>>>> object_hash > >>>>>>>>>>>>>>>>>> rjenkins pg_num 128 pgp_num 128 last_change 161 flags > >>>>>>>>>>>>>>>>>> hashpspool > >>>>>>>>>>>>>>>>>> stripe_width 4096 > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> ECrule in crushmap > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> rule ecdata { > >>>>>>>>>>>>>>>>>> ruleset 2 > >>>>>>>>>>>>>>>>>> type erasure > >>>>>>>>>>>>>>>>>> min_size 3 > >>>>>>>>>>>>>>>>>> max_size 20 > >>>>>>>>>>>>>>>>>> step set_chooseleaf_tries 5 > >>>>>>>>>>>>>>>>>> step take default-ec > >>>>>>>>>>>>>>>>>> step choose indep 0 type osd > >>>>>>>>>>>>>>>>>> step emit > >>>>>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>>>>> root default-ec { > >>>>>>>>>>>>>>>>>> id -8 # do not change unnecessarily > >>>>>>>>>>>>>>>>>> # weight 140.616 > >>>>>>>>>>>>>>>>>> alg straw > >>>>>>>>>>>>>>>>>> hash 0 # rjenkins1 > >>>>>>>>>>>>>>>>>> item ceph001-ec weight 46.872 > >>>>>>>>>>>>>>>>>> item ceph002-ec weight 46.872 > >>>>>>>>>>>>>>>>>> item ceph003-ec weight 46.872 > >>>>>>>>>>>>>>>>>> ... > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Cheers! > >>>>>>>>>>>>>>>>>> Kenneth > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> ----- Message from Haomai Wang <haomaiwang at gmail.com> > >>>>>>>>>>>>>>>>>> --------- > >>>>>>>>>>>>>>>>>> Date: Thu, 14 Aug 2014 10:07:50 +0800 > >>>>>>>>>>>>>>>>>> From: Haomai Wang <haomaiwang at gmail.com> > >>>>>>>>>>>>>>>>>> Subject: Re: ceph cluster inconsistency? > >>>>>>>>>>>>>>>>>> To: Kenneth Waegeman <Kenneth.Waegeman at ugent.be> > >>>>>>>>>>>>>>>>>> Cc: ceph-users <ceph-users at lists.ceph.com> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Hi Kenneth, > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Could you give your configuration related to EC and > >>>>>>>>>>>>>>>>>>> KeyValueStore? 
> >>>>>>>>>>>>>>>>>>> Not sure whether it's bug on KeyValueStore > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> On Thu, Aug 14, 2014 at 12:06 AM, Kenneth Waegeman > >>>>>>>>>>>>>>>>>>> <Kenneth.Waegeman at ugent.be> wrote: > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> Hi, > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> I was doing some tests with rados bench on a Erasure > >>>>>>>>>>>>>>>>>>>> Coded > >>>>>>>>>>>>>>>>>>>> pool > >>>>>>>>>>>>>>>>>>>> (using > >>>>>>>>>>>>>>>>>>>> keyvaluestore-dev objectstore) on 0.83, and I see some > >>>>>>>>>>>>>>>>>>>> strangs > >>>>>>>>>>>>>>>>>>>> things: > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> [root at ceph001 ~]# ceph status > >>>>>>>>>>>>>>>>>>>> cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d > >>>>>>>>>>>>>>>>>>>> health HEALTH_WARN too few pgs per osd (4 < min 20) > >>>>>>>>>>>>>>>>>>>> monmap e1: 3 mons at > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> {ceph001= > 10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0, > >>>>>>>>>>>>>>>>>>>> ceph003=10.141.8.182:6789/0}, > >>>>>>>>>>>>>>>>>>>> election epoch 6, quorum 0,1,2 ceph001,ceph002,ceph003 > >>>>>>>>>>>>>>>>>>>> mdsmap e116: 1/1/1 up > >>>>>>>>>>>>>>>>>>>> {0=ceph001.cubone.os=up:active}, > >>>>>>>>>>>>>>>>>>>> 2 > >>>>>>>>>>>>>>>>>>>> up:standby > >>>>>>>>>>>>>>>>>>>> osdmap e292: 78 osds: 78 up, 78 in > >>>>>>>>>>>>>>>>>>>> pgmap v48873: 320 pgs, 4 pools, 15366 GB data, > 3841 > >>>>>>>>>>>>>>>>>>>> kobjects > >>>>>>>>>>>>>>>>>>>> 1381 GB used, 129 TB / 131 TB avail > >>>>>>>>>>>>>>>>>>>> 320 active+clean > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> There is around 15T of data, but only 1.3 T usage. > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> This is also visible in rados: > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> [root at ceph001 ~]# rados df > >>>>>>>>>>>>>>>>>>>> pool name category KB > objects > >>>>>>>>>>>>>>>>>>>> clones > >>>>>>>>>>>>>>>>>>>> degraded unfound rd rd KB > >>>>>>>>>>>>>>>>>>>> wr > >>>>>>>>>>>>>>>>>>>> wr > >>>>>>>>>>>>>>>>>>>> KB > >>>>>>>>>>>>>>>>>>>> data - 0 > >>>>>>>>>>>>>>>>>>>> 0 > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> 0 0 0 0 0 > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> ecdata - 16113451009 > >>>>>>>>>>>>>>>>>>>> 3933959 > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> 0 0 1 1 3935632 > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> 16116850711 > >>>>>>>>>>>>>>>>>>>> metadata - 2 > >>>>>>>>>>>>>>>>>>>> 20 > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> 0 0 33 36 21 > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> 8 > >>>>>>>>>>>>>>>>>>>> rbd - 0 > >>>>>>>>>>>>>>>>>>>> 0 > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> 0 0 0 0 0 > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> total used 1448266016 3933979 > >>>>>>>>>>>>>>>>>>>> total avail 139400181016 > >>>>>>>>>>>>>>>>>>>> total space 140848447032 > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> Another (related?) thing: if I do rados -p ecdata ls, > I > >>>>>>>>>>>>>>>>>>>> trigger > >>>>>>>>>>>>>>>>>>>> osd > >>>>>>>>>>>>>>>>>>>> shutdowns (each time): > >>>>>>>>>>>>>>>>>>>> I get a list followed by an error: > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> ... 
>>>>>>>>>>>>>>>>>>>> benchmark_data_ceph001.cubone.os_8961_object243839
>>>>>>>>>>>>>>>>>>>> benchmark_data_ceph001.cubone.os_5560_object801983
>>>>>>>>>>>>>>>>>>>> benchmark_data_ceph001.cubone.os_31461_object856489
>>>>>>>>>>>>>>>>>>>> benchmark_data_ceph001.cubone.os_8961_object202232
>>>>>>>>>>>>>>>>>>>> benchmark_data_ceph001.cubone.os_4919_object33199
>>>>>>>>>>>>>>>>>>>> benchmark_data_ceph001.cubone.os_5560_object807797
>>>>>>>>>>>>>>>>>>>> benchmark_data_ceph001.cubone.os_4919_object74729
>>>>>>>>>>>>>>>>>>>> benchmark_data_ceph001.cubone.os_31461_object1264121
>>>>>>>>>>>>>>>>>>>> benchmark_data_ceph001.cubone.os_5560_object1318513
>>>>>>>>>>>>>>>>>>>> benchmark_data_ceph001.cubone.os_5560_object1202111
>>>>>>>>>>>>>>>>>>>> benchmark_data_ceph001.cubone.os_31461_object939107
>>>>>>>>>>>>>>>>>>>> benchmark_data_ceph001.cubone.os_31461_object729682
>>>>>>>>>>>>>>>>>>>> benchmark_data_ceph001.cubone.os_5560_object122915
>>>>>>>>>>>>>>>>>>>> benchmark_data_ceph001.cubone.os_5560_object76521
>>>>>>>>>>>>>>>>>>>> benchmark_data_ceph001.cubone.os_5560_object113261
>>>>>>>>>>>>>>>>>>>> benchmark_data_ceph001.cubone.os_31461_object575079
>>>>>>>>>>>>>>>>>>>> benchmark_data_ceph001.cubone.os_5560_object671042
>>>>>>>>>>>>>>>>>>>> benchmark_data_ceph001.cubone.os_5560_object381146
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 2014-08-13 17:57:48.736150 7f65047b5700 0 -- 10.141.8.180:0/1023295 >> 10.141.8.182:6839/4471 pipe(0x7f64fc019b20 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f64fc019db0).fault
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> And I can see this in the log files:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> -25> 2014-08-13 17:52:56.323908 7f8a97fa4700 1 -- 10.143.8.182:6827/64670 <== osd.57 10.141.8.182:0/15796 51 ==== osd_ping(ping e220 stamp 2014-08-13 17:52:56.323092) v2 ==== 47+0+0 (3227325175 0 0) 0xf475940 con 0xee89fa0
>>>>>>>>>>>>>>>>>>>> -24> 2014-08-13 17:52:56.323938 7f8a97fa4700 1 -- 10.143.8.182:6827/64670 --> 10.141.8.182:0/15796 -- osd_ping(ping_reply e220 stamp 2014-08-13 17:52:56.323092) v2 -- ?+0 0xf815b00 con 0xee89fa0
>>>>>>>>>>>>>>>>>>>> -23> 2014-08-13 17:52:56.324078 7f8a997a7700 1 -- 10.141.8.182:6840/64670 <== osd.57 10.141.8.182:0/15796 51 ==== osd_ping(ping e220 stamp 2014-08-13 17:52:56.323092) v2 ==== 47+0+0 (3227325175 0 0) 0xf132bc0 con 0xee8a680
>>>>>>>>>>>>>>>>>>>> -22> 2014-08-13 17:52:56.324111 7f8a997a7700 1 -- 10.141.8.182:6840/64670 --> 10.141.8.182:0/15796 -- osd_ping(ping_reply e220 stamp 2014-08-13 17:52:56.323092) v2 -- ?+0 0xf811a40 con 0xee8a680
>>>>>>>>>>>>>>>>>>>> -21> 2014-08-13 17:52:56.584461 7f8a997a7700 1 -- 10.141.8.182:6840/64670 <== osd.29 10.143.8.181:0/12142 47 ==== osd_ping(ping e220 stamp 2014-08-13 17:52:56.583010) v2 ==== 47+0+0 (3355887204 0 0) 0xf655940 con 0xee88b00
>>>>>>>>>>>>>>>>>>>> -20> 2014-08-13 17:52:56.584486 7f8a997a7700 1 -- 10.141.8.182:6840/64670 --> 10.143.8.181:0/12142 -- osd_ping(ping_reply e220 stamp 2014-08-13 17:52:56.583010) v2 -- ?+0 0xf132bc0 con 0xee88b00
>>>>>>>>>>>>>>>>>>>> -19> 2014-08-13 17:52:56.584498 7f8a97fa4700 1 -- 10.143.8.182:6827/64670 <== osd.29 10.143.8.181:0/12142 47 ==== osd_ping(ping e220 stamp 2014-08-13 17:52:56.583010) v2 ==== 47+0+0 (3355887204 0 0) 0xf20e040 con 0xee886e0
>>>>>>>>>>>>>>>>>>>> -18> 2014-08-13 17:52:56.584526 7f8a97fa4700 1 -- 10.143.8.182:6827/64670 --> 10.143.8.181:0/12142 -- osd_ping(ping_reply e220 stamp 2014-08-13 17:52:56.583010) v2 -- ?+0 0xf475940 con 0xee886e0
>>>>>>>>>>>>>>>>>>>> -17> 2014-08-13 17:52:56.594448 7f8a798c7700 1 -- 10.141.8.182:6839/64670 >> :/0 pipe(0xec15f00 sd=74 :6839 s=0 pgs=0 cs=0 l=0 c=0xee856a0).accept sd=74 10.141.8.180:47641/0
>>>>>>>>>>>>>>>>>>>> -16> 2014-08-13 17:52:56.594921 7f8a798c7700 1 -- 10.141.8.182:6839/64670 <== client.7512 10.141.8.180:0/1018433 1 ==== osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 ack+read+known_if_redirected e220) v4 ==== 151+0+39 (1972163119 4174233976) 0xf3bca40 con 0xee856a0
>>>>>>>>>>>>>>>>>>>> -15> 2014-08-13 17:52:56.594957 7f8a798c7700 5 -- op tracker -- , seq: 299, time: 2014-08-13 17:52:56.594874, event: header_read, op: osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 ack+read+known_if_redirected e220)
>>>>>>>>>>>>>>>>>>>> -14> 2014-08-13 17:52:56.594970 7f8a798c7700 5 -- op tracker -- , seq: 299, time: 2014-08-13 17:52:56.594880, event: throttled, op: osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 ack+read+known_if_redirected e220)
>>>>>>>>>>>>>>>>>>>> -13> 2014-08-13 17:52:56.594978 7f8a798c7700 5 -- op tracker -- , seq: 299, time: 2014-08-13 17:52:56.594917, event: all_read, op: osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 ack+read+known_if_redirected e220)
>>>>>>>>>>>>>>>>>>>> -12> 2014-08-13 17:52:56.594986 7f8a798c7700 5 -- op tracker -- , seq: 299, time: 0.000000, event: dispatched, op: osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 ack+read+known_if_redirected e220)
>>>>>>>>>>>>>>>>>>>> -11> 2014-08-13 17:52:56.595127 7f8a90795700 5 -- op tracker -- , seq: 299, time: 2014-08-13 17:52:56.595104, event: reached_pg, op: osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 ack+read+known_if_redirected e220)
>>>>>>>>>>>>>>>>>>>> -10> 2014-08-13 17:52:56.595159 7f8a90795700 5 -- op tracker -- , seq: 299, time: 2014-08-13 17:52:56.595153, event: started, op: osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 ack+read+known_if_redirected e220)
>>>>>>>>>>>>>>>>>>>> -9> 2014-08-13 17:52:56.602179 7f8a90795700 1 -- 10.141.8.182:6839/64670 --> 10.141.8.180:0/1018433 -- osd_op_reply(1 [pgls start_epoch 0] v164'30654 uv30654 ondisk = 0) v6 -- ?+0 0xec16180 con 0xee856a0
>>>>>>>>>>>>>>>>>>>> -8> 2014-08-13 17:52:56.602211 7f8a90795700 5 -- op tracker -- , seq: 299, time: 2014-08-13 17:52:56.602205, event: done, op: osd_op(client.7512.0:1 [pgls start_epoch 0] 3.0 ack+read+known_if_redirected e220)
>>>>>>>>>>>>>>>>>>>> -7> 2014-08-13 17:52:56.614839 7f8a798c7700 1 -- 10.141.8.182:6839/64670 <== client.7512 10.141.8.180:0/1018433 2 ==== osd_op(client.7512.0:2 [pgls start_epoch 220] 3.0 ack+read+known_if_redirected e220) v4 ==== 151+0+89 (3460833343 2600845095) 0xf3bcec0 con 0xee856a0
>>>>>>>>>>>>>>>>>>>> -6> 2014-08-13 17:52:56.614864 7f8a798c7700 5 -- op tracker -- , seq: 300, time: 2014-08-13 17:52:56.614789, event: header_read, op: osd_op(client.7512.0:2 [pgls start_epoch 220] 3.0 ack+read+known_if_redirected e220)
>>>>>>>>>>>>>>>>>>>> -5> 2014-08-13 17:52:56.614874 7f8a798c7700 5 -- op tracker -- , seq: 300, time: 2014-08-13 17:52:56.614792, event: throttled, op: osd_op(client.7512.0:2 [pgls start_epoch 220] 3.0 ack+read+known_if_redirected e220)
>>>>>>>>>>>>>>>>>>>> -4> 2014-08-13 17:52:56.614884 7f8a798c7700 5 -- op tracker -- , seq: 300, time: 2014-08-13 17:52:56.614835, event: all_read, op: osd_op(client.7512.0:2 [pgls start_epoch 220] 3.0 ack+read+known_if_redirected e220)
>>>>>>>>>>>>>>>>>>>> -3> 2014-08-13 17:52:56.614891 7f8a798c7700 5 -- op tracker -- , seq: 300, time: 0.000000, event: dispatched, op: osd_op(client.7512.0:2 [pgls start_epoch 220] 3.0 ack+read+known_if_redirected e220)
>>>>>>>>>>>>>>>>>>>> -2> 2014-08-13 17:52:56.614972 7f8a92f9a700 5 -- op tracker -- , seq: 300, time: 2014-08-13 17:52:56.614958, event: reached_pg, op: osd_op(client.7512.0:2 [pgls start_epoch 220] 3.0 ack+read+known_if_redirected e220)
>>>>>>>>>>>>>>>>>>>> -1> 2014-08-13 17:52:56.614993 7f8a92f9a700 5 -- op tracker -- , seq: 300, time: 2014-08-13 17:52:56.614986, event: started, op: osd_op(client.7512.0:2 [pgls start_epoch 220] 3.0 ack+read+known_if_redirected e220)
>>>>>>>>>>>>>>>>>>>> 0> 2014-08-13 17:52:56.617087 7f8a92f9a700 -1 os/GenericObjectMap.cc: In function 'int GenericObjectMap::list_objects(const coll_t&, ghobject_t, int, std::vector<ghobject_t>*, ghobject_t*)' thread 7f8a92f9a700 time 2014-08-13 17:52:56.615073
>>>>>>>>>>>>>>>>>>>> os/GenericObjectMap.cc: 1118: FAILED assert(start <= header.oid)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> ceph version 0.83 (78ff1f0a5dfd3c5850805b4021738564c36c92b8)
>>>>>>>>>>>>>>>>>>>> 1: (GenericObjectMap::list_objects(coll_t const&, ghobject_t, int, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x474) [0x98f774]
>>>>>>>>>>>>>>>>>>>> 2: (KeyValueStore::collection_list_partial(coll_t, ghobject_t, int, int, snapid_t, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x274) [0x8c5b54]
>>>>>>>>>>>>>>>>>>>> 3: (PGBackend::objects_list_partial(hobject_t const&, int, int, snapid_t, std::vector<hobject_t, std::allocator<hobject_t> >*, hobject_t*)+0x1c9) [0x862de9]
>>>>>>>>>>>>>>>>>>>> 4: (ReplicatedPG::do_pg_op(std::tr1::shared_ptr<OpRequest>)+0xea5) [0x7f67f5]
>>>>>>>>>>>>>>>>>>>> 5: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x1f3) [0x8177b3]
>>>>>>>>>>>>>>>>>>>> 6: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x5d5) [0x7b8045]
>>>>>>>>>>>>>>>>>>>> 7: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x47d) [0x62bf8d]
>>>>>>>>>>>>>>>>>>>> 8: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x35c) [0x62c56c]
>>>>>>>>>>>>>>>>>>>> 9: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8cd) [0xa776fd]
>>>>>>>>>>>>>>>>>>>> 10: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xa79980]
>>>>>>>>>>>>>>>>>>>> 11: (()+0x7df3) [0x7f8aac71fdf3]
>>>>>>>>>>>>>>>>>>>> 12: (clone()+0x6d) [0x7f8aab1963dd]
>>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> ceph version 0.83 (78ff1f0a5dfd3c5850805b4021738564c36c92b8)
>>>>>>>>>>>>>>>>>>>> 1: /usr/bin/ceph-osd() [0x99b466]
>>>>>>>>>>>>>>>>>>>> 2: (()+0xf130) [0x7f8aac727130]
>>>>>>>>>>>>>>>>>>>> 3: (gsignal()+0x39) [0x7f8aab0d5989]
>>>>>>>>>>>>>>>>>>>> 4: (abort()+0x148) [0x7f8aab0d7098]
>>>>>>>>>>>>>>>>>>>> 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f8aab9e89d5]
>>>>>>>>>>>>>>>>>>>> 6: (()+0x5e946) [0x7f8aab9e6946]
>>>>>>>>>>>>>>>>>>>> 7: (()+0x5e973) [0x7f8aab9e6973]
>>>>>>>>>>>>>>>>>>>> 8: (()+0x5eb9f) [0x7f8aab9e6b9f]
>>>>>>>>>>>>>>>>>>>> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1ef) [0xa8805f]
>>>>>>>>>>>>>>>>>>>> 10: (GenericObjectMap::list_objects(coll_t const&, ghobject_t, int, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x474) [0x98f774]
>>>>>>>>>>>>>>>>>>>> 11: (KeyValueStore::collection_list_partial(coll_t, ghobject_t, int, int, snapid_t, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x274) [0x8c5b54]
>>>>>>>>>>>>>>>>>>>> 12: (PGBackend::objects_list_partial(hobject_t const&, int, int, snapid_t, std::vector<hobject_t, std::allocator<hobject_t> >*, hobject_t*)+0x1c9) [0x862de9]
>>>>>>>>>>>>>>>>>>>> 13: (ReplicatedPG::do_pg_op(std::tr1::shared_ptr<OpRequest>)+0xea5) [0x7f67f5]
>>>>>>>>>>>>>>>>>>>> 14: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x1f3) [0x8177b3]
>>>>>>>>>>>>>>>>>>>> 15: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x5d5) [0x7b8045]
>>>>>>>>>>>>>>>>>>>> 16: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x47d) [0x62bf8d]
>>>>>>>>>>>>>>>>>>>> 17: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x35c) [0x62c56c]
>>>>>>>>>>>>>>>>>>>> 18: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8cd) [0xa776fd]
>>>>>>>>>>>>>>>>>>>> 19: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xa79980]
>>>>>>>>>>>>>>>>>>>> 20: (()+0x7df3) [0x7f8aac71fdf3]
>>>>>>>>>>>>>>>>>>>> 21: (clone()+0x6d) [0x7f8aab1963dd]
>>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> --- begin dump of recent events ---
>>>>>>>>>>>>>>>>>>>> 0> 2014-08-13 17:52:56.714214 7f8a92f9a700 -1 *** Caught signal (Aborted) **
>>>>>>>>>>>>>>>>>>>> in thread 7f8a92f9a700
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> ceph version 0.83 (78ff1f0a5dfd3c5850805b4021738564c36c92b8)
>>>>>>>>>>>>>>>>>>>> 1: /usr/bin/ceph-osd() [0x99b466]
>>>>>>>>>>>>>>>>>>>> 2: (()+0xf130) [0x7f8aac727130]
>>>>>>>>>>>>>>>>>>>> 3: (gsignal()+0x39) [0x7f8aab0d5989]
>>>>>>>>>>>>>>>>>>>> 4: (abort()+0x148) [0x7f8aab0d7098]
>>>>>>>>>>>>>>>>>>>> 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f8aab9e89d5]
>>>>>>>>>>>>>>>>>>>> 6: (()+0x5e946) [0x7f8aab9e6946]
>>>>>>>>>>>>>>>>>>>> 7: (()+0x5e973) [0x7f8aab9e6973]
>>>>>>>>>>>>>>>>>>>> 8: (()+0x5eb9f) [0x7f8aab9e6b9f]
>>>>>>>>>>>>>>>>>>>> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1ef) [0xa8805f]
>>>>>>>>>>>>>>>>>>>> 10: (GenericObjectMap::list_objects(coll_t const&, ghobject_t, int, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x474) [0x98f774]
>>>>>>>>>>>>>>>>>>>> 11: (KeyValueStore::collection_list_partial(coll_t, ghobject_t, int, int, snapid_t, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x274) [0x8c5b54]
>>>>>>>>>>>>>>>>>>>> 12: (PGBackend::objects_list_partial(hobject_t const&, int, int, snapid_t, std::vector<hobject_t, std::allocator<hobject_t> >*, hobject_t*)+0x1c9) [0x862de9]
>>>>>>>>>>>>>>>>>>>> 13: (ReplicatedPG::do_pg_op(std::tr1::shared_ptr<OpRequest>)+0xea5) [0x7f67f5]
>>>>>>>>>>>>>>>>>>>> 14: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x1f3) [0x8177b3]
>>>>>>>>>>>>>>>>>>>> 15: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x5d5) [0x7b8045]
>>>>>>>>>>>>>>>>>>>> 16: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x47d) [0x62bf8d]
>>>>>>>>>>>>>>>>>>>> 17: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x35c) [0x62c56c]
>>>>>>>>>>>>>>>>>>>> 18: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8cd) [0xa776fd]
>>>>>>>>>>>>>>>>>>>> 19: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xa79980]
>>>>>>>>>>>>>>>>>>>> 20: (()+0x7df3) [0x7f8aac71fdf3]
>>>>>>>>>>>>>>>>>>>> 21: (clone()+0x6d) [0x7f8aab1963dd]
>>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I guess this has something to do with using the dev Keyvaluestore?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Kenneth
> > ----- End message from Kenneth Waegeman <Kenneth.Waegeman at UGent.be> -----
> >
> > --
> > Met vriendelijke groeten,
> > Kenneth Waegeman
>
> --
> Best Regards,
>
> Wheat

--
Best Regards,

Wheat