CRUSH is failing to map all the PGs to the right number of OSDs. You've got a completely empty host which has ~1/3 of the cluster's total weight, and that is probably why — remove it!
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Tue, Dec 3, 2013 at 3:13 AM, Ugis <ugis22@xxxxxxxxx> wrote:
> Hi,
> Upgraded to emperor, restarted all nodes.
>
> Still have "31 active+remapped" pgs.
>
> Compared the pg query output of remapped and healthy pgs - some remapped
> pgs hold data, some don't, some have been scrubbed, some haven't. Now
> running a read across the whole rbd - maybe that will trigger those
> stuck pgs.
>
> The state of the remapped pgs looks like:
> { "state": "active+remapped",
>   "epoch": 9420,
>   "up": [
>         9],
>   "acting": [
>         9,
>         5],
>
> Any help/hints on how to get those stuck pgs into an up state on 2 osds?
>
> Ugis
>
>
> 2013/11/22 Ugis <ugis22@xxxxxxxxx>:
>> Update: I noticed that I hadn't increased pgp_num for the default data
>> pool, for which I had increased pg_num some time ago. So I did that now
>> and some backfilling happened.
>> Now I still have "31 active+remapped" pgs.
>> Remapped pgs belong to all pools, even those that hold no data.
>> What looks suspicious to me is that host ceph8 has weight 10.88 (I had
>> some osds there temporarily, but removed them due to low RAM).
>> In case it matters, ceph7 is also low on RAM (4GB) and at times is
>> slower to respond than ceph5 (Sage mentioned a "lagging pg peering
>> workqueue" in Bug #3747).
>>
>> Results follow:
>> # ceph osd tree
>> # id    weight  type name               up/down reweight
>> -5      0       root slow
>> -4      0               host ceph5-slow
>> -1      32.46   root default
>> -2      10.5            host ceph5
>> 0       0.2                     osd.0   up      0
>> 2       2.8                     osd.2   up      1
>> 3       2.8                     osd.3   up      1
>> 4       1.9                     osd.4   up      1
>> 5       2.8                     osd.5   up      1
>> -3      0.2             host ceph6
>> 1       0.2                     osd.1   up      0
>> -6      10.88           host ceph7
>> 6       2.73                    osd.6   up      1
>> 7       2.73                    osd.7   up      1
>> 8       2.71                    osd.8   up      1
>> 9       2.71                    osd.9   up      1
>> -7      10.88           host ceph8
>>
>> # ceph osd crush dump
>> { "devices": [
>>     { "id": 0, "name": "osd.0"},
>>     { "id": 1, "name": "osd.1"},
>>     { "id": 2, "name": "osd.2"},
>>     { "id": 3, "name": "osd.3"},
>>     { "id": 4, "name": "osd.4"},
>>     { "id": 5, "name": "osd.5"},
>>     { "id": 6, "name": "osd.6"},
>>     { "id": 7, "name": "osd.7"},
>>     { "id": 8, "name": "osd.8"},
>>     { "id": 9, "name": "osd.9"}],
>>   "types": [
>>     { "type_id": 0, "name": "osd"},
>>     { "type_id": 1, "name": "host"},
>>     { "type_id": 2, "name": "rack"},
>>     { "type_id": 3, "name": "row"},
>>     { "type_id": 4, "name": "room"},
>>     { "type_id": 5, "name": "datacenter"},
>>     { "type_id": 6, "name": "root"}],
>>   "buckets": [
>>     { "id": -1,
>>       "name": "default",
>>       "type_id": 6,
>>       "type_name": "root",
>>       "weight": 2127297,
>>       "alg": "straw",
>>       "hash": "rjenkins1",
>>       "items": [
>>         { "id": -2, "weight": 688128, "pos": 0},
>>         { "id": -3, "weight": 13107, "pos": 1},
>>         { "id": -6, "weight": 713031, "pos": 2},
>>         { "id": -7, "weight": 713031, "pos": 3}]},
>>     { "id": -2,
>>       "name": "ceph5",
>>       "type_id": 1,
>>       "type_name": "host",
>>       "weight": 688125,
>>       "alg": "straw",
>>       "hash": "rjenkins1",
>>       "items": [
>>         { "id": 0, "weight": 13107, "pos": 0},
>>         { "id": 2, "weight": 183500, "pos": 1},
>>         { "id": 3, "weight": 183500, "pos": 2},
>>         { "id": 4, "weight": 124518, "pos": 3},
>>         { "id": 5, "weight": 183500, "pos": 4}]},
>>     { "id": -3,
>>       "name": "ceph6",
>>       "type_id": 1,
>>       "type_name": "host",
>>       "weight": 13107,
>>       "alg": "straw",
>>       "hash": "rjenkins1",
>>       "items": [
>>         { "id": 1, "weight": 13107, "pos": 0}]},
>>     { "id": -4,
>>       "name": "ceph5-slow",
>>       "type_id": 1,
>>       "type_name": "host",
>>       "weight": 0,
>>       "alg": "straw",
>>       "hash": "rjenkins1",
>>       "items": []},
>>     { "id": -5,
>>       "name": "slow",
>>       "type_id": 6,
>>       "type_name": "root",
>>       "weight": 0,
>>       "alg": "straw",
>>       "hash": "rjenkins1",
>>       "items": [
>>         { "id": -4, "weight": 0, "pos": 0}]},
>>     { "id": -6,
>>       "name": "ceph7",
>>       "type_id": 1,
>>       "type_name": "host",
>>       "weight": 713030,
>>       "alg": "straw",
>>       "hash": "rjenkins1",
>>       "items": [
>>         { "id": 6, "weight": 178913, "pos": 0},
>>         { "id": 7, "weight": 178913, "pos": 1},
>>         { "id": 8, "weight": 177602, "pos": 2},
>>         { "id": 9, "weight": 177602, "pos": 3}]},
>>     { "id": -7,
>>       "name": "ceph8",
>>       "type_id": 1,
>>       "type_name": "host",
>>       "weight": 0,
>>       "alg": "straw",
>>       "hash": "rjenkins1",
>>       "items": []}],
>>   "rules": [
>>     { "rule_id": 0,
>>       "rule_name": "data",
>>       "ruleset": 0,
>>       "type": 1,
>>       "min_size": 1,
>>       "max_size": 10,
>>       "steps": [
>>         { "op": "take", "item": -1},
>>         { "op": "chooseleaf_firstn", "num": 0, "type": "host"},
>>         { "op": "emit"}]},
>>     { "rule_id": 1,
>>       "rule_name": "metadata",
>>       "ruleset": 1,
>>       "type": 1,
>>       "min_size": 1,
>>       "max_size": 10,
>>       "steps": [
>>         { "op": "take", "item": -1},
>>         { "op": "chooseleaf_firstn", "num": 0, "type": "host"},
>>         { "op": "emit"}]},
>>     { "rule_id": 2,
>>       "rule_name": "rbd",
>>       "ruleset": 2,
>>       "type": 1,
>>       "min_size": 1,
>>       "max_size": 10,
>>       "steps": [
>>         { "op": "take", "item": -1},
>>         { "op": "chooseleaf_firstn", "num": 0, "type": "host"},
>>         { "op": "emit"}]},
>>     { "rule_id": 3,
>>       "rule_name": "own1",
>>       "ruleset": 3,
>>       "type": 1,
>>       "min_size": 1,
>>       "max_size": 20,
>>       "steps": [
>>         { "op": "take", "item": -1},
>>         { "op": "chooseleaf_firstn", "num": 0, "type": "host"},
>>         { "op": "emit"}]}],
>>   "tunables": { "choose_local_tries": 0,
>>     "choose_local_fallback_tries": 0,
>>     "choose_total_tries": 50,
>>     "chooseleaf_descend_once": 1}}
>>
>> Ugis
>>
>> 2013/11/21 John Wilkins <john.wilkins@xxxxxxxxxxx>:
>>> Ugis,
>>>
>>> Can you provide the results for:
>>>
>>> ceph osd tree
>>> ceph osd crush dump
>>>
>>> On Thu, Nov 21, 2013 at 7:59 AM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>>> On Thu, Nov 21, 2013 at 7:52 AM, Ugis <ugis22@xxxxxxxxx> wrote:
>>>>> Thanks, I reread that section in the docs and found the tunables
>>>>> profiles - nice to have, hadn't noticed them before (ceph docs develop
>>>>> so fast that you need RSS to follow all the changes :) )
>>>>>
>>>>> Still, the problem persists in a different way.
>>>>> I set the profile to "optimal" and rebalancing started, but I had an
>>>>> "rbd delete" running in the background, and in the end the cluster
>>>>> ended up with a negative degradation %.
>>>>> I think I have hit bug http://tracker.ceph.com/issues/3720, which is
>>>>> still open.
>>>>> I restarted the osds one by one and the negative degradation disappeared.
>>>>>
>>>>> Afterwards I added an extra ~900GB of data; degradation grew during
>>>>> the process to 0.071%.
>>>>> This is rather http://tracker.ceph.com/issues/3747, which is closed
>>>>> but still seems to happen.
>>>>> I did "ceph osd out X; sleep 40; ceph osd in X" for all osds, and the
>>>>> degradation % went away.
>>>>>
>>>>> In the end I still have "55 active+remapped" pgs and no degradation %.
>>>>> "pgmap v1853405: 2662 pgs: 2607 active+clean, 55 active+remapped; 5361
>>>>> GB data, 10743 GB used, 10852 GB / 21595 GB avail; 25230KB/s rd,
>>>>> 203op/s"
>>>>>
>>>>> I queried some of the remapped pgs but do not see why they do not
>>>>> rebalance (tunables are optimal now, checked).
>>>>>
>>>>> Where should I look for the reason they are not rebalancing? Is there
>>>>> something to look for in the osd logs if the debug level is increased?
>>>>>
>>>>> one of those:
>>>>> # ceph pg 4.5e query
>>>>> { "state": "active+remapped",
>>>>>   "epoch": 9165,
>>>>>   "up": [
>>>>>         9],
>>>>>   "acting": [
>>>>>         9,
>>>>>         5],
>>>>
>>>> For some reason CRUSH is still failing to map all the PGs to two hosts
>>>> (notice how the "up" set is only one OSD, so it's adding another one
>>>> in "acting") — what's your CRUSH map look like?
>>>> -Greg
>>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>
>>>
>>> --
>>> John Wilkins
>>> Senior Technical Writer
>>> Inktank
>>> john.wilkins@xxxxxxxxxxx
>>> (415) 425-9599
>>> http://inktank.com
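As a general note on the "where to look" question above, the usual starting points are the stuck-pg listing and a temporarily raised debug level on the acting primary. A sketch only - osd.9 is just the primary taken from the pg query above, so substitute the OSD of whichever PG you are chasing:

# ceph pg dump_stuck unclean                                   # list stuck PGs with their up/acting sets
# ceph tell osd.9 injectargs '--debug-osd 20 --debug-ms 1'     # raise logging on the acting primary
# ceph tell osd.9 injectargs '--debug-osd 0/5 --debug-ms 0/5'  # drop it back down afterwards

With the higher level, the peering and backfill decisions for the stuck PGs show up in that OSD's log (typically /var/log/ceph/ceph-osd.9.log).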