Update: I noticed that I had not increased pgp_num for the default data
pool, even though I had increased its pg_num some time ago. I did so now
and some backfilling happened. I still have 31 "active+remapped" pgs.
The remapped pgs belong to all pools, even those that hold no data.
What looks suspicious to me is that host ceph8 has weight 10.88 (I had
some osds there temporarily, but removed them due to low RAM). In case
it matters, ceph7 is also low on RAM (4 GB) and is at times slower to
respond than ceph5 (Sage mentioned a "lagging pg peering workqueue" in
bug #3747).

Results follow:

# ceph osd tree
# id    weight  type name       up/down reweight
-5      0       root slow
-4      0               host ceph5-slow
-1      32.46   root default
-2      10.5            host ceph5
0       0.2                     osd.0   up      0
2       2.8                     osd.2   up      1
3       2.8                     osd.3   up      1
4       1.9                     osd.4   up      1
5       2.8                     osd.5   up      1
-3      0.2             host ceph6
1       0.2                     osd.1   up      0
-6      10.88           host ceph7
6       2.73                    osd.6   up      1
7       2.73                    osd.7   up      1
8       2.71                    osd.8   up      1
9       2.71                    osd.9   up      1
-7      10.88           host ceph8

# ceph osd crush dump
{ "devices": [
    { "id": 0, "name": "osd.0"},
    { "id": 1, "name": "osd.1"},
    { "id": 2, "name": "osd.2"},
    { "id": 3, "name": "osd.3"},
    { "id": 4, "name": "osd.4"},
    { "id": 5, "name": "osd.5"},
    { "id": 6, "name": "osd.6"},
    { "id": 7, "name": "osd.7"},
    { "id": 8, "name": "osd.8"},
    { "id": 9, "name": "osd.9"}],
  "types": [
    { "type_id": 0, "name": "osd"},
    { "type_id": 1, "name": "host"},
    { "type_id": 2, "name": "rack"},
    { "type_id": 3, "name": "row"},
    { "type_id": 4, "name": "room"},
    { "type_id": 5, "name": "datacenter"},
    { "type_id": 6, "name": "root"}],
  "buckets": [
    { "id": -1, "name": "default", "type_id": 6, "type_name": "root",
      "weight": 2127297, "alg": "straw", "hash": "rjenkins1",
      "items": [
        { "id": -2, "weight": 688128, "pos": 0},
        { "id": -3, "weight": 13107, "pos": 1},
        { "id": -6, "weight": 713031, "pos": 2},
        { "id": -7, "weight": 713031, "pos": 3}]},
    { "id": -2, "name": "ceph5", "type_id": 1, "type_name": "host",
      "weight": 688125, "alg": "straw", "hash": "rjenkins1",
      "items": [
        { "id": 0, "weight": 13107, "pos": 0},
        { "id": 2, "weight": 183500, "pos": 1},
        { "id": 3, "weight": 183500, "pos": 2},
        { "id": 4, "weight": 124518, "pos": 3},
        { "id": 5, "weight": 183500, "pos": 4}]},
    { "id": -3, "name": "ceph6", "type_id": 1, "type_name": "host",
      "weight": 13107, "alg": "straw", "hash": "rjenkins1",
      "items": [
        { "id": 1, "weight": 13107, "pos": 0}]},
    { "id": -4, "name": "ceph5-slow", "type_id": 1, "type_name": "host",
      "weight": 0, "alg": "straw", "hash": "rjenkins1",
      "items": []},
    { "id": -5, "name": "slow", "type_id": 6, "type_name": "root",
      "weight": 0, "alg": "straw", "hash": "rjenkins1",
      "items": [
        { "id": -4, "weight": 0, "pos": 0}]},
    { "id": -6, "name": "ceph7", "type_id": 1, "type_name": "host",
      "weight": 713030, "alg": "straw", "hash": "rjenkins1",
      "items": [
        { "id": 6, "weight": 178913, "pos": 0},
        { "id": 7, "weight": 178913, "pos": 1},
        { "id": 8, "weight": 177602, "pos": 2},
        { "id": 9, "weight": 177602, "pos": 3}]},
    { "id": -7, "name": "ceph8", "type_id": 1, "type_name": "host",
      "weight": 0, "alg": "straw", "hash": "rjenkins1",
      "items": []}],
  "rules": [
    { "rule_id": 0, "rule_name": "data", "ruleset": 0, "type": 1,
      "min_size": 1, "max_size": 10,
      "steps": [
        { "op": "take", "item": -1},
        { "op": "chooseleaf_firstn", "num": 0, "type": "host"},
        { "op": "emit"}]},
    { "rule_id": 1, "rule_name": "metadata", "ruleset": 1, "type": 1,
      "min_size": 1, "max_size": 10,
      "steps": [
        { "op": "take", "item": -1},
        { "op": "chooseleaf_firstn", "num": 0, "type": "host"},
        { "op": "emit"}]},
    { "rule_id": 2, "rule_name": "rbd", "ruleset": 2, "type": 1,
      "min_size": 1, "max_size": 10,
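For reference, here is a rough, untested sketch of what I plan to check
and try next. The /tmp path is just an example, the loop assumes every
pool should have pgp_num equal to pg_num, and the crush removals assume
those leftover buckets really are empty:

# pg_num and pgp_num should match for every pool:
for pool in $(rados lspools); do
    echo "== $pool"
    ceph osd pool get $pool pg_num
    ceph osd pool get $pool pgp_num
done

# Drop the empty leftover buckets so CRUSH only sees hosts that
# actually hold osds (this should refuse to run on non-empty buckets):
ceph osd crush remove ceph8
ceph osd crush remove ceph5-slow
ceph osd crush remove slow

# Following Greg's hint below, test the map offline and look for PGs
# that rule 0 cannot map to 2 OSDs across the remaining hosts:
ceph osd getcrushmap -o /tmp/crushmap
crushtool -i /tmp/crushmap --test --rule 0 --num-rep 2 --show-bad-mappings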
"steps": [ { "op": "take", "item": -1}, { "op": "chooseleaf_firstn", "num": 0, "type": "host"}, { "op": "emit"}]}, { "rule_id": 3, "rule_name": "own1", "ruleset": 3, "type": 1, "min_size": 1, "max_size": 20, "steps": [ { "op": "take", "item": -1}, { "op": "chooseleaf_firstn", "num": 0, "type": "host"}, { "op": "emit"}]}], "tunables": { "choose_local_tries": 0, "choose_local_fallback_tries": 0, "choose_total_tries": 50, "chooseleaf_descend_once": 1}} Ugis 2013/11/21 John Wilkins <john.wilkins@xxxxxxxxxxx>: > Ugis, > > Can you provide the results for: > > ceph osd tree > ceph osd crush dump > > > > > > > On Thu, Nov 21, 2013 at 7:59 AM, Gregory Farnum <greg@xxxxxxxxxxx> wrote: >> On Thu, Nov 21, 2013 at 7:52 AM, Ugis <ugis22@xxxxxxxxx> wrote: >>> Thanks, reread that section in docs and found tunables profile - nice >>> to have, hadn't noticed it before(ceph docs develop so fast that you >>> need RSS to follow all changes :) ) >>> >>> Still problem persists in a different way. >>> Did set profile "optimal", reballancing started, but I had "rbd >>> delete" in background, in the end cluster ended up with negative >>> degradation % >>> I think I have hit bug http://tracker.ceph.com/issues/3720 which is >>> still open. >>> I did restart osds one by one and negative degradation dissapeared. >>> >>> Afterwards I added extra ~900GB data, degradation growed in process to 0.071% >>> This is rather http://tracker.ceph.com/issues/3747 which is closed, >>> but seems to happen still. >>> I did "ceph osd out X; sleep 40; ceph osd in X" for all osds, >>> degradation % went away. >>> >>> In the end I still have "55 active+remapped" pgs and no degradation %. >>> "pgmap v1853405: 2662 pgs: 2607 active+clean, 55 active+remapped; 5361 >>> GB data, 10743 GB used, 10852 GB / 21595 GB avail; 25230KB/s rd, >>> 203op/s" >>> >>> I queried some of remapped pgs, do not see why they do not >>> reballance(tunables are optimal now, checked). >>> >>> Where to look for the reason they are not reballancing? Is there >>> something to look for in osd logs if debug level is increased? >>> >>> one of those: >>> # ceph pg 4.5e query >>> { "state": "active+remapped", >>> "epoch": 9165, >>> "up": [ >>> 9], >>> "acting": [ >>> 9, >>> 5], >> >> For some reason CRUSH is still failing to map all the PGs to two hosts >> (notice how the "up" set is only one OSD, so it's adding another one >> in "acting") — what's your CRUSH map look like? >> -Greg >> Software Engineer #42 @ http://inktank.com | http://ceph.com >> _______________________________________________ >> ceph-users mailing list >> ceph-users@xxxxxxxxxxxxxx >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > > -- > John Wilkins > Senior Technical Writer > Intank > john.wilkins@xxxxxxxxxxx > (415) 425-9599 > http://inktank.com _______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com