Hi,

I'm stuck on a problem with a CRUSH rule. I have a small cluster with 3 nodes and 4 OSDs. I decided to split it into 2 failure domains, so I created 2 buckets and moved the hosts into them, following this guide:
http://www.sebastien-han.fr/blog/2014/01/13/ceph-managing-crush-with-the-cli/

I ended up with the following CRUSH tree:

# ceph osd crush tree
ID  CLASS WEIGHT  TYPE NAME
 -1       3.63835 root default
 -9       0.90959     pod group1
 -5       0.90959         host feather1
  1   hdd 0.90959             osd.1
-10       2.72876     pod group2
 -7       1.81918         host ds1
  2   hdd 0.90959             osd.2
  3   hdd 0.90959             osd.3
 -3       0.90958         host feather0
  0   hdd 0.90958             osd.0

Then I created this rule:

# ceph osd crush rule dump pods
{
    "rule_id": 1,
    "rule_name": "pods",
    "ruleset": 1,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -1,
            "item_name": "default"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "pod"
        },
        {
            "op": "emit"
        }
    ]
}

After applying that rule to a pool, the cluster ends up in this state:

# ceph -s
  cluster:
    id:     34b66329-b511-4d97-9e07-7b1a0a6879ef
    health: HEALTH_WARN
            6/42198 objects misplaced (0.014%)

  services:
    mon: 3 daemons, quorum feather0,feather1,ds1
    mgr: ds1(active), standbys: feather1, feather0
    mds: cephfs-1/1/1 up {0=feather0=up:active}, 2 up:standby
    osd: 4 osds: 4 up, 4 in; 64 remapped pgs
    rgw: 3 daemons active

  data:
    pools:   8 pools, 264 pgs
    objects: 14066 objects, 49429 MB
    usage:   142 GB used, 3582 GB / 3725 GB avail
    pgs:     6/42198 objects misplaced (0.014%)
             200 active+clean
             64  active+clean+remapped

  io:
    client: 1897 kB/s wr, 0 op/s rd, 11 op/s wr

And it stays frozen like that: no recovery happens, it just sits with objects misplaced and PGs active+clean+remapped. I suspect something is wrong with my rule and the cluster can't move objects to match the new placement, but I have no idea what exactly I'm missing. Any help would be appreciated.

--
With best regards,
Igor Gajsin
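
P.S. For reference, the buckets and the rule were set up with roughly the following commands (reconstructed from the blog post above and the resulting tree, so the exact invocations may have differed slightly):

# ceph osd crush add-bucket group1 pod
# ceph osd crush add-bucket group2 pod
# ceph osd crush move group1 root=default
# ceph osd crush move group2 root=default
# ceph osd crush move feather1 pod=group1
# ceph osd crush move ds1 pod=group2
# ceph osd crush move feather0 pod=group2
# ceph osd crush rule create-simple pods default pod firstn   # produces a rule equivalent to the dump above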
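
P.P.S. If it helps with debugging, I can also test the rule offline with crushtool and post the output; this is roughly what I would run (file names are just placeholders I picked):

# ceph osd getcrushmap -o crushmap.bin
# crushtool -d crushmap.bin -o crushmap.txt   # decompile to inspect the buckets and the rule
# crushtool -i crushmap.bin --test --rule 1 --num-rep 3 --show-mappings       # assuming the pool's size is 3
# crushtool -i crushmap.bin --test --rule 1 --num-rep 3 --show-bad-mappings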