Hi Konstantin, thanks a lot for your response.

> Your crush is imbalanced:

I did that deliberately. group2 of my small-but-helpful Ceph cluster will
also be the master nodes of my new small-but-helpful Kubernetes cluster.
What I want to achieve is this: there are 2 groups of nodes, and even if
one of them fails completely (during the k8s installation), the other
group still contains a copy of the data.

But OK, let's rebalance it for testing purposes:

ID  CLASS WEIGHT  TYPE NAME
 -1       3.63835 root default
 -9       1.81917     pod group1
 -3       0.90958         host feather0
  0   hdd 0.90958             osd.0
 -5       0.90959         host feather1
  1   hdd 0.90959             osd.1
-10       1.81918     pod group2
 -7       1.81918         host ds1
  2   hdd 0.90959             osd.2
  3   hdd 0.90959             osd.3

and add your rule

> ceph osd crush rule create-replicated podshdd default pod hdd

# ceph osd crush rule dump podshdd
{
    "rule_id": 3,
    "rule_name": "podshdd",
    "ruleset": 3,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -2,
            "item_name": "default~hdd"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "pod"
        },
        {
            "op": "emit"
        }
    ]
}

After assigning this rule to a pool, the cluster gets stuck in the same state:

# ceph -s
  cluster:
    id:     34b66329-b511-4d97-9e07-7b1a0a6879ef
    health: HEALTH_WARN
            3971/42399 objects misplaced (9.366%)

  services:
    mon: 3 daemons, quorum feather0,feather1,ds1
    mgr: ds1(active), standbys: feather1, feather0
    mds: cephfs-1/1/1 up {0=feather0=up:active}, 2 up:standby
    osd: 4 osds: 4 up, 4 in; 128 remapped pgs
    rgw: 3 daemons active

  data:
    pools:   8 pools, 264 pgs
    objects: 14133 objects, 49684 MB
    usage:   143 GB used, 3582 GB / 3725 GB avail
    pgs:     3971/42399 objects misplaced (9.366%)
             136 active+clean
             128 active+clean+remapped

  io:
    client:   19441 B/s rd, 29673 B/s wr, 18 op/s rd, 18 op/s wr

And here is what is interesting. At first it complains about something
like "objects misplaced (23%)" and ceph health detail shows a lot of
degraded PGs. But later there are no PGs in its output at all:

# ceph health detail
HEALTH_WARN 3971/42399 objects misplaced (9.366%)
OBJECT_MISPLACED 3971/42399 objects misplaced (9.366%)

and the number of misplaced objects stops decreasing: it has stayed at
9.366% for the last 30 minutes.

If I switch the crush rule back to the default one, the cluster returns
to HEALTH_OK.

Konstantin Shalygin writes:

>> # ceph osd crush tree
>> ID  CLASS WEIGHT  TYPE NAME
>>  -1       3.63835 root default
>>  -9       0.90959     pod group1
>>  -5       0.90959         host feather1
>>   1   hdd 0.90959             osd.1
>> -10       2.72876     pod group2
>>  -7       1.81918         host ds1
>>   2   hdd 0.90959             osd.2
>>   3   hdd 0.90959             osd.3
>>  -3       0.90958         host feather0
>>   0   hdd 0.90958             osd.0
>>
>> And I've made a rule
>>
>> # ceph osd crush rule dump pods
>> {
>>     "rule_id": 1,
>>     "rule_name": "pods",
>>     "ruleset": 1,
>>     "type": 1,
>>     "min_size": 1,
>>     "max_size": 10,
>>     "steps": [
>>         {
>>             "op": "take",
>>             "item": -1,
>>             "item_name": "default"
>>         },
>>         {
>>             "op": "chooseleaf_firstn",
>>             "num": 0,
>>             "type": "pod"
>>         },
>>         {
>>             "op": "emit"
>>         }
>>     ]
>> }
>
> 1. Assign device class to your crush rule:
>
> ceph osd crush rule create-replicated pods default pod hdd
>
> 2. Your crush is imbalanced:
>
> *good*:
>
> root:
>   host1:
>     - osd0
>   host2:
>     - osd1
>   host3:
>     - osd3
>
> *bad*:
>
> root:
>   host1:
>     - osd0
>   host2:
>     - osd1
>     - osd2
>     - osd3
>
>
> k

--
With best regards,
Igor Gajsin
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
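
P.S. For my own reference, this is how I read the podshdd rule above when
decompiled into CRUSH map text (just a sketch reconstructed from the JSON
dump shown earlier; the real crushtool -d output may differ slightly):

    # sketch of the rule created by
    # "ceph osd crush rule create-replicated podshdd default pod hdd"
    rule podshdd {
            id 3
            type replicated
            min_size 1
            max_size 10
            # restrict placement to hdd-class devices under the default root
            step take default class hdd
            # pick one leaf (osd) from each distinct "pod" bucket
            step chooseleaf firstn 0 type pod
            step emit
    }

I also wonder whether the root cause is that the tree has only two pod
buckets, so a replicated pool (assuming the default size of 3) can never
get a third copy placed under this rule, which would leave those 128 PGs
active+clean+remapped.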