Hi Konstantin, thanks a lot for your response.

> Your crush is imbalanced:

I did that deliberately. group2 of my small-but-helpful Ceph cluster will
also be the master nodes of my new small-but-helpful Kubernetes cluster.
What I want to achieve is this: there are 2 groups of nodes, and even if
one of them fails completely (during the k8s installation), the other
group still contains a copy of the data.

But OK, let's rebalance it for testing purposes:

ID  CLASS WEIGHT  TYPE NAME
 -1       3.63835 root default
 -9       1.81917     pod group1
 -3       0.90958         host feather0
  0   hdd 0.90958             osd.0
 -5       0.90959         host feather1
  1   hdd 0.90959             osd.1
-10       1.81918     pod group2
 -7       1.81918         host ds1
  2   hdd 0.90959             osd.2
  3   hdd 0.90959             osd.3

and add your rule

> ceph osd crush rule create-replicated podshdd default pod hdd

# ceph osd crush rule dump podshdd
{
    "rule_id": 3,
    "rule_name": "podshdd",
    "ruleset": 3,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -2,
            "item_name": "default~hdd"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "pod"
        },
        {
            "op": "emit"
        }
    ]
}

After assigning this rule to a pool, the cluster gets stuck in the same state:

# ceph -s
  cluster:
    id:     34b66329-b511-4d97-9e07-7b1a0a6879ef
    health: HEALTH_WARN
            3971/42399 objects misplaced (9.366%)

  services:
    mon: 3 daemons, quorum feather0,feather1,ds1
    mgr: ds1(active), standbys: feather1, feather0
    mds: cephfs-1/1/1 up {0=feather0=up:active}, 2 up:standby
    osd: 4 osds: 4 up, 4 in; 128 remapped pgs
    rgw: 3 daemons active

  data:
    pools:   8 pools, 264 pgs
    objects: 14133 objects, 49684 MB
    usage:   143 GB used, 3582 GB / 3725 GB avail
    pgs:     3971/42399 objects misplaced (9.366%)
             136 active+clean
             128 active+clean+remapped

  io:
    client:   19441 B/s rd, 29673 B/s wr, 18 op/s rd, 18 op/s wr

And here is what is interesting. At first it complains about something
like "objects misplaced (23%)" and ceph health detail shows a lot of
degraded PGs. But later there are no PGs in its output at all:

# ceph health detail
HEALTH_WARN 3971/42399 objects misplaced (9.366%)
OBJECT_MISPLACED 3971/42399 objects misplaced (9.366%)

and the number of misplaced objects stops decreasing: it has stayed at
9.366% for the last 30 minutes.

If I switch the crush rule back to the default one, the cluster returns
to HEALTH_OK.

Konstantin Shalygin writes:

>> # ceph osd crush tree
>> ID  CLASS WEIGHT  TYPE NAME
>>  -1       3.63835 root default
>>  -9       0.90959     pod group1
>>  -5       0.90959         host feather1
>>   1   hdd 0.90959             osd.1
>> -10       2.72876     pod group2
>>  -7       1.81918         host ds1
>>   2   hdd 0.90959             osd.2
>>   3   hdd 0.90959             osd.3
>>  -3       0.90958         host feather0
>>   0   hdd 0.90958             osd.0
>>
>> And I've made a rule
>>
>> # ceph osd crush rule dump pods
>> {
>>     "rule_id": 1,
>>     "rule_name": "pods",
>>     "ruleset": 1,
>>     "type": 1,
>>     "min_size": 1,
>>     "max_size": 10,
>>     "steps": [
>>         {
>>             "op": "take",
>>             "item": -1,
>>             "item_name": "default"
>>         },
>>         {
>>             "op": "chooseleaf_firstn",
>>             "num": 0,
>>             "type": "pod"
>>         },
>>         {
>>             "op": "emit"
>>         }
>>     ]
>> }
>
> 1. Assign device class to your crush rule:
>
> ceph osd crush rule create-replicated pods default pod hdd
>
> 2. Your crush is imbalanced:
>
> *good*:
>
> root:
>   host1:
>     - osd0
>   host2:
>     - osd1
>   host3:
>     - osd3
>
> *bad*:
>
> root:
>   host1:
>     - osd0
>   host2:
>     - osd1
>     - osd2
>     - osd3
>
>
> k

--
With best regards,
Igor Gajsin
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
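
P.S. For my own reference, this is how I read the podshdd rule above when
decompiled into CRUSH map text (just a sketch reconstructed from the JSON
dump shown earlier; the real crushtool -d output may differ slightly):

    # sketch of the rule created by
    # "ceph osd crush rule create-replicated podshdd default pod hdd"
    rule podshdd {
            id 3
            type replicated
            min_size 1
            max_size 10
            # restrict placement to hdd-class devices under the default root
            step take default class hdd
            # pick one leaf (osd) from each distinct "pod" bucket
            step chooseleaf firstn 0 type pod
            step emit
    }

I also wonder whether the root cause is that the tree has only two pod
buckets, so a replicated pool (assuming the default size of 3) can never
get a third copy placed under this rule, which would leave those 128 PGs
active+clean+remapped.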