Hi community,

Please help me understand what is going on. I have a Ceph (Reef) test cluster with the following CRUSH map:

ceph osd crush tree
ID   CLASS  WEIGHT    TYPE NAME
 -1         12.00000  root default
 -7          3.00000      host ksr-ceph-osd1
  0    hdd   1.00000          osd.0
  6    hdd   1.00000          osd.6
 10    hdd   1.00000          osd.10
 -9          3.00000      host ksr-ceph-osd2
  3    hdd   1.00000          osd.3
  7    hdd   1.00000          osd.7
 11    hdd   1.00000          osd.11
 -5          3.00000      host ksr-ceph-osd3
  2    hdd   1.00000          osd.2
  5    hdd   1.00000          osd.5
  9    hdd   1.00000          osd.9
 -3          3.00000      host ksr-ceph-osd4
  1    hdd   1.00000          osd.1
  4    hdd   1.00000          osd.4
  8    hdd   1.00000          osd.8
-11          0            rack rack1
-13          0            rack rack2
-15          0            rack rack3
-17          0            rack rack4

Ceph status is like:

  cluster:
    id:     8a174287-42f8-43b6-9973-f174110b508b
    health: HEALTH_OK

  services:
    mon: 5 daemons, quorum ksr-ceph-mon2,ksr-ceph-mon3,ksr-ceph-mon1,ksr-ceph-mon5,ksr-ceph-mon4 (age 3h)
    mgr: ksr-ceph-mon1(active, since 3h), standbys: ksr-ceph-mon2, ksr-ceph-mon3
    mds: 2/2 daemons up, 3 standby
    osd: 12 osds: 12 up (since 3h), 12 in (since 6d)

  data:
    volumes: 2/2 healthy
    pools:   5 pools, 129 pgs
    objects: 578 objects, 154 MiB
    usage:   1.0 GiB used, 599 GiB / 600 GiB avail
    pgs:     129 active+clean

I then run:

ceph osd set norecover
ceph osd set nobackfill
ceph osd set norebalance
ceph osd crush move ksr-ceph-osd1 rack=rack1
ceph osd crush move ksr-ceph-osd2 rack=rack2
ceph osd crush move ksr-ceph-osd3 rack=rack3
ceph osd crush move ksr-ceph-osd4 rack=rack4

resulting in the following CRUSH tree:

ceph osd crush tree
ID   CLASS  WEIGHT    TYPE NAME
 -1         12.00000  root default
-11          3.00000      rack rack1
 -7          3.00000          host ksr-ceph-osd1
  0    hdd   1.00000              osd.0
  6    hdd   1.00000              osd.6
 10    hdd   1.00000              osd.10
-13          3.00000      rack rack2
 -9          3.00000          host ksr-ceph-osd2
  3    hdd   1.00000              osd.3
  7    hdd   1.00000              osd.7
 11    hdd   1.00000              osd.11
-15          3.00000      rack rack3
 -5          3.00000          host ksr-ceph-osd3
  2    hdd   1.00000              osd.2
  5    hdd   1.00000              osd.5
  9    hdd   1.00000              osd.9
-17          3.00000      rack rack4
 -3          3.00000          host ksr-ceph-osd4
  1    hdd   1.00000              osd.1
  4    hdd   1.00000              osd.4
  8    hdd   1.00000              osd.8

And ceph status is now:

  cluster:
    id:
            8a174287-42f8-43b6-9973-f174110b508b
    health: HEALTH_WARN
            nobackfill,norebalance,norecover flag(s) set
            Degraded data redundancy: 2701/1734 objects degraded (155.767%), 55 pgs degraded

  services:
    mon: 5 daemons, quorum ksr-ceph-mon2,ksr-ceph-mon3,ksr-ceph-mon1,ksr-ceph-mon5,ksr-ceph-mon4 (age 3h)
    mgr: ksr-ceph-mon1(active, since 3h), standbys: ksr-ceph-mon2, ksr-ceph-mon3
    mds: 2/2 daemons up, 3 standby
    osd: 12 osds: 12 up (since 3h), 12 in (since 6d); 22 remapped pgs
         flags nobackfill,norebalance,norecover

  data:
    volumes: 2/2 healthy
    pools:   5 pools, 129 pgs
    objects: 578 objects, 154 MiB
    usage:   1.0 GiB used, 599 GiB / 600 GiB avail
    pgs:     2701/1734 objects degraded (155.767%)
             479/1734 objects misplaced (27.624%)
             70 active+clean
             34 active+recovery_wait+degraded
             20 active+recovery_wait+undersized+degraded+remapped
              3 active+recovering
              1 active+recovery_wait+remapped
              1 active+recovery_wait+degraded+remapped

No CRUSH rules have been changed on the pools. All pools use the default replicated_rule:

ceph osd crush rule dump replicated_rule
{
    "rule_id": 0,
    "rule_name": "replicated_rule",
    "type": 1,
    "steps": [
        {
            "op": "take",
            "item": -1,
            "item_name": "default"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
}

Questions:

1 - Why does this result in such a high "objects degraded" percentage?
2 - Why do PGs become undersized?

All in all this behavior does not make sense to me. Since I have only added some buckets to the map and no rules have changed, I expected essentially nothing to happen. So I am reaching out in the hope that someone can explain the logic to me.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
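[Editorial note: for anyone who wants to reproduce the placement change offline, the effect of the bucket moves on the rule's mappings can be inspected without touching the cluster by feeding the CRUSH maps from before and after the move through crushtool. This is a sketch; the file names are placeholders, and it assumes the crushtool binary shipped with the ceph packages is available.]

```shell
# Save the compiled CRUSH map before and after the "ceph osd crush move"
# commands, then let crushtool compute the placements rule 0 would
# produce for each map, and diff the results.
ceph osd getcrushmap -o crush.before          # run this before the moves
# ... perform the crush moves ...
ceph osd getcrushmap -o crush.after           # run this after the moves

# Simulate replicated_rule (rule id 0) with 3 replicas for a range of inputs.
crushtool -i crush.before --test --rule 0 --num-rep 3 --show-mappings > mappings.before
crushtool -i crush.after  --test --rule 0 --num-rep 3 --show-mappings > mappings.after

# Every differing line is an input whose OSD set changed purely because
# of the new rack buckets in the hierarchy.
diff mappings.before mappings.after
```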