I guess this is related to your CRUSH rules. Unfortunately, I don't know much about creating the rules, but someone could give more insight if you also provide a CRUSH rule dump. Your "-1 0 root default" is a bit strange.
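As a rough sketch of what that would involve (standard Ceph CLI; nothing here is specific to this cluster), the rule definitions and the full bucket hierarchy can be dumped with:

# ceph osd crush rule dump
# ceph osd crush tree --show-shadow

The first command prints each rule, including the root bucket its "take" step starts from; the second shows the hierarchy, including the per-device-class shadow trees, which should make it obvious whether rack-0 and rack-1 are actually attached to root default.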
On 1 April 2023 at 01:01:39 MESZ, Johan Hattne <johan@xxxxxxxxx> wrote:
>Here goes:
>
># ceph -s
>  cluster:
>    id:     e1327a10-8b8c-11ed-88b9-3cecef0e3946
>    health: HEALTH_OK
>
>  services:
>    mon: 5 daemons, quorum bcgonen-a,bcgonen-b,bcgonen-c,bcgonen-r0h0,bcgonen-r0h1 (age 16h)
>    mgr: bcgonen-b.furndm(active, since 8d), standbys: bcgonen-a.qmmqxj
>    mds: 1/1 daemons up, 2 standby
>    osd: 36 osds: 36 up (since 16h), 36 in (since 3d); 1041 remapped pgs
>
>  data:
>    volumes: 1/1 healthy
>    pools:   3 pools, 1041 pgs
>    objects: 5.42M objects, 6.5 TiB
>    usage:   19 TiB used, 428 TiB / 447 TiB avail
>    pgs:     27087125/16252275 objects misplaced (166.667%)
>             1039 active+clean+remapped
>             2    active+clean+remapped+scrubbing+deep
>
># ceph osd tree
>ID   CLASS  WEIGHT     TYPE NAME               STATUS  REWEIGHT  PRI-AFF
>-14         149.02008  rack rack-1
> -7         149.02008      host bcgonen-r1h0
> 20    hdd   14.55269          osd.20              up   1.00000  1.00000
> 21    hdd   14.55269          osd.21              up   1.00000  1.00000
> 22    hdd   14.55269          osd.22              up   1.00000  1.00000
> 23    hdd   14.55269          osd.23              up   1.00000  1.00000
> 24    hdd   14.55269          osd.24              up   1.00000  1.00000
> 25    hdd   14.55269          osd.25              up   1.00000  1.00000
> 26    hdd   14.55269          osd.26              up   1.00000  1.00000
> 27    hdd   14.55269          osd.27              up   1.00000  1.00000
> 28    hdd   14.55269          osd.28              up   1.00000  1.00000
> 29    hdd   14.55269          osd.29              up   1.00000  1.00000
> 34    ssd    1.74660          osd.34              up   1.00000  1.00000
> 35    ssd    1.74660          osd.35              up   1.00000  1.00000
>-13         298.04016  rack rack-0
> -3         149.02008      host bcgonen-r0h0
>  0    hdd   14.55269          osd.0               up   1.00000  1.00000
>  1    hdd   14.55269          osd.1               up   1.00000  1.00000
>  2    hdd   14.55269          osd.2               up   1.00000  1.00000
>  3    hdd   14.55269          osd.3               up   1.00000  1.00000
>  4    hdd   14.55269          osd.4               up   1.00000  1.00000
>  5    hdd   14.55269          osd.5               up   1.00000  1.00000
>  6    hdd   14.55269          osd.6               up   1.00000  1.00000
>  7    hdd   14.55269          osd.7               up   1.00000  1.00000
>  8    hdd   14.55269          osd.8               up   1.00000  1.00000
>  9    hdd   14.55269          osd.9               up   1.00000  1.00000
> 30    ssd    1.74660          osd.30              up   1.00000  1.00000
> 31    ssd    1.74660          osd.31              up   1.00000  1.00000
> -5         149.02008      host bcgonen-r0h1
> 10    hdd   14.55269          osd.10              up   1.00000  1.00000
> 11    hdd   14.55269          osd.11              up   1.00000  1.00000
> 12    hdd   14.55269          osd.12              up   1.00000  1.00000
> 13    hdd   14.55269          osd.13              up   1.00000  1.00000
> 14    hdd   14.55269          osd.14              up   1.00000  1.00000
> 15    hdd   14.55269          osd.15              up   1.00000  1.00000
> 16    hdd   14.55269          osd.16              up   1.00000  1.00000
> 17    hdd   14.55269          osd.17              up   1.00000  1.00000
> 18    hdd   14.55269          osd.18              up   1.00000  1.00000
> 19    hdd   14.55269          osd.19              up   1.00000  1.00000
> 32    ssd    1.74660          osd.32              up   1.00000  1.00000
> 33    ssd    1.74660          osd.33              up   1.00000  1.00000
> -1                 0  root default
>
># ceph osd pool ls detail
>pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 31 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr
>pool 2 'cephfs.cephfs.meta' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 16 pgp_num 16 autoscale_mode on last_change 9833 lfor 0/0/584 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs
>pool 3 'cephfs.cephfs.data' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 1024 pgp_num 1024 autoscale_mode on last_change 7630 lfor 0/1831/6544 flags hashpspool,bulk stripe_width 0 application cephfs
>
>crush_rules 1 and 2 are just used to assign the data and meta pools to HDD and SSD, respectively (failure domain: host).
>
>// J
>
>On 2023-03-31 15:37, ceph@xxxxxxxxxx wrote:
>> Need to know some more about your cluster...
>>
>> Ceph -s
>> Ceph osd df tree
>> Replica or ec?
>> ...
>>
>> Perhaps this can give us some insight
>> Mehmet
>>
>> On 31 March 2023 at 18:08:38 MESZ, Johan Hattne <johan@xxxxxxxxx> wrote:
>>
>> Dear all;
>>
>> Up until a few hours ago, I had a seemingly normally-behaving cluster (Quincy, 17.2.5) with 36 OSDs, evenly distributed across 3 of its 6 nodes. The cluster is only used for CephFS, and the only non-standard configuration I can think of is that I had 2 active MDSs but only 1 standby. I had also doubled mds_cache_memory_limit to 8 GB (all OSD hosts have 256 G of RAM) at some point in the past.
>>
>> Then I rebooted one of the OSD nodes. The rebooted node held one of the active MDSs. Now the node is back up: ceph -s says the cluster is healthy, but all PGs are in an active+clean+remapped state and 166.67% of the objects are misplaced (dashboard: -66.66% healthy).
>>
>> The data pool is a threefold replica with 5.4M objects; the number of misplaced objects is reported as 27087410/16252446. The denominator in the ratio makes sense to me (16.2M / 3 = 5.4M), but the numerator does not. I also note that the ratio is *exactly* 5 / 3. The filesystem is still mounted and appears to be usable, but df reports it as 100% full; I suspect it would say 167% but that is capped somewhere.
>>
>> Any ideas about what is going on? Any suggestions for recovery?
>>
>> // Best wishes; Johan
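To follow up on the recovery question above, and purely as a sketch: if the rule dump confirms that crush_rules 1 and 2 start from root default while rack-0 and rack-1 really are detached from that empty, weight-0 root (as the osd tree suggests), the racks could in principle be reattached with

# ceph osd crush move rack-0 root=default
# ceph osd crush move rack-1 root=default

Moving buckets rewrites the CRUSH map and will cause data movement, so the actual rule definitions should be checked first, and ideally the edited map tested offline with crushtool --test before anything is moved.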