I set osd.7 back to "in", uncordoned the node, scaled the OSD deployment back up, and things are recovering with cluster status HEALTH_OK.

I found this message in the archives: https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg47071.html

"You have a large difference in the capacities of the nodes. This results in a different host weight, which in turn might lead to problems with the crush algorithm. It is not able to get three different hosts for OSD placement for some of the PGs. CEPH and crush do not cope well with heterogenous setups. I would suggest to move one of the OSDs from host ceph1 to ceph4 to equalize the host weight."

My nodes do have very different weights. What I am trying to do is re-install each node in the cluster so they all end up with the same amount of space for Ceph (much less than before, since we need more space for hostpath storage).

# ceph osd tree
ID   CLASS  WEIGHT     TYPE NAME                          STATUS  REWEIGHT  PRI-AFF
 -1         13.77573   root default
 -5         13.77573       region FSN1
-22          0.73419           zone FSN1-DC13
-21                0               host node5-redacted-com
-27          0.73419               host node7-redacted-com
  1    ssd   0.36710                   osd.1              up      1.00000   1.00000
  5    ssd   0.36710                   osd.5              up      1.00000   1.00000
-10          6.20297           zone FSN1-DC14
 -9          6.20297               host node3-redacted-com
  2    ssd  *3.10149*                  osd.2              up      1.00000   1.00000
  4    ssd  *3.10149*                  osd.4              up      1.00000   1.00000
-18          3.19919           zone FSN1-DC15
-17         *3.19919*              host node4-redacted-com
  7    ssd  *3.19919*                  osd.7              down          0   1.00000
 -4          2.90518           zone FSN1-DC16
 -3          2.90518               host node1-redacted-com
  0    ssd  *1.45259*                  osd.0              up      1.00000   1.00000
  3    ssd  *1.45259*                  osd.3              up      1.00000   1.00000
-14          0.73419           zone FSN1-DC18
-13                0               host node2-redacted-com
-25          0.73419               host node6-redacted-com
 10    ssd   0.36710                   osd.10             up      1.00000   1.00000
 11    ssd   0.36710                   osd.11             up      1.00000   1.00000

Should I just change the weights before/after removing osd.7, with something like "ceph osd crush reweight osd.7 1.0"?

Thanks

On Thu, Nov 18, 2021 at 9:41 PM Stefan Kooman <stefan@xxxxxx> wrote:
> On 11/18/21 17:08, David Tinker wrote:
> > Would it be worth setting the OSD I removed back to "in" (or whatever
> > the opposite of "out" is) and seeing if things recover?
>
> That would be "ceph osd in osd.7". It shouldn't hurt. But I really don't
> understand why this won't resolve itself.
>
> If this gets it fixed you might want to try the pgremapper "drain" command
> from [1]. And when that is done, set the osd out.
>
> Gr. Stefan
>
> [1]: https://github.com/digitalocean/pgremapper/
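
PS: for concreteness, this is roughly the sequence I have in mind, combining the reweight idea with the drain suggestion. It is untested; the 1.0 weight is only the example figure from my question above, and the exact pgremapper drain argument/flag syntax would need to be checked against its README:

    # Option A: shrink osd.7's crush weight so node4's host weight is closer
    # to the smaller hosts, then wait for backfill to finish before removing it
    ceph osd crush reweight osd.7 1.0
    ceph -s

    # Option B: bring osd.7 back in, drain its PGs with pgremapper,
    # then mark it out once it no longer holds data
    ceph osd in osd.7
    pgremapper drain osd.7      # syntax/flags per the pgremapper README
    ceph osd out osd.7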