Hello again,

following up on the previous mail: one cluster is rather slow at the moment, and we have spotted something "funny".

When checking ceph pg dump, we see that some OSDs have HB peers among OSDs with which they should not have any PG in common.

When restarting one of the affected OSDs, we get the following message:

    mon_cmd_maybe_osd_create fail: 'osd.12 has already bound to class 'xxx-ssd', can not reset class to 'hdd'; use 'ceph osd crush rm-device-class <id>' to remove old class first': (16) Device or resource busy

When checking the output of ceph osd tree, osd.12 seems to be in the correct class:

    12  xxx-ssd  0.21767  osd.12  up  1.00000  1.00000

Is it possible that the OSD has "multiple" classes, or that the cluster remembers a class that was set on osd.12 when it used to be an HDD?

The output of ceph pg dump includes this at the bottom:

    OSD_STAT  USED     AVAIL   USED_RAW  TOTAL    HB_PEERS                        PG_SUM  PRIMARY_PG_SUM
    12        150 GiB  72 GiB  151 GiB   223 GiB  [3,11,13,25,36,43,54,64,71,82]  128     35

which is wrong, because osd.12 should only peer with osd.3 and osd.25. These are the only other OSDs in the same pool, which has its replicated rule set to match on xxx-ssd.

And the obvious question: how do we fix this?

At the moment we see around 75 PGs in peering and 39 activating. Most of them are in a pool with slower SSDs, but it seems that these peerings also affect another pool that should have faster SSDs.

Best regards,

Nico

--
Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch
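
P.S. To make the mismatch explicit, here is a minimal Python sketch that lists which heartbeat peers of osd.12 fall outside the set we expect. The peer list is copied from the ceph pg dump line quoted in this mail. The expected set {3, 25} is our assumption based on the pool layout, and the helper name unexpected_peers is made up for illustration.

```python
# HB_PEERS for osd.12, copied from the `ceph pg dump` output quoted in this mail.
hb_peers = [3, 11, 13, 25, 36, 43, 54, 64, 71, 82]

# Assumption: only osd.3 and osd.25 share the xxx-ssd pool with osd.12.
expected_peers = {3, 25}


def unexpected_peers(hb_peers, expected):
    """Return the heartbeat peers that are not in the expected set, in order."""
    return [osd for osd in hb_peers if osd not in expected]


print(unexpected_peers(hb_peers, expected_peers))
# prints [11, 13, 36, 43, 54, 64, 71, 82]
```

So 8 of the 10 reported peers are OSDs that, as far as we understand our CRUSH rules, osd.12 should not share any PG with.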