Hello. It's been a while.

I have a Nautilus cluster with 72 x 12TB HDD OSDs (BlueStore), mostly EC 8+2 pools/PGs. It's been working great - some nodes went nearly 900 days without a reboot.

As of yesterday I found that I have 3 OSDs with a SMART status of 'Pending Failure'. New drives are ordered and will be here next week. There is a procedure in the documentation for replacing an OSD, but I can't follow it until I receive the drives.

My inclination is to mark these 3 OSDs 'out' before they fail completely, but I want to confirm my understanding of Ceph's response to this. Mainly, given my EC pools (or replicated pools, for that matter): if I mark all 3 OSDs out at once, do I risk data loss? If I have it right, marking an OSD out simply causes Ceph to move all of the PG shards from that OSD to other OSDs, so there should be no major risk of data loss. However, if it would be safer to do them one per day or something, I'd rather take that route. I also assume that I should wait for the rebalance to complete before I initiate the replacement procedure (rough sketch of the commands in the P.S. below).

Your thoughts? Thanks.

-Dave

--
Dave Hall
Binghamton University
kdhall@xxxxxxxxxxxxxx
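P.S. For concreteness, here is roughly what I had in mind, assuming the failing drives turn out to be osd.12, osd.34, and osd.56 (placeholder IDs, not my real ones):

    # Mark the suspect OSDs out; this triggers backfill of their PG shards
    # onto other OSDs, but leaves the daemons up so the shards stay readable
    # while the data moves.
    ceph osd out 12 34 56

    # Watch recovery/backfill progress until the cluster is back to HEALTH_OK.
    ceph -w
    ceph pg stat

    # Before destroying/pulling a drive, double-check that its data has been
    # fully backfilled elsewhere.
    ceph osd safe-to-destroy 12

Does that look right, or would you stagger the 'out' commands?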