Now I wonder: what is the correct way to replace a failed OSD block disk?

The generic way for maintenance (e.g. a disk replacement) is to rebalance by changing the OSD weight:

  ceph osd crush reweight osd.<id> 0

The cluster then migrates the data away from this OSD. Once the cluster is back to HEALTH_OK you can safely remove the OSD:

  ceph osd out <id>
  systemctl stop ceph-osd@<id>
  ceph osd crush remove osd.<id>
  ceph auth del osd.<id>
  ceph osd rm <id>

1. Find the NVMe partitions for this OSD. You can do that in several ways: with ceph-volume, by hand, or with "ceph-disk list" (which is more human readable):

  /dev/sda :
   /dev/sda1 ceph data, active, cluster ceph, osd.0, block /dev/sda2, block.db /dev/nvme2n1p1, block.wal /dev/nvme2n1p2
   /dev/sda2 ceph block, for /dev/sda1

2. Delete the partitions via parted or fdisk:

  fdisk -u /dev/nvme2n1
  d (delete partition)
  enter the partition number of block.db: 1
  d
  enter the partition number of block.wal: 2
  w (write the partition table)

3. Deploy your new OSD.

I'm not sure whether anything still needs to be done about the bluefs db and wal partitions that remain on the NVMe device for the failed OSD. Do they have to be zapped? If yes, what is the best way?
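Something like the following is what I have in mind for the zapping, assuming the partition names from the "ceph-disk list" output above (/dev/nvme2n1p1 for block.db, /dev/nvme2n1p2 for block.wal) — this is only a sketch, please correct me if it is wrong:

  # wipe the old block.db / block.wal partitions so a new OSD can reuse them
  ceph-volume lvm zap /dev/nvme2n1p1
  ceph-volume lvm zap /dev/nvme2n1p2

or, by hand:

  # remove any signatures wipefs recognizes and zero the first 100 MB of each partition
  wipefs --all /dev/nvme2n1p1
  dd if=/dev/zero of=/dev/nvme2n1p1 bs=1M count=100 oflag=direct
  wipefs --all /dev/nvme2n1p2
  dd if=/dev/zero of=/dev/nvme2n1p2 bs=1M count=100 oflag=direct

As far as I understand, "ceph-volume lvm zap" without --destroy wipes the data on the partition but keeps the partition itself, which is what you would want if the new OSD is going to reuse the same db/wal partitions instead of recreating them.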
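As a side note: if the cluster is on Luminous or newer, my understanding is that the "crush remove" / "auth del" / "osd rm" sequence above can be collapsed into a single call; a sketch, assuming the failed OSD is osd.0:

  # removes osd.0 from the CRUSH map, deletes its cephx key and drops its OSD id in one step
  ceph osd purge 0 --yes-i-really-mean-it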