Re: replace failed disk in Luminous v12.2.2


 



Now I wonder: what is the correct way to replace a failed OSD block disk?

The generic way to handle maintenance (e.g. a disk replacement) is to rebalance by changing the OSD's CRUSH weight:

ceph osd crush reweight osd.<id> 0

The cluster will then migrate the data off this OSD.
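
You can follow the drain with the standard status commands, for example:

ceph -s           # overall health and recovery/backfill progress
ceph osd df tree  # per-OSD utilization; the reweighted OSD should drop towards 0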
When the cluster is back to HEALTH_OK, you can safely remove this OSD:

ceph osd out osd.<id>
systemctl stop ceph-osd@<id>
ceph osd crush remove osd.<id>
ceph auth del osd.<id>
ceph osd rm osd.<id>
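
For example, if the failed disk backs osd.0 (the ID here is just for illustration, matching the ceph-disk listing further below), the sequence would look like this:

ceph osd out osd.0
systemctl stop ceph-osd@0
ceph osd crush remove osd.0
ceph auth del osd.0
ceph osd rm osd.0
ceph osd tree | grep osd.0   # should return nothing once the OSD is gone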


I'm not sure if there is something to do about the still-existing BlueFS db and wal partitions on the NVMe device for the failed OSD. Do they have to be zapped? If yes, what is the best way?
1. Find the NVMe partitions for this OSD. You can do it in several ways: with ceph-volume, by hand, or with "ceph-disk list" (which is the most human readable):

/dev/sda :
 /dev/sda1 ceph data, active, cluster ceph, osd.0, block /dev/sda2, block.db /dev/nvme2n1p1, block.wal /dev/nvme2n1p2
 /dev/sda2 ceph block, for /dev/sda1

2. Delete the partitions via parted or fdisk:

fdisk -u /dev/nvme2n1
d (delete partition), enter the partition number of block.db: 1
d (delete partition), enter the partition number of block.wal: 2
w (write partition table)

3. Deploy your new OSD.
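
If you prefer to keep the existing partition layout, a minimal alternative sketch (assuming the replacement data disk comes back as /dev/sda, the same NVMe partitions are reused, and ceph-volume is your deployment tool of choice) would be:

# wipe the leftover BlueFS data on the old db/wal partitions
ceph-volume lvm zap /dev/nvme2n1p1
ceph-volume lvm zap /dev/nvme2n1p2
# wipe the replacement data disk too, in case it was used before
ceph-volume lvm zap /dev/sda
# create the new bluestore OSD, reusing the wiped partitions for db/wal
ceph-volume lvm create --bluestore --data /dev/sda \
    --block.db /dev/nvme2n1p1 --block.wal /dev/nvme2n1p2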
