On 31/12/2020 09:10, Rainer Krienke wrote:
Yesterday my Ceph Nautilus 14.2.15 cluster had a disk with unreadable
sectors. After several retries the OSD was marked down; rebalancing
started and has since finished successfully. ceph osd stat now shows the
OSD as "autoout,exists".
Usually the steps to replace a failed disk are:
1. Destroy the failed OSD: ceph osd destroy {id}
2. Run ceph-volume lvm create --bluestore --osd-id {id} --data /dev/sdX
... with a new disk in place, to recreate an OSD with the same id without
the need to change the crushmap or auth info etc.
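For what it's worth, the full sequence usually looks roughly like this; a
sketch only, with osd.17 and /dev/sdX as placeholders for the real id and
the replacement device:

  ceph osd out 17                               # if the OSD was not already auto-marked out
  systemctl stop ceph-osd@17                    # on the OSD host
  ceph osd destroy 17 --yes-i-really-mean-it    # marks it destroyed but keeps the id and crush position
  ceph-volume lvm zap /dev/sdX --destroy        # wipe old LVM/partition data on the device
  ceph-volume lvm create --bluestore --osd-id 17 --data /dev/sdX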
Now I am still waiting for the new disk and am unsure: should I run the
destroy command already now, to keep Ceph from trying to reactivate the
broken OSD? Then I would wait until the disk arrives in a day or so and
use ceph-volume to create the new OSD.
If the rebalance is complete, then I would destroy the old OSD now - as
you say, if the system reboots or some such, you don't want the OSD to try
to restart on a failed or failing disk.
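If in doubt, the cluster can confirm that the old OSD no longer holds any
data that is still needed before it is destroyed; a minimal sketch, again
with osd.17 as a placeholder for the real id:

  ceph osd safe-to-destroy osd.17               # only proceed once this reports it is safe
  ceph osd destroy 17 --yes-i-really-mean-it    # the id stays reserved for the replacement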
Regards,
Matthew