Hi,

I finally found a working way to replace the failed OSD. Everything looks
fine again.

Thanks again for your comments and suggestions.

Dietmar

On 01/12/2018 04:08 PM, Dietmar Rieder wrote:
> Hi,
>
> can someone comment on/confirm my planned OSD replacement procedure?
>
> It would be very helpful for me.
>
> Dietmar
>
> On 11 January 2018 17:47:50 CET, Dietmar Rieder
> <dietmar.rieder@xxxxxxxxxxx> wrote:
> > Hi Alfredo,
> >
> > thanks for your comments, see my answers inline.
> >
> > On 01/11/2018 01:47 PM, Alfredo Deza wrote:
> > > On Thu, Jan 11, 2018 at 4:30 AM, Dietmar Rieder
> > > <dietmar.rieder@xxxxxxxxxxx> wrote:
> > > > Hello,
> > > >
> > > > we have a failed OSD disk in our Luminous v12.2.2 cluster that needs
> > > > to get replaced.
> > > >
> > > > The cluster was initially deployed using ceph-deploy on Luminous
> > > > v12.2.0. The OSDs were created using
> > > >
> > > >   ceph-deploy osd create --bluestore cephosd-${osd}:/dev/sd${disk}
> > > >     --block-wal /dev/nvme0n1 --block-db /dev/nvme0n1
> > > >
> > > > Note we separated the bluestore data, wal and db.
> > > >
> > > > We updated to Luminous v12.2.1 and further to Luminous v12.2.2.
> > > >
> > > > With the last update we also let ceph-volume take over the OSDs using
> > > > "ceph-volume simple scan /var/lib/ceph/osd/$osd" and "ceph-volume
> > > > simple activate ${osd} ${id}". All of this went smoothly.
> > >
> > > That is good to hear!
> > >
> > > > Now I wonder what is the correct way to replace a failed OSD block
> > > > disk?
> > > >
> > > > The docs for luminous [1] say:
> > > >
> > > > REPLACING AN OSD
> > > >
> > > > 1. Destroy the OSD first:
> > > >
> > > >      ceph osd destroy {id} --yes-i-really-mean-it
> > > >
> > > > 2. Zap a disk for the new OSD, if the disk was used before for other
> > > >    purposes. It's not necessary for a new disk:
> > > >
> > > >      ceph-disk zap /dev/sdX
> > > >
> > > > 3. Prepare the disk for replacement by using the previously destroyed
> > > >    OSD id:
> > > >
> > > >      ceph-disk prepare --bluestore /dev/sdX --osd-id {id} --osd-uuid `uuidgen`
> > > >
> > > > 4. And activate the OSD:
> > > >
> > > >      ceph-disk activate /dev/sdX1
> > > >
> > > > Initially this seems to be straightforward, but....
> > > >
> > > > 1. I'm not sure if there is something to do with the still existing
> > > >    bluefs db and wal partitions on the nvme device for the failed
> > > >    OSD. Do they have to be zapped? If yes, what is the best way?
> > > >    There is nothing mentioned in the docs.
> > >
> > > What is your concern here if the activation seems to work?
> >
> > I guess on the nvme partitions for bluefs db and bluefs wal there is
> > still data related to the failed OSD block device. I was thinking that
> > this data might "interfere" with the new replacement OSD block device,
> > which is empty.
> >
> > So you are saying that this is no concern, right?
> > Are they automatically reused and assigned to the replacement OSD block
> > device, or do I have to specify them when running ceph-disk prepare?
> > If I need to specify the wal and db partition, how is this done?
> >
> > I'm asking this since from the logs of the initial cluster deployment I
> > got the following warning:
> >
> >   [cephosd-02][WARNING] prepare_device: OSD will not be hot-swappable if
> >   block.db is not the same device as the osd data
> >   [...]
> >   [cephosd-02][WARNING] prepare_device: OSD will not be hot-swappable if
> >   block.wal is not the same device as the osd data
> >
> > > > 2. Since we already let "ceph-volume simple" take over our OSDs I'm
> > > >    not sure if we should now use ceph-volume or again ceph-disk
> > > >    (followed by "ceph-volume simple" takeover) to prepare and
> > > >    activate the OSD?
> > >
> > > The `simple` sub-command is meant to help with the activation of OSDs
> > > at boot time, supporting ceph-disk (or manual) created OSDs.
> >
> > OK, got this...
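For reference, the `ceph-volume simple` takeover mentioned above follows this
pattern; a minimal sketch, assuming the OSD in question is id 33, its data
directory is mounted at /var/lib/ceph/osd/ceph-33, and its uuid is read from
the "fsid" file in that directory:

  ceph-volume simple scan /var/lib/ceph/osd/ceph-33
  ceph-volume simple activate 33 <osd-uuid>

The scan step stores a JSON description of the OSD under /etc/ceph/osd/, and
the activate step enables systemd units for it in place of the ceph-disk udev
rules, which is what lets the OSD come back up after a reboot.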
> > > There is no requirement to use `ceph-volume lvm` which is intended for
> > > new OSDs using LVM as devices.
> >
> > Fine...
> >
> > > > 3. If we should use ceph-volume, then by looking at the luminous
> > > >    ceph-volume docs [2] I find for both,
> > > >
> > > >      ceph-volume lvm prepare
> > > >      ceph-volume lvm activate
> > > >
> > > >    that the bluestore option is either NOT implemented or NOT
> > > >    supported:
> > > >
> > > >      activate: [--bluestore] filestore (IS THIS A TYPO???)
> > > >      objectstore (not yet implemented)
> > > >      prepare: [--bluestore] Use the bluestore objectstore (not
> > > >      currently supported)
> > >
> > > These might be a typo on the man page, will get that addressed. Ticket
> > > opened at http://tracker.ceph.com/issues/22663
> >
> > Thanks
> >
> > > bluestore as of 12.2.2 is fully supported and it is the default. The
> > > --help output in ceph-volume does have the flags updated and correctly
> > > showing this.
> >
> > OK
> >
> > > > So, now I'm completely lost. How is all of this fitting together in
> > > > order to replace a failed OSD?
> > >
> > > You would need to keep using ceph-disk. Unless you want ceph-volume to
> > > take over, in which case you would need to follow the steps to deploy
> > > a new OSD with ceph-volume.
> >
> > OK
> >
> > > Note that although --osd-id is supported, there is an issue with that
> > > on 12.2.2 that would prevent you from correctly deploying it
> > > http://tracker.ceph.com/issues/22642
> > >
> > > The recommendation, if you want to use ceph-volume, would be to omit
> > > --osd-id and let the cluster give you the ID.
> > >
> > > > 4. More.... after reading some recent threads on this list,
> > > >    additional questions are coming up:
> > > >
> > > >    According to the OSD replacement doc [1]:
> > > >
> > > >    "When disks fail, [...], OSDs need to be replaced. Unlike Removing
> > > >    the OSD, replaced OSD's id and CRUSH map entry need to be keep
> > > >    [TYPO HERE? keep -> kept] intact after the OSD is destroyed for
> > > >    replacement."
> > > >
> > > >    but http://tracker.ceph.com/issues/22642 seems to say that it is
> > > >    not possible to reuse an OSD's id
> > >
> > > That is a ceph-volume specific issue, unrelated to how replacement in
> > > Ceph works.
> >
> > OK
> >
> > > > So I'm quite lost with an essential and very basic seemingly simple
> > > > task of storage management.
> > >
> > > You have two choices:
> > >
> > > 1) keep using ceph-disk as always, even though you have "ported" your
> > >    OSDs with `ceph-volume simple`
> > > 2) Deploy new OSDs with ceph-volume
> > >
> > > For #1 you will want to keep running `simple` on newly deployed OSDs
> > > so that they can come up after a reboot, since `simple` disables the
> > > udev rules that caused activation with ceph-disk
> >
> > OK, thanks so much for clarifying these things. I'll go for the
> > ceph-disk option then.
> >
> > Just to be sure, these would be the steps I would do:
> >
> > 1. ceph osd destroy osd.33 --yes-i-really-mean-it
> >
> > 2. remove the failed HDD and replace it with a new HDD
> >
> > 3. ceph-disk prepare --bluestore /dev/sdo --osd-id osd.33
> >
> >    OR
> >
> >    do I need to specify the wal and db partitions on the nvme here, like
> >    Konstantin was suggesting in his answer to my question:
> >
> >    3.1. Find the nvme partitions for this OSD using ceph-disk, which
> >         gives me:
> >
> >           /dev/nvme1n1p2 ceph block.db
> >           /dev/nvme1n1p3 ceph block.wal
> >
> >    3.2. Delete the partitions via parted or fdisk:
> >
> >           fdisk -u /dev/nvme1n1
> >           d (delete partitions)
> >           enter partition number of block.db: 2
> >           d
> >           enter partition number of block.wal: 3
> >           w (write partition table)
> >
> >    3.3. Run ceph-disk prepare:
> >
> >           ceph-disk -v prepare --block.wal /dev/nvme1n1 --block.db /dev/nvme1n1 \
> >             --bluestore /dev/sdo --osd-id osd.33
> >
> > 4. Do I need to run "ceph-disk activate"?
> >
> >      ceph-disk activate /dev/sdo1
> >
> >    or any of the "ceph-volume simple" commands now?
> >    or just start the osd with systemctl?
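A minimal sketch of how step 4 could look on the ceph-disk path discussed
above, assuming the new data partition ends up as /dev/sdo1 and the OSD keeps
id 33 (the uuid placeholder is whatever the freshly prepared OSD gets):

  ceph-disk activate /dev/sdo1
  ceph-volume simple scan /var/lib/ceph/osd/ceph-33
  ceph-volume simple activate 33 <osd-uuid>
  ceph osd tree | grep osd.33

ceph-disk activate mounts the data partition and starts the ceph-osd daemon;
repeating the ceph-volume simple scan/activate step afterwards is what keeps
the new OSD coming up after reboots, since the ceph-disk udev rules were
disabled by the earlier takeover (see Alfredo's note on choice #1 above). The
last command is only a sanity check that osd.33 is back up.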
> > Thanks so much, and sorry for my ignorance ;-)
> >
> > ~Best
> > Dietmar
>
> --
> This message was sent from my Android mobile phone with K-9 Mail.
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
_________________________________________
D i e t m a r  R i e d e r, Mag.Dr.
Innsbruck Medical University
Biocenter - Division for Bioinformatics
Innrain 80, 6020 Innsbruck
Email: dietmar.rieder@xxxxxxxxxxx
Web:   http://www.icbi.at
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com