Re: replace failed disk in Luminous v12.2.2

On Thu, Jan 11, 2018 at 4:30 AM, Dietmar Rieder
<dietmar.rieder@xxxxxxxxxxx> wrote:
> Hello,
>
> we have failed OSD disk in our Luminous v12.2.2 cluster that needs to
> get replaced.
>
> The cluster was initially deployed using ceph-deploy on Luminous
> v12.2.0. The OSDs were created using
>
> ceph-deploy osd create --bluestore cephosd-${osd}:/dev/sd${disk}
> --block-wal /dev/nvme0n1 --block-db /dev/nvme0n1
>
> Note we separated the bluestore data, wal and db.
>
> We updated to Luminous v12.2.1 and further to Luminous v12.2.2.
>
> With the last update we also let ceph-volume take over the OSDs using
> "ceph-volume simple scan  /var/lib/ceph/osd/$osd" and "ceph-volume
> simple activate ${osd} ${id}". All of this went smoothly.

That is good to hear!

>
> Now I wonder: what is the correct way to replace a failed OSD block disk?
>
> The docs for luminous [1] say:
>
> REPLACING AN OSD
>
> 1. Destroy the OSD first:
>
> ceph osd destroy {id} --yes-i-really-mean-it
>
> 2. Zap a disk for the new OSD, if the disk was used before for other
> purposes. It’s not necessary for a new disk:
>
> ceph-disk zap /dev/sdX
>
>
> 3. Prepare the disk for replacement by using the previously destroyed
> OSD id:
>
> ceph-disk prepare --bluestore /dev/sdX  --osd-id {id} --osd-uuid `uuidgen`
>
>
> 4. And activate the OSD:
>
> ceph-disk activate /dev/sdX1
>
>
> Initially this seems to be straightforward, but....
>
> 1. I'm not sure if anything needs to be done about the still existing
> bluefs db and wal partitions on the nvme device for the failed OSD. Do
> they have to be zapped? If yes, what is the best way? There is nothing
> mentioned in the docs.

What is your concern here if the activation seems to work?
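
That said, if you do decide to wipe the old db/wal partitions before
reusing them, something like the sketch below would do it. The partition
names are only placeholders -- double-check which partitions belonged to
the failed OSD (lsblk or `ceph-disk list`) before running anything:

    # clear any leftover signatures on the old db/wal partitions
    wipefs --all /dev/nvme0n1pX
    wipefs --all /dev/nvme0n1pY
    # or, more bluntly, zero out the first few MB of each partition
    dd if=/dev/zero of=/dev/nvme0n1pX bs=1M count=10 oflag=direct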

>
> 2. Since we already let "ceph-volume simple" take over our OSDs I'm not
> sure if we should now use ceph-volume or again ceph-disk (followed by
> "ceph-volume simple" takeover) to prepare and activate the OSD?

The `simple` sub-command is meant to help with the activation of OSDs at
boot time, and it supports OSDs created by ceph-disk (or manually).

There is no requirement to use `ceph-volume lvm`, which is intended for
new OSDs using LVM as devices.

>
> 3. If we should use ceph-volume, then by looking at the luminous
> ceph-volume docs [2] I find for both,
>
> ceph-volume lvm prepare
> ceph-volume lvm activate
>
> that the bluestore option is either NOT implemented or NOT supported
>
> activate:  [--bluestore] filestore (IS THIS A TYPO???) objectstore (not
> yet implemented)
> prepare: [--bluestore] Use the bluestore objectstore (not currently
> supported)

Those look like typos on the man page; I will get that addressed. Ticket
opened at http://tracker.ceph.com/issues/22663

bluestore as of 12.2.2 is fully supported and is the default. The --help
output in ceph-volume has the flags updated and shows this correctly.

>
>
> So, now I'm completely lost. How is all of this fitting together in
> order to replace a failed OSD?

You would need to keep using ceph-disk, unless you want ceph-volume to
take over, in which case you would need to follow the steps to deploy a
new OSD with ceph-volume.

Note that although --osd-id is supported, there is an issue with it on
12.2.2 that would prevent you from correctly deploying the OSD:
http://tracker.ceph.com/issues/22642

The recommendation, if you want to use ceph-volume, would be to omit
--osd-id and let the cluster give you the ID.
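
For reference, deploying the replacement with ceph-volume would look
roughly like the sketch below. The device names are placeholders, --osd-id
is deliberately left off because of the issue above, and the exact
--block.db/--block.wal flags should be verified against
`ceph-volume lvm create --help` on your version:

    # create a new bluestore OSD on the replacement disk, with db/wal on
    # separate (placeholder) NVMe partitions; the cluster assigns the id
    ceph-volume lvm create --bluestore --data /dev/sdX \
        --block.db /dev/nvme0n1pY --block.wal /dev/nvme0n1pZ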

>
> 4. More.... after reading some recent threads on this list, additional
> questions came up:
>
> According to the OSD replacement doc [1]:
>
> "When disks fail, [...], OSDs need to be replaced. Unlike Removing the
> OSD, replaced OSD’s id and CRUSH map entry need to be keep [TYPO HERE?
> keep -> kept] intact after the OSD is destroyed for replacement."
>
> but
> http://tracker.ceph.com/issues/22642 seems to say that it is not
> possible to reuse an OSD's id

That is a ceph-volume specific issue, unrelated to how replacement in
Ceph works.
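
In other words, the `ceph osd destroy` based flow from the docs does keep
the id and CRUSH entry, and you can verify that before redeploying. A
quick check, using id 12 purely as an example:

    ceph osd destroy 12 --yes-i-really-mean-it
    ceph osd tree | grep osd.12   # should still be listed, marked destroyed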

>
>
> So I'm quite lost with what should be an essential and seemingly simple
> storage management task.

You have two choices:

1) Keep using ceph-disk as always, even though you have "ported" your
OSDs with `ceph-volume simple`.
2) Deploy new OSDs with ceph-volume.

For #1 you will want to keep running `simple` on newly deployed OSDs so
that they come up after a reboot, since `simple` disables the udev rules
that ceph-disk relied on for activation. A rough sketch of that workflow
is below.
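
Putting the pieces together for option #1, the replacement would look
something like this. It reuses the commands from your original mail and
the docs; sdX, {id} and {osd-fsid} are placeholders, and the
--block.db/--block.wal flags should be checked against
`ceph-disk prepare --help`, since ceph-deploy spells them differently:

    ceph osd destroy {id} --yes-i-really-mean-it
    ceph-disk zap /dev/sdX        # only if the disk was used before
    ceph-disk prepare --bluestore /dev/sdX --block.db /dev/nvme0n1 \
        --block.wal /dev/nvme0n1 --osd-id {id} --osd-uuid `uuidgen`
    ceph-disk activate /dev/sdX1
    # then let `simple` take over the new OSD so it comes up after reboots
    ceph-volume simple scan /var/lib/ceph/osd/ceph-{id}
    ceph-volume simple activate {id} {osd-fsid}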

>
> Thanks for any help here.
>
> ~Dietmar
>
>
> [1]: http://docs.ceph.com/docs/luminous/rados/operations/add-or-rm-osds/
> [2]: http://docs.ceph.com/docs/luminous/man/8/ceph-volume/
>
> --
> _________________________________________
> D i e t m a r  R i e d e r, Mag.Dr.
> Innsbruck Medical University
> Biocenter - Division for Bioinformatics
> Email: dietmar.rieder@xxxxxxxxxxx
> Web:   http://www.icbi.at
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



