On Thu, Jan 11, 2018 at 4:30 AM, Dietmar Rieder <dietmar.rieder@xxxxxxxxxxx> wrote:
> Hello,
>
> we have a failed OSD disk in our Luminous v12.2.2 cluster that needs to
> get replaced.
>
> The cluster was initially deployed using ceph-deploy on Luminous
> v12.2.0. The OSDs were created using
>
> ceph-deploy osd create --bluestore cephosd-${osd}:/dev/sd${disk}
> --block-wal /dev/nvme0n1 --block-db /dev/nvme0n1
>
> Note we separated the bluestore data, wal and db.
>
> We updated to Luminous v12.2.1 and further to Luminous v12.2.2.
>
> With the last update we also let ceph-volume take over the OSDs using
> "ceph-volume simple scan /var/lib/ceph/osd/$osd" and "ceph-volume
> simple activate ${osd} ${id}". All of this went smoothly.

That is good to hear!

> Now I wonder what is the correct way to replace a failed OSD block disk?
>
> The docs for luminous [1] say:
>
> REPLACING AN OSD
>
> 1. Destroy the OSD first:
>
> ceph osd destroy {id} --yes-i-really-mean-it
>
> 2. Zap a disk for the new OSD, if the disk was used before for other
> purposes. It's not necessary for a new disk:
>
> ceph-disk zap /dev/sdX
>
> 3. Prepare the disk for replacement by using the previously destroyed
> OSD id:
>
> ceph-disk prepare --bluestore /dev/sdX --osd-id {id} --osd-uuid `uuidgen`
>
> 4. And activate the OSD:
>
> ceph-disk activate /dev/sdX1
>
> Initially this seems to be straightforward, but....
>
> 1. I'm not sure if there is something to do with the still existing
> bluefs db and wal partitions on the nvme device for the failed OSD. Do
> they have to be zapped? If yes, what is the best way? There is nothing
> mentioned in the docs.

What is your concern here if the activation seems to work?

> 2. Since we already let "ceph-volume simple" take over our OSDs, I'm not
> sure if we should now use ceph-volume or again ceph-disk (followed by
> "ceph-volume simple" takeover) to prepare and activate the OSD?

The `simple` sub-command is meant to help with the activation of OSDs at
boot time, supporting ceph-disk (or manually) created OSDs. There is no
requirement to use `ceph-volume lvm`, which is intended for new OSDs
using LVM as devices.

> 3. If we should use ceph-volume, then by looking at the luminous
> ceph-volume docs [2] I find for both,
>
> ceph-volume lvm prepare
> ceph-volume lvm activate
>
> that the bluestore option is either NOT implemented or NOT supported
>
> activate: [--bluestore] filestore (IS THIS A TYPO???) objectstore (not
> yet implemented)
> prepare: [--bluestore] Use the bluestore objectstore (not currently
> supported)

These look like typos on the man page; I will get that addressed. Ticket
opened at http://tracker.ceph.com/issues/22663

Bluestore as of 12.2.2 is fully supported and is the default. The --help
output in ceph-volume does have the flags updated and correctly shows
this.

> So, now I'm completely lost. How does all of this fit together in
> order to replace a failed OSD?

You would need to keep using ceph-disk, unless you want ceph-volume to
take over, in which case you would need to follow the steps to deploy a
new OSD with ceph-volume.

Note that although --osd-id is supported, there is an issue with it on
12.2.2 that would prevent you from correctly deploying the OSD:
http://tracker.ceph.com/issues/22642

The recommendation, if you want to use ceph-volume, would be to omit
--osd-id and let the cluster give you the ID.
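To make that concrete, a rough sketch of the ceph-volume route could look
like the command below. The device names are only placeholders (the
replacement data disk and NVMe partitions set aside for this OSD's db and
wal), so double-check the exact flags against `ceph-volume lvm create --help`
on your version:

  ceph-volume lvm create --bluestore --data /dev/sdX \
      --block.db /dev/nvme0n1pY --block.wal /dev/nvme0n1pZ

Without --osd-id, the cluster will simply hand out the next free id, which
sidesteps the tracker issue linked above.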
> 4. More.... after reading some recent threads on this list, additional
> questions are coming up:
>
> According to the OSD replacement doc [1]:
>
> "When disks fail, [...], OSDs need to be replaced. Unlike Removing the
> OSD, replaced OSD's id and CRUSH map entry need to be keep [TYPO HERE?
> keep -> kept] intact after the OSD is destroyed for replacement."
>
> but http://tracker.ceph.com/issues/22642 seems to say that it is not
> possible to reuse an OSD's id

That is a ceph-volume specific issue, unrelated to how replacement in
Ceph works.

> So I'm quite lost with an essential and very basic, seemingly simple
> task of storage management.

You have two choices:

1) Keep using ceph-disk as always, even though you have "ported" your
   OSDs with `ceph-volume simple`
2) Deploy new OSDs with ceph-volume

For #1 you will want to keep running `simple` on newly deployed OSDs so
that they can come up after a reboot, since `simple` disables the udev
rules that triggered activation with ceph-disk.

> Thanks for any help here.
>
> ~Dietmar
>
>
> [1]: http://docs.ceph.com/docs/luminous/rados/operations/add-or-rm-osds/
> [2]: http://docs.ceph.com/docs/luminous/man/8/ceph-volume/
>
> --
> _________________________________________
> D i e t m a r  R i e d e r, Mag.Dr.
> Innsbruck Medical University
> Biocenter - Division for Bioinformatics
> Email: dietmar.rieder@xxxxxxxxxxx
> Web: http://www.icbi.at
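P.S. For completeness, the ceph-disk route (choice #1) would be roughly the
steps from the docs you quoted, followed by the same `simple` takeover you
already did for the other OSDs. Again only a sketch with placeholder names,
and the --block.db/--block.wal flags assume you want the new OSD's db/wal on
the NVMe the way ceph-deploy set it up originally:

  ceph osd destroy {id} --yes-i-really-mean-it
  ceph-disk zap /dev/sdX
  ceph-disk prepare --bluestore /dev/sdX --osd-id {id} --osd-uuid `uuidgen` \
      --block.db /dev/nvme0n1 --block.wal /dev/nvme0n1
  ceph-disk activate /dev/sdX1
  ceph-volume simple scan /var/lib/ceph/osd/ceph-{id}
  ceph-volume simple activate {id} {osd-fsid}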