Hi,

I finally found a working way to replace the failed OSD. Everything looks
fine again.

Thanks again for your comments and suggestions.

Dietmar

On 01/12/2018 04:08 PM, Dietmar Rieder wrote:
> Hi,
>
> can someone comment on/confirm my planned OSD replacement procedure?
>
> It would be very helpful for me.
>
> Dietmar
>
> On 11 January 2018 17:47:50 CET, Dietmar Rieder
> <dietmar.rieder@xxxxxxxxxxx> wrote:
> > Hi Alfredo,
> >
> > thanks for your comments, see my answers inline.
> >
> > On 01/11/2018 01:47 PM, Alfredo Deza wrote:
> > > On Thu, Jan 11, 2018 at 4:30 AM, Dietmar Rieder
> > > <dietmar.rieder@xxxxxxxxxxx> wrote:
> > > > Hello,
> > > >
> > > > we have a failed OSD disk in our Luminous v12.2.2 cluster that needs
> > > > to get replaced.
> > > >
> > > > The cluster was initially deployed using ceph-deploy on Luminous
> > > > v12.2.0. The OSDs were created using
> > > >
> > > >   ceph-deploy osd create --bluestore cephosd-${osd}:/dev/sd${disk}
> > > >     --block-wal /dev/nvme0n1 --block-db /dev/nvme0n1
> > > >
> > > > Note we separated the bluestore data, wal and db.
> > > >
> > > > We updated to Luminous v12.2.1 and further to Luminous v12.2.2.
> > > >
> > > > With the last update we also let ceph-volume take over the OSDs using
> > > > "ceph-volume simple scan /var/lib/ceph/osd/$osd" and "ceph-volume
> > > > simple activate ${osd} ${id}". All of this went smoothly.
> > >
> > > That is good to hear!
> > >
> > > > Now I wonder what is the correct way to replace a failed OSD block
> > > > disk?
> > > >
> > > > The docs for luminous [1] say:
> > > >
> > > > REPLACING AN OSD
> > > >
> > > > 1. Destroy the OSD first:
> > > >
> > > >      ceph osd destroy {id} --yes-i-really-mean-it
> > > >
> > > > 2. Zap a disk for the new OSD, if the disk was used before for other
> > > >    purposes. It's not necessary for a new disk:
> > > >
> > > >      ceph-disk zap /dev/sdX
> > > >
> > > > 3. Prepare the disk for replacement by using the previously destroyed
> > > >    OSD id:
> > > >
> > > >      ceph-disk prepare --bluestore /dev/sdX --osd-id {id} --osd-uuid `uuidgen`
> > > >
> > > > 4. And activate the OSD:
> > > >
> > > >      ceph-disk activate /dev/sdX1
> > > >
> > > > Initially this seems to be straightforward, but....
> > > >
> > > > 1. I'm not sure if there is something to do with the still existing
> > > >    bluefs db and wal partitions on the nvme device for the failed
> > > >    OSD. Do they have to be zapped? If yes, what is the best way?
> > > >    There is nothing mentioned in the docs.
> > >
> > > What is your concern here if the activation seems to work?
> >
> > I guess on the nvme partitions for bluefs db and bluefs wal there is
> > still data related to the failed OSD block device. I was thinking that
> > this data might "interfere" with the new replacement OSD block device,
> > which is empty.
> >
> > So you are saying that this is no concern, right?
> > Are they automatically reused and assigned to the replacement OSD block
> > device, or do I have to specify them when running ceph-disk prepare?
> > If I need to specify the wal and db partition, how is this done?
> >
> > I'm asking this since from the logs of the initial cluster deployment I
> > got the following warning:
> >
> >   [cephosd-02][WARNING] prepare_device: OSD will not be hot-swappable if
> >   block.db is not the same device as the osd data
> >   [...]
> >   [cephosd-02][WARNING] prepare_device: OSD will not be hot-swappable if
> >   block.wal is not the same device as the osd data
> >
> > > > 2. Since we already let "ceph-volume simple" take over our OSDs I'm
> > > >    not sure if we should now use ceph-volume or again ceph-disk
> > > >    (followed by "ceph-volume simple" takeover) to prepare and
> > > >    activate the OSD?
> > >
> > > The `simple` sub-command is meant to help with the activation of OSDs
> > > at boot time, supporting ceph-disk (or manual) created OSDs.
> >
> > OK, got this...
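For reference, the `ceph-volume simple` takeover mentioned above follows this
pattern; a minimal sketch, assuming the OSD in question is id 33, its data
directory is mounted at /var/lib/ceph/osd/ceph-33, and its uuid is read from
the "fsid" file in that directory:

  ceph-volume simple scan /var/lib/ceph/osd/ceph-33
  ceph-volume simple activate 33 <osd-uuid>

The scan step stores a JSON description of the OSD under /etc/ceph/osd/, and
the activate step enables systemd units for it in place of the ceph-disk udev
rules, which is what lets the OSD come back up after a reboot.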
> > > There is no requirement to use `ceph-volume lvm` which is intended for
> > > new OSDs using LVM as devices.
> >
> > Fine...
> >
> > > > 3. If we should use ceph-volume, then by looking at the luminous
> > > >    ceph-volume docs [2] I find for both,
> > > >
> > > >      ceph-volume lvm prepare
> > > >      ceph-volume lvm activate
> > > >
> > > >    that the bluestore option is either NOT implemented or NOT
> > > >    supported:
> > > >
> > > >      activate: [--bluestore] filestore (IS THIS A TYPO???)
> > > >      objectstore (not yet implemented)
> > > >      prepare: [--bluestore] Use the bluestore objectstore (not
> > > >      currently supported)
> > >
> > > These might be a typo on the man page, will get that addressed. Ticket
> > > opened at http://tracker.ceph.com/issues/22663
> >
> > Thanks
> >
> > > bluestore as of 12.2.2 is fully supported and it is the default. The
> > > --help output in ceph-volume does have the flags updated and correctly
> > > showing this.
> >
> > OK
> >
> > > > So, now I'm completely lost. How is all of this fitting together in
> > > > order to replace a failed OSD?
> > >
> > > You would need to keep using ceph-disk. Unless you want ceph-volume to
> > > take over, in which case you would need to follow the steps to deploy
> > > a new OSD with ceph-volume.
> >
> > OK
> >
> > > Note that although --osd-id is supported, there is an issue with that
> > > on 12.2.2 that would prevent you from correctly deploying it
> > > http://tracker.ceph.com/issues/22642
> > >
> > > The recommendation, if you want to use ceph-volume, would be to omit
> > > --osd-id and let the cluster give you the ID.
> > >
> > > > 4. More.... after reading some recent threads on this list,
> > > >    additional questions are coming up:
> > > >
> > > >    According to the OSD replacement doc [1]:
> > > >
> > > >    "When disks fail, [...], OSDs need to be replaced. Unlike Removing
> > > >    the OSD, replaced OSD's id and CRUSH map entry need to be keep
> > > >    [TYPO HERE? keep -> kept] intact after the OSD is destroyed for
> > > >    replacement."
> > > >
> > > >    but http://tracker.ceph.com/issues/22642 seems to say that it is
> > > >    not possible to reuse an OSD's id
> > >
> > > That is a ceph-volume specific issue, unrelated to how replacement in
> > > Ceph works.
> >
> > OK
> >
> > > > So I'm quite lost with an essential and very basic seemingly simple
> > > > task of storage management.
> > >
> > > You have two choices:
> > >
> > > 1) keep using ceph-disk as always, even though you have "ported" your
> > >    OSDs with `ceph-volume simple`
> > > 2) Deploy new OSDs with ceph-volume
> > >
> > > For #1 you will want to keep running `simple` on newly deployed OSDs
> > > so that they can come up after a reboot, since `simple` disables the
> > > udev rules that caused activation with ceph-disk
> >
> > OK, thanks so much for clarifying these things. I'll go for the
> > ceph-disk option then.
> >
> > Just to be sure, these would be the steps I would do:
> >
> > 1. ceph osd destroy osd.33 --yes-i-really-mean-it
> >
> > 2. remove the failed HDD and replace it with a new HDD
> >
> > 3. ceph-disk prepare --bluestore /dev/sdo --osd-id osd.33
> >
> >    OR
> >
> >    do I need to specify the wal and db partitions on the nvme here, like
> >    Konstantin was suggesting in his answer to my question:
> >
> >    3.1. Find the nvme partitions for this OSD using ceph-disk, which
> >         gives me:
> >
> >           /dev/nvme1n1p2 ceph block.db
> >           /dev/nvme1n1p3 ceph block.wal
> >
> >    3.2. Delete the partitions via parted or fdisk:
> >
> >           fdisk -u /dev/nvme1n1
> >           d (delete partitions)
> >           enter partition number of block.db: 2
> >           d
> >           enter partition number of block.wal: 3
> >           w (write partition table)
> >
> >    3.3. Run ceph-disk prepare:
> >
> >           ceph-disk -v prepare --block.wal /dev/nvme1n1 --block.db /dev/nvme1n1 \
> >             --bluestore /dev/sdo --osd-id osd.33
> >
> > 4. Do I need to run "ceph-disk activate"?
> >
> >      ceph-disk activate /dev/sdo1
> >
> >    or any of the "ceph-volume simple" commands now?
> >    or just start the osd with systemctl?
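A minimal sketch of how step 4 could look on the ceph-disk path discussed
above, assuming the new data partition ends up as /dev/sdo1 and the OSD keeps
id 33 (the uuid placeholder is whatever the freshly prepared OSD gets):

  ceph-disk activate /dev/sdo1
  ceph-volume simple scan /var/lib/ceph/osd/ceph-33
  ceph-volume simple activate 33 <osd-uuid>
  ceph osd tree | grep osd.33

ceph-disk activate mounts the data partition and starts the ceph-osd daemon;
repeating the ceph-volume simple scan/activate step afterwards is what keeps
the new OSD coming up after reboots, since the ceph-disk udev rules were
disabled by the earlier takeover (see Alfredo's note on choice #1 above). The
last command is only a sanity check that osd.33 is back up.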
> > Thanks so much, and sorry for my ignorance ;-)
> >
> > ~Best
> > Dietmar
>
> --
> This message was sent from my Android mobile phone with K-9 Mail.
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
_________________________________________
D i e t m a r  R i e d e r, Mag.Dr.
Innsbruck Medical University
Biocenter - Division for Bioinformatics
Innrain 80, 6020 Innsbruck
Email: dietmar.rieder@xxxxxxxxxxx
Web:   http://www.icbi.at
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com