Thank you, Eugen. It was actually very straightforward. I'm happy to report
back that there were no issues with removing and zapping the OSDs whose data
devices were unavailable. I had to manually remove stale device-mapper (dm)
entries, but that was it.

/Z

On Tue, 2 Apr 2024 at 11:00, Eugen Block <eblock@xxxxxx> wrote:

> Hi,
>
> here's the link to the docs [1] on how to replace OSDs:
>
> ceph orch osd rm <OSD_ID> --replace --zap [--force]
>
> This should zap both the data drive and the DB LV (yes, its data is
> useless without the data drive); I'm not sure how it will handle the case
> where the data drive isn't accessible, though.
> One thing I'm not sure about is how your spec file will be handled.
> Since drive letters can change, I recommend using a more generic
> approach, for example the rotational flags and drive sizes instead of
> paths. But if the drive letters don't change for the replaced drives,
> it should work. I also don't expect an impact on the rest of the OSDs
> (except for backfilling, of course).
>
> Regards,
> Eugen
>
> [1] https://docs.ceph.com/en/latest/cephadm/services/osd/#replacing-an-osd
>
> Quoting Zakhar Kirpichenko <zakhar@xxxxxxxxx>:
>
> > Hi,
> >
> > Unfortunately, some of our HDDs failed and we need to replace these
> > drives, which are part of "combined" OSDs (DB/WAL on NVMe, block storage
> > on HDD). All OSDs are defined with a service definition similar to this
> > one:
> >
> > ```
> > service_type: osd
> > service_id: ceph02_combined_osd
> > service_name: osd.ceph02_combined_osd
> > placement:
> >   hosts:
> >   - ceph02
> > spec:
> >   data_devices:
> >     paths:
> >     - /dev/sda
> >     - /dev/sdb
> >     - /dev/sdc
> >     - /dev/sdd
> >     - /dev/sde
> >     - /dev/sdf
> >     - /dev/sdg
> >     - /dev/sdh
> >     - /dev/sdi
> >   db_devices:
> >     paths:
> >     - /dev/nvme0n1
> >     - /dev/nvme1n1
> >   filter_logic: AND
> >   objectstore: bluestore
> > ```
> >
> > In the above example, HDDs `sda` and `sdb` are not readable and their
> > data cannot be copied over to the new HDDs. The NVMe partitions of
> > `nvme0n1` with DB/WAL data are intact, but I guess that data is useless.
> > I think the best approach is to replace the dead drives and completely
> > rebuild each affected OSD. How should we go about this, preferably in a
> > way that the other OSDs on the node remain unaffected and operational?
> >
> > I would appreciate any advice or pointers to the relevant documentation.
> >
> > Best regards,
> > Zakhar

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
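
For reference, a minimal sketch of the replacement workflow discussed above, assuming the failed HDDs back OSDs 12 and 13 (hypothetical IDs) on ceph02; the device-mapper name in the cleanup step is a placeholder and will differ per host:

```
# Mark the OSDs backed by the failed HDDs for replacement. --replace keeps
# the OSD IDs reserved (marked "destroyed") instead of purging them from the
# CRUSH map; --zap also cleans up the DB/WAL LVs on the NVMe devices.
ceph orch osd rm 12 --replace --zap
ceph orch osd rm 13 --replace --zap

# Follow the removal progress.
ceph orch osd rm status

# With an unreadable data device, stale device-mapper entries for the old
# LVs may be left behind on the host; list them and remove the stale ones.
# (The ceph--...-osd--block--... name below is a placeholder.)
dmsetup ls
dmsetup remove ceph--<vg-uuid>-osd--block--<lv-uuid>

# Once the new HDDs are in place, cephadm recreates the OSDs from the
# existing spec and reuses the reserved IDs; verify with:
ceph osd tree
```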
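
And a sketch of the more generic, path-less spec Eugen suggests, selecting data devices by rotational flag rather than /dev/sdX letters; the size bound on the DB devices and the file path are assumptions, and --dry-run previews which disks would match before anything is changed:

```
# Hypothetical path-less OSD spec: pick HDDs by rotational flag instead of
# device paths, so a replaced drive is matched even if its letter changes.
cat > /root/osd_spec.yml <<'EOF'
service_type: osd
service_id: ceph02_combined_osd
placement:
  hosts:
  - ceph02
spec:
  data_devices:
    rotational: 1        # spinning HDDs, whatever /dev/sdX they appear as
  db_devices:
    rotational: 0        # NVMe devices for DB/WAL
    size: '1T:'          # assumed lower bound to exclude small boot SSDs
  filter_logic: AND
  objectstore: bluestore
EOF

# Preview which devices the spec would claim, then apply it for real.
ceph orch apply -i /root/osd_spec.yml --dry-run
ceph orch apply -i /root/osd_spec.yml
```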