Re: Replace block drives of combined NVME+HDD OSDs

Nice, thanks for the info.

Quoting Zakhar Kirpichenko <zakhar@xxxxxxxxx>:

Thank you, Eugen.

It was actually very straightforward. I'm happy to report back that there
were no issues with removing and zapping the OSDs whose data devices were
unavailable. I had to manually remove stale dm entries, but that was it.
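In case it helps anyone else, the cleanup boiled down to something like this (the mapping name below is only a placeholder, not the real one):

```
# List device-mapper entries; the LVs of the dead OSDs may still be
# mapped even though the backing HDD is gone:
dmsetup ls

# Remove each stale mapping (placeholder name):
dmsetup remove ceph--xxxxxxxx-osd--block--xxxxxxxx
```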

/Z

On Tue, 2 Apr 2024 at 11:00, Eugen Block <eblock@xxxxxx> wrote:

Hi,

here's the link to the docs [1] on how to replace OSDs.

ceph orch osd rm <OSD_ID> --replace --zap [--force]

This should zap both the data drive and the DB LV (yes, its data is
useless without the data drive). I'm not sure how it will handle the
case where the data drive isn't accessible, though.
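Roughly, the sequence for one failed OSD would be something like this (the OSD ID 12 is only an example):

```
# Drain/remove the OSD, zap its devices (data disk and DB LV) and keep
# the OSD ID reserved for the replacement:
ceph orch osd rm 12 --replace --zap

# Monitor the removal:
ceph orch osd rm status

# If zapping fails because the HDD is unreadable, the device can be
# wiped manually once it has been swapped:
ceph orch device zap ceph02 /dev/sda --force
```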
One thing I'm not sure about is how your spec file will be handled.
Since the drive letters can change, I recommend using a more generic
approach, for example filtering on the rotational flag and drive size
instead of on explicit paths (see the sketch below). But if the drive
letters don't change for the replaced drives, it should work. I also
don't expect an impact on the rest of the OSDs (except for
backfilling, of course).
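A more generic spec could look roughly like this; the size filter is just a placeholder, adjust or drop it to match your drives:

```
service_type: osd
service_id: ceph02_combined_osd
placement:
  hosts:
  - ceph02
spec:
  data_devices:
    rotational: 1        # all HDDs, independent of device path
    size: '10T:'         # placeholder: only drives of at least 10 TB
  db_devices:
    rotational: 0        # NVMe devices for DB/WAL
  filter_logic: AND
  objectstore: bluestore
```

With a spec like that, the orchestrator should pick up the replacement HDD automatically once it's empty and available.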

Regards,
Eugen

[1] https://docs.ceph.com/en/latest/cephadm/services/osd/#replacing-an-osd

Quoting Zakhar Kirpichenko <zakhar@xxxxxxxxx>:

> Hi,
>
> Unfortunately, some of our HDDs failed and we need to replace these
> drives, which are part of "combined" OSDs (DB/WAL on NVME, block
> storage on HDD).
> All OSDs are defined with a service definition similar to this one:
>
> ```
> service_type: osd
> service_id: ceph02_combined_osd
> service_name: osd.ceph02_combined_osd
> placement:
>   hosts:
>   - ceph02
> spec:
>   data_devices:
>     paths:
>     - /dev/sda
>     - /dev/sdb
>     - /dev/sdc
>     - /dev/sdd
>     - /dev/sde
>     - /dev/sdf
>     - /dev/sdg
>     - /dev/sdh
>     - /dev/sdi
>   db_devices:
>     paths:
>     - /dev/nvme0n1
>     - /dev/nvme1n1
>   filter_logic: AND
>   objectstore: bluestore
> ```
>
> In the above example, HDDs `sda` and `sdb` are not readable and data
> cannot be copied over to new HDDs. NVME partitions of `nvme0n1` with
> DB/WAL data are intact, but I guess that data is useless. I think the
> best approach is to replace the dead drives and completely rebuild
> each affected OSD. How should we go about this, preferably in a way
> that other OSDs on the node remain unaffected and operational?
>
> I would appreciate any advice or pointers to the relevant documentation.
>
> Best regards,
> Zakhar


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


