Re: Accidentally created systemd units for OSDs

Tim Holloway <timh@xxxxxxxxxxxxx> · Fri, 16 Aug 2024 12:03:38 -0400

Been there/did that. Cried a lot. Fixed now.

Personally, I recommend the containerised/cephadm-managed approach. In a
lot of ways, it's simpler and it supports more than one fsid on a
single host. The downside is that the systemd names are really gnarly
(the full fsid is part of the unit name) and they're generated on the
fly, so you won't find the unit names in any of the systemd static
directories (/usr/lib/systemd/system and /etc/systemd/system).
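To make the two naming schemes concrete, here is a rough Python sketch
that sorts unit names into "traditional" and "cephadm" and reports any
OSD id that shows up under both. It assumes the usual patterns,
ceph-osd@<id>.service for the traditional units and
ceph-<fsid>@osd.<id>.service for the cephadm ones; the function names
are made up for illustration, so check the patterns against what your
own host actually reports before relying on it.

```python
import re

# Assumed naming conventions (verify on your own host):
#   traditional: ceph-osd@<id>.service
#   cephadm:     ceph-<fsid>@osd.<id>.service
LEGACY_RE = re.compile(r"^ceph-osd@(\d+)\.service$")
CEPHADM_RE = re.compile(r"^ceph-([0-9a-f-]{36})@osd\.(\d+)\.service$")


def classify_unit(name):
    """Return (kind, osd_id) for an OSD unit name, or None if it is not one."""
    m = LEGACY_RE.match(name)
    if m:
        return ("legacy", int(m.group(1)))
    m = CEPHADM_RE.match(name)
    if m:
        return ("cephadm", int(m.group(2)))
    return None


def twinned_osds(unit_names):
    """Return sorted OSD ids that have both a legacy and a cephadm unit."""
    seen = {"legacy": set(), "cephadm": set()}
    for name in unit_names:
        result = classify_unit(name)
        if result:
            kind, osd_id = result
            seen[kind].add(osd_id)
    return sorted(seen["legacy"] & seen["cephadm"])
```

The list of names could come from something like
`systemctl list-units --all --plain --no-legend 'ceph*'`, taking the
first column of each line.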

As I recall, having two different managers for an OSD didn't hurt overall
operation that much, but it did make status reporting somewhat flaky.
Apparently different tools check OSDs differently.

The other problem is that the traditional (non-container) OSD
definitions are located under /var/lib/ceph, but the containerized OSDs
are under /var/lib/ceph/<fsid>. There are some softlinks involved so
that, for example, the OSD data itself may be shared between the two
OSD controllers, and that has the potential for conflict.
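As a small illustration of the two layouts, the sketch below builds the
directory each manager would typically use for a given OSD. The
subdirectory names (osd/ceph-<id> for the traditional layout and
<fsid>/osd.<id> for cephadm) are the defaults as I understand them, and
the helper names are hypothetical, so treat this as an assumption to
check against your own /var/lib/ceph rather than a definitive map.

```python
from pathlib import PurePosixPath

CEPH_ROOT = PurePosixPath("/var/lib/ceph")


def legacy_osd_dir(osd_id, cluster="ceph"):
    """Assumed traditional location: /var/lib/ceph/osd/<cluster>-<id>."""
    return CEPH_ROOT / "osd" / f"{cluster}-{osd_id}"


def cephadm_osd_dir(fsid, osd_id):
    """Assumed cephadm location: /var/lib/ceph/<fsid>/osd.<id>."""
    return CEPH_ROOT / fsid / f"osd.{osd_id}"
```

If both directories exist for the same OSD id, comparing what their
contents resolve to (for example with `ls -l`) shows whether the two
are pointing at the same underlying data.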

I think that the safest way to handle that is to completely nuke that
OSD in both its forms, then re-create it according to your mode of
choice. That can take a long time, but shortcuts may make things worse.

Pity that Ceph can't detect and prevent that sort of stuff, but such is
life.

   Tim

On Fri, 2024-08-16 at 15:32 +0000, Dan O'Brien wrote:
> I was [poorly] following the instructions for migrating the wal/db to
> an SSD
> 
> https://docs.clyso.com/blog/ceph-volume-create-wal-db-on-separate-device-for-existing-osd/
> 
> and I didn't add the '--no-systemd' when I did 'ceph-volume lvm
> activate' command (3 f***ing times). The result is that I've
> "twinned" 3 of my OSDs: There's a container version managed by
> cephadm and there's an instantiated systemd unit that runs directly.
> Surprisingly, this has not done a lot of damage, but it does result
> in the dashboard reporting 3 failed cephadm daemons when the "native"
> OSDs start before the containerized ones.
> 
> I've disabled the systemd units for ceph-osd@9.service,
> ceph-osd@11.service and ceph-osd@25.service, but I'd like to remove
> them completely. I will eventually badger The Google into giving me
> an answer, but could someone tell me what I need to do? The semester
> starts soon and I don't really have the bandwidth for this right now.
> 
> Thanks in advance. I will forever be in your debt. (Seriously, I'm
> ready to give you a kidney, if you need it.)
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx