Hi Tim,

I see what you're referring to, but it doesn't apply here since there are actually **no** stray daemons, that is, **no** ghost process on any host trying to start. What we're seeing is unexpected behavior, most likely a bug.

Regards,
Frédéric.

----- On 7 Nov 24, at 21:08, Tim Holloway timh@xxxxxxxxxxxxx wrote:

> You can get this sort of behaviour because different Ceph subsystems get
> their information from different places instead of having an
> authoritative source of information.
>
> Specifically, Ceph may look directly at:
>
> A) Its configuration database
>
> B) Systemd units running on the OSD host
>
> C) Containers running ceph modules.
>
> The problem is especially likely if you've managed to end up running the
> same OSD number as both legacy (/var/lib/ceph/osd.x) and Managed
> (/var/lib/ceph/{fsid}/ceph).
>
> If you have a dual-defined OSD, the cleanest approach seems to be to
> stop ceph on the bad machine and manually delete the /var/lib/ceph/osd.x
> directory. You may need to delete a systemd unit file for that OSD.
>
> You cannot delete the systemd unit for Managed OSDs. It's dynamically
> created when the system comes up and will simply re-create itself, which
> is why it's easier to purge the artefacts of a legacy OSD.
>
> Tim
>
> On 11/7/24 10:28, Frédéric Nass wrote:
>> Hi,
>>
>> We're encountering this unexpected behavior as well. This tracker [1] was
>> created 4 months ago.
>>
>> Regards,
>> Frédéric.
>>
>> [1] https://tracker.ceph.com/issues/67018
>>
>> ----- On 6 Dec 22, at 8:41, Holger Naundorf naundorf@xxxxxxxxxxxxxx wrote:
>>
>>> Hello,
>>> a mgr failover did not change the situation - the osd still shows up in
>>> the 'ceph node ls'. I assume that this is more or less 'working as
>>> intended', as I did ask for the OSD to be kept in the CRUSH map to be
>>> replaced later - but as we are still not so experienced with Ceph here I
>>> wanted to get some input from other sites.
>>>
>>> Regards,
>>> Holger
>>>
>>> On 30.11.22 16:28, Adam King wrote:
>>>> I typically don't see this when I do OSD replacement. If you do a mgr
>>>> failover ("ceph mgr fail") and wait a few minutes, does this still show
>>>> up? The stray daemon/host warning is roughly equivalent to comparing the
>>>> daemons in `ceph node ls` and `ceph orch ps` and seeing if there's
>>>> anything in the former but not the latter. Sometimes I have seen that
>>>> the mgr will have some out-of-date info in the node ls and a failover
>>>> will refresh it.
>>>>
>>>> On Fri, Nov 25, 2022 at 6:07 AM Holger Naundorf <naundorf@xxxxxxxxxxxxxx> wrote:
>>>>
>>>> Hello,
>>>> I have a question about osd removal/replacement:
>>>>
>>>> I just removed an osd where the disk was still running but had read
>>>> errors, leading to failed deep scrubs. As the intent is to replace this
>>>> as soon as we manage to get a spare, I removed it with the
>>>> '--replace' flag:
>>>>
>>>> # ceph orch osd rm 224 --replace
>>>>
>>>> After all placement groups were evacuated I now have 1 osd down/out
>>>> and showing as 'destroyed':
>>>>
>>>> # ceph osd tree
>>>> ID   CLASS  WEIGHT    TYPE NAME  STATUS     REWEIGHT  PRI-AFF
>>>> (...)
>>>> 214  hdd    14.55269  osd.214    up          1.00000  1.00000
>>>> 224  hdd    14.55269  osd.224    destroyed         0  1.00000
>>>> 234  hdd    14.55269  osd.234    up          1.00000  1.00000
>>>> (...)
>>>>
>>>> All as expected - but now the health check complains that the
>>>> (destroyed) osd is not managed:
>>>>
>>>> # ceph health detail
>>>> HEALTH_WARN 1 stray daemon(s) not managed by cephadm
>>>> [WRN] CEPHADM_STRAY_DAEMON: 1 stray daemon(s) not managed by cephadm
>>>>     stray daemon osd.224 on host ceph19 not managed by cephadm
>>>>
>>>> Is this expected behaviour - do I have to live with the yellow check
>>>> until we get a replacement disk and recreate the osd - or did something
>>>> not finish correctly?
>>>>
>>>> Regards,
>>>> Holger
>>>>
>>>> --
>>>> Dr. Holger Naundorf
>>>> Christian-Albrechts-Universität zu Kiel
>>>> Rechenzentrum / HPC / Server und Storage
>>>> Tel: +49 431 880-1990
>>>> Fax: +49 431 880-1523
>>>> naundorf@xxxxxxxxxxxxxx
>>>
>>> --
>>> Dr. Holger Naundorf
>>> Christian-Albrechts-Universität zu Kiel
>>> Rechenzentrum / HPC / Server und Storage
>>> Tel: +49 431 880-1990
>>> Fax: +49 431 880-1523
>>> naundorf@xxxxxxxxxxxxxx
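
For reference, the check Adam describes can be reproduced by hand. A minimal sketch, assuming a cephadm-managed cluster (the osd id and warning text are the ones from Holger's example, nothing else is specific to his hosts):

# ceph node ls
# ceph orch ps

Anything listed by the first command but missing from the second is reported as CEPHADM_STRAY_DAEMON. If the mgr's inventory is merely stale, a failover usually refreshes it:

# ceph mgr fail
# ceph health detail

In Holger's case osd.224 was destroyed with --replace and intentionally kept in the CRUSH map, so it still shows up in 'ceph node ls' but not in 'ceph orch ps', which is presumably what triggers the warning.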
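
For completeness, the duplicate-OSD cleanup Tim describes (not applicable here, since there is no ghost process) would look roughly like this on the affected host. The unit name and legacy path below are illustrative assumptions for a pre-cephadm, package-based deployment and vary by release:

# systemctl stop ceph-osd@224.service
# systemctl disable ceph-osd@224.service
# rm -rf /var/lib/ceph/osd/ceph-224

The cephadm-managed copy of the OSD lives under /var/lib/ceph/<fsid>/ and its unit (ceph-<fsid>@osd.<id>.service) should not be removed this way; as Tim notes, it is recreated automatically when the host comes up.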