Hi,

I added some more logs to the bug tracker [1]. Could this be related to the hard-coded 60s limit in _check_for_strays() [2]?

Regards,
Frédéric.

[1] https://tracker.ceph.com/issues/67018
[2] https://github.com/ceph/ceph/blob/f55fc4599a6c0da0f4bd2f3ecd2122e603ad94dd/src/pybind/mgr/cephadm/serve.py#L475C4-L475C41

----- On 7 Nov 24, at 16:28, Frédéric Nass frederic.nass@xxxxxxxxxxxxxxxx wrote:

> Hi,
>
> We're encountering this unexpected behavior as well. This tracker [1] was
> created 4 months ago.
>
> Regards,
> Frédéric.
>
> [1] https://tracker.ceph.com/issues/67018
>
> ----- On 6 Dec 22, at 8:41, Holger Naundorf naundorf@xxxxxxxxxxxxxx wrote:
>
>> Hello,
>> a mgr failover did not change the situation - the osd still shows up in
>> 'ceph node ls'. I assume this is more or less 'working as intended', since
>> I asked for the OSD to be kept in the CRUSH map to be replaced later - but
>> as we are still not so experienced with Ceph here, I wanted to get some
>> input from other sites.
>>
>> Regards,
>> Holger
>>
>> On 30.11.22 16:28, Adam King wrote:
>>> I typically don't see this when I do OSD replacement. If you do a mgr
>>> failover ("ceph mgr fail") and wait a few minutes, does this still show
>>> up? The stray daemon/host warning is roughly equivalent to comparing the
>>> daemons in `ceph node ls` and `ceph orch ps` and seeing if there's
>>> anything in the former but not the latter. Sometimes I have seen that the
>>> mgr will have some out-of-date info in the node ls and a failover will
>>> refresh it.
>>>
>>> On Fri, Nov 25, 2022 at 6:07 AM Holger Naundorf <naundorf@xxxxxxxxxxxxxx> wrote:
>>>
>>>     Hello,
>>>     I have a question about osd removal/replacement:
>>>
>>>     I just removed an osd where the disk was still running but had read
>>>     errors, leading to failed deep scrubs - as the intent is to replace
>>>     this as soon as we manage to get a spare, I removed it with the
>>>     '--replace' flag:
>>>
>>>     # ceph orch osd rm 224 --replace
>>>
>>>     After all placement groups were evacuated, I now have 1 osd down/out
>>>     and showing as 'destroyed':
>>>
>>>     # ceph osd tree
>>>     ID   CLASS  WEIGHT    TYPE NAME  STATUS     REWEIGHT  PRI-AFF
>>>     (...)
>>>     214   hdd   14.55269  osd.214    up          1.00000  1.00000
>>>     224   hdd   14.55269  osd.224    destroyed         0  1.00000
>>>     234   hdd   14.55269  osd.234    up          1.00000  1.00000
>>>     (...)
>>>
>>>     All as expected - but now the health check complains that the
>>>     (destroyed) osd is not managed:
>>>
>>>     # ceph health detail
>>>     HEALTH_WARN 1 stray daemon(s) not managed by cephadm
>>>     [WRN] CEPHADM_STRAY_DAEMON: 1 stray daemon(s) not managed by cephadm
>>>         stray daemon osd.224 on host ceph19 not managed by cephadm
>>>
>>>     Is this expected behaviour, and do I have to live with the yellow check
>>>     until we get a replacement disk and recreate the osd, or did something
>>>     not finish correctly?
>>>
>>>     Regards,
>>>     Holger
>>>
>>>     --
>>>     Dr. Holger Naundorf
>>>     Christian-Albrechts-Universität zu Kiel
>>>     Rechenzentrum / HPC / Server und Storage
>>>     Tel: +49 431 880-1990
>>>     Fax: +49 431 880-1523
>>>     naundorf@xxxxxxxxxxxxxx
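
For what it's worth, here is a minimal, self-contained sketch (not the actual cephadm code) of the comparison Adam describes above: anything reported by 'ceph node ls' but absent from 'ceph orch ps' is flagged as stray. The GRACE_SECONDS value and the destroyed_at bookkeeping below are hypothetical stand-ins for the hard-coded 60s window referenced in [2]; the real check lives in _check_for_strays() in src/pybind/mgr/cephadm/serve.py.

#!/usr/bin/env python3
# Rough illustration only -- NOT the actual cephadm implementation.
# It mimics the comparison described above: any OSD present in
# `ceph node ls` but missing from `ceph orch ps` is reported as stray.
# GRACE_SECONDS is a hypothetical stand-in for the hard-coded 60s window.

import time

GRACE_SECONDS = 60  # assumed stand-in for the hard-coded limit


def find_stray_osds(node_ls_osds, orch_ps_daemons, destroyed_at=None):
    """Return OSD daemon names known to the cluster map (node ls) but not
    managed by cephadm (orch ps), skipping OSDs destroyed less than
    GRACE_SECONDS ago."""
    destroyed_at = destroyed_at or {}
    strays = []
    for osd_id in node_ls_osds:
        name = 'osd.%d' % osd_id
        if name in orch_ps_daemons:
            continue  # managed by cephadm, not stray
        ts = destroyed_at.get(name)
        if ts is not None and (time.time() - ts) < GRACE_SECONDS:
            continue  # recently destroyed/replaced, still inside the grace window
        strays.append(name)
    return strays


if __name__ == '__main__':
    # osd.224 was destroyed with `ceph orch osd rm 224 --replace`: it is
    # still in the CRUSH map / node ls, but cephadm no longer manages a
    # daemon for it, so it shows up as stray.
    node_ls = [214, 224, 234]
    orch_ps = {'osd.214', 'osd.234'}
    print(find_stray_osds(node_ls, orch_ps))  # -> ['osd.224']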
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx