Re: osd removal leaves 'stray daemon'

Hi,

I added some more logs to the bug tracker [1]. Could this be related to the hard-coded 60s limit in def _check_for_strays(self) [2]?
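
For reference, that stray check roughly boils down to the node ls / orch ps comparison Adam described below, so it can be reproduced by hand. A rough sketch (assuming a recent cephadm release; the daemon-type filter is only there to narrow the output):

  # ceph node ls osd                  # OSDs the mon/mgr know about, per host
  # ceph orch ps --daemon-type osd    # OSD daemons cephadm actually manages
  # ceph mgr fail                     # failover, so cephadm refreshes its view

Anything reported by the first command but missing from the second should get flagged as a stray daemon. If the destroyed OSD keeps showing up in 'ceph node ls' even after a failover (as Holger reported), stale mgr state alone doesn't explain it, which is why I'm looking at the check itself.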

Regards,
Frédéric.

[1] https://tracker.ceph.com/issues/67018
[2] https://github.com/ceph/ceph/blob/f55fc4599a6c0da0f4bd2f3ecd2122e603ad94dd/src/pybind/mgr/cephadm/serve.py#L475C4-L475C41

----- On 7 Nov 24, at 16:28, Frédéric Nass frederic.nass@xxxxxxxxxxxxxxxx wrote:

> Hi,
> 
> We're encountering this unexpected behavior as well. This tracker [1] was
> created 4 months ago.
> 
> Regards,
> Frédéric.
> 
> [1] https://tracker.ceph.com/issues/67018
> 
> ----- On 6 Dec 22, at 8:41, Holger Naundorf naundorf@xxxxxxxxxxxxxx wrote:
> 
>> Hello,
>> a mgr failover did not change the situation - the osd still shows up in
>> 'ceph node ls'. I assume this is more or less 'working as intended',
>> since I asked for the OSD to be kept in the CRUSH map to be replaced
>> later - but as we are still not that experienced with Ceph here, I
>> wanted to get some input from other sites.
>> 
>> Regards,
>> Holger
>> 
>> On 30.11.22 16:28, Adam King wrote:
>>> I typically don't see this when I do OSD replacement. If you do a mgr
>>> failover ("ceph mgr fail") and wait a few minutes, does this still show
>>> up? The stray daemon/host warning is roughly equivalent to comparing the
>>> daemons in `ceph node ls` and `ceph orch ps` and seeing if there's
>>> anything in the former but not the latter. Sometimes I have seen that
>>> the mgr has some out-of-date info in the node ls, and a failover will
>>> refresh it.
>>> 
>>> On Fri, Nov 25, 2022 at 6:07 AM Holger Naundorf <naundorf@xxxxxxxxxxxxxx> wrote:
>>> 
>>>     Hello,
>>>     I have a question about osd removal/replacement:
>>> 
>>>     I just removed an osd where the disk was still running but had read
>>>     errors, leading to failed deep scrubs. As the intent is to replace it
>>>     as soon as we manage to get a spare, I removed it with the
>>>     '--replace' flag:
>>> 
>>>     # ceph orch osd rm 224 --replace
>>> 
>>>     After all placement groups were evacuated, I now have 1 osd down/out
>>>     and showing as 'destroyed':
>>> 
>>>     # ceph osd tree
>>>     ID   CLASS  WEIGHT      TYPE NAME        STATUS     REWEIGHT  PRI-AFF
>>>     (...)
>>>     214    hdd    14.55269          osd.214         up   1.00000  1.00000
>>>     224    hdd    14.55269          osd.224  destroyed         0  1.00000
>>>     234    hdd    14.55269          osd.234         up   1.00000  1.00000
>>>     (...)
>>> 
>>>     All as expected - but now the health check complains that the
>>>     (destroyed) osd is not managed:
>>> 
>>>     # ceph health detail
>>>     HEALTH_WARN 1 stray daemon(s) not managed by cephadm
>>>     [WRN] CEPHADM_STRAY_DAEMON: 1 stray daemon(s) not managed by cephadm
>>>           stray daemon osd.224 on host ceph19 not managed by cephadm
>>> 
>>>     Is this expected behaviour that I have to live with the yellow health
>>>     check until we get a replacement disk and recreate the osd, or did
>>>     something not finish correctly?
>>> 
>>>     Regards,
>>>     Holger
>>> 
>>>     --
>>>     Dr. Holger Naundorf
>>>     Christian-Albrechts-Universität zu Kiel
>>>     Rechenzentrum / HPC / Server und Storage
>>>     Tel: +49 431 880-1990
>>>     Fax:  +49 431 880-1523
>>>     naundorf@xxxxxxxxxxxxxx
>> 
>> --
>> Dr. Holger Naundorf
>> Christian-Albrechts-Universität zu Kiel
>> Rechenzentrum / HPC / Server und Storage
>> Tel: +49 431 880-1990
>> Fax:  +49 431 880-1523
>> naundorf@xxxxxxxxxxxxxx
>> 
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



