Re: osd removal leaves 'stray daemon'

Hi,

We're encountering this unexpected behavior as well. This tracker [1] was created 4 months ago.

Regards,
Frédéric.

[1] https://tracker.ceph.com/issues/67018

----- On Dec 6, 2022, at 8:41, Holger Naundorf naundorf@xxxxxxxxxxxxxx wrote:

> Hello,
> a mgr failover did not change the situation - the osd still shows up in
> 'ceph node ls'. I assume that this is more or less 'working as
> intended', since I did ask for the OSD to be kept in the CRUSH map to be
> replaced later - but as we are still not very experienced with Ceph here
> I wanted to get some input from other sites.
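> 
> For reference, once the spare disk is in, my understanding is that the
> destroyed ID gets reused - either by letting a matching OSD service spec
> pick up the new disk, or by adding it explicitly, roughly:
> 
>     # ceph orch daemon add osd ceph19:/dev/sdX   # device path is a placeholder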
> 
> Regards,
> Holger
> 
> On 30.11.22 16:28, Adam King wrote:
>> I typically don't see this when I do OSD replacement. If you do a mgr
>> failover ("ceph mgr fail") and wait a few minutes, does this still show
>> up? The stray daemon/host warning is roughly equivalent to comparing the
>> daemons in `ceph node ls` and `ceph orch ps` and seeing if there's
>> anything in the former but not the latter. Sometimes I have seen that
>> the mgr has some out-of-date info in the node ls, and a failover will
>> refresh it.
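>> 
>> A rough sketch of that comparison, assuming a cephadm-managed cluster
>> (the exact output format varies by release):
>> 
>>     # ceph mgr fail                      # force a failover to a standby mgr
>>     # ceph node ls osd                   # OSDs the cluster map knows about, per host
>>     # ceph orch ps --daemon-type osd     # OSD daemons cephadm is managing
>> 
>> Anything that shows up in the node ls output but not in the orch ps
>> output is what gets flagged as stray.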
>> 
>> On Fri, Nov 25, 2022 at 6:07 AM Holger Naundorf
>> <naundorf@xxxxxxxxxxxxxx> wrote:
>> 
>>     Hello,
>>     I have a question about osd removal/replacement:
>> 
>>     I just removed an osd whose disk was still running but had read
>>     errors, leading to failed deep scrubs. As the intent is to replace it
>>     as soon as we manage to get a spare, I removed it with the
>>     '--replace' flag:
>> 
>>     # ceph orch osd rm 224 --replace
>> 
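>>     While the PGs are draining, the progress can (as far as I know) be
>>     followed via cephadm's removal queue; the OSD stays listed there
>>     until the drain and the destroy step are done:
>> 
>>     # ceph orch osd rm status
>> 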
>>     After all placement groups have been evacuated, I now have 1 osd
>>     down/out, showing as 'destroyed':
>> 
>>     # ceph osd tree
>>     ID   CLASS  WEIGHT      TYPE NAME        STATUS     REWEIGHT  PRI-AFF
>>     (...)
>>     214    hdd    14.55269          osd.214         up   1.00000  1.00000
>>     224    hdd    14.55269          osd.224  destroyed         0  1.00000
>>     234    hdd    14.55269          osd.234         up   1.00000  1.00000
>>     (...)
>> 
>>     All as expected - but now the health check complains that the
>>     (destroyed) osd is not managed:
>> 
>>     # ceph health detail
>>     HEALTH_WARN 1 stray daemon(s) not managed by cephadm
>>     [WRN] CEPHADM_STRAY_DAEMON: 1 stray daemon(s) not managed by cephadm
>>           stray daemon osd.224 on host ceph19 not managed by cephadm
>> 
>>     Is this expected behaviour - do I have to live with the yellow health
>>     warning until we get a replacement disk and recreate the osd - or did
>>     something not finish correctly?
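>> 
>>     If this is indeed expected, I assume the warning could at least be
>>     muted until the replacement disk arrives, with something like:
>> 
>>     # ceph health mute CEPHADM_STRAY_DAEMON 4w
>> 
>>     where the duration is just a guess at the delivery time.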
>> 
>>     Regards,
>>     Holger
>> 
>>     --
>>     Dr. Holger Naundorf
>>     Christian-Albrechts-Universität zu Kiel
>>     Rechenzentrum / HPC / Server und Storage
>>     Tel: +49 431 880-1990
>>     Fax:  +49 431 880-1523
>>     naundorf@xxxxxxxxxxxxxx
> 
> --
> Dr. Holger Naundorf
> Christian-Albrechts-Universität zu Kiel
> Rechenzentrum / HPC / Server und Storage
> Tel: +49 431 880-1990
> Fax:  +49 431 880-1523
> naundorf@xxxxxxxxxxxxxx
> 
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



