Re: CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)

Jeremy Hansen <jeremy@xxxxxxxxxx> · Mon, 7 Jun 2021 03:16:51 -0700

cephadm rm-daemon --name osd.29

on the node with the stale daemon did the trick.
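
For anyone landing here with the same warning, the steps in this thread can be sketched as a small script. This is a rough sketch under stated assumptions, not an official procedure: `failed_ceph_units` and `unit_to_rm_cmd` are hypothetical helper names, and adding `--fsid` and `--force` is an assumption based on cephadm versions that require them when removing an OSD with `rm-daemon`. The script only prints the command so it can be reviewed before anything is removed.

```shell
#!/usr/bin/env bash
# Sketch only: the helper names below are hypothetical, not part of cephadm.

# Read a systemctl unit listing on stdin and print every
# ceph-<fsid>@<daemon>.service unit whose line contains a "failed" column.
failed_ceph_units() {
    awk '{
        unit = ""; failed = 0
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^ceph-.+@.+\.service$/) unit = $i
            if ($i == "failed") failed = 1
        }
        if (unit != "" && failed) print unit
    }'
}

# Turn a unit name into the matching cleanup command (printed, not executed).
unit_to_rm_cmd() {
    local unit="$1"
    local rest="${unit#ceph-}"    # drop the "ceph-" prefix
    rest="${rest%.service}"       # drop the ".service" suffix
    local fsid="${rest%%@*}"      # text before "@" is the cluster fsid
    local name="${rest#*@}"       # text after "@" is the daemon name
    # Assumption: --fsid and --force are needed for OSDs on this cephadm version.
    echo "cephadm rm-daemon --fsid ${fsid} --name ${name} --force"
}

# Sample line taken from the systemctl output quoted further down the thread.
sample='● ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d@osd.29.service loaded failed failed Ceph osd.29 for bfa2ad58-c049-11eb-9098-3c8cf8ed728d'

printf '%s\n' "$sample" | failed_ceph_units | while read -r unit; do
    unit_to_rm_cmd "$unit"
done
```

On a live host the input would come from `systemctl list-units --all --plain --no-legend 'ceph-*'` instead of the sample line. After the stale daemon is removed, `ceph health detail` should show whether the warning has cleared.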

-jeremy

> On Jun 7, 2021, at 2:24 AM, Jeremy Hansen <jeremy@xxxxxxxxxx> wrote:
> 
> So I found the failed daemon:
> 
> [root@cn05 ~]# systemctl | grep 29
> 
> ● ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d@osd.29.service  loaded failed failed  Ceph osd.29 for bfa2ad58-c049-11eb-9098-3c8cf8ed728d
> 
> But I’ve already replaced this OSD, so this is perhaps left over from a previous osd.29 on this host. How would I go about removing it cleanly and, more importantly, in a way that Ceph is aware of the change, so that the warning clears?
> 
> Thanks
> -jeremy
> 
> 
>> On Jun 7, 2021, at 1:54 AM, Jeremy Hansen <jeremy@xxxxxxxxxx> wrote:
>> 
>> Thank you.  So I see this:
>> 
>> 2021-06-07T08:41:24.133493+0000 mgr.cn01.ceph.la1.clx.corp.xnkoft (mgr.224161) 1494 : cephadm [INF] Reconfiguring osd.29 (monmap changed)...
>> 2021-06-07T08:44:37.650022+0000 mgr.cn01.ceph.la1.clx.corp.xnkoft (mgr.224161) 1592 : cephadm [INF] Reconfiguring osd.29 (monmap changed)...
>> 2021-06-07T08:47:07.039405+0000 mgr.cn01.ceph.la1.clx.corp.xnkoft (mgr.224161) 1667 : cephadm [INF] Reconfiguring osd.29 (monmap changed)...
>> 2021-06-07T08:51:00.094847+0000 mgr.cn01.ceph.la1.clx.corp.xnkoft (mgr.224161) 1785 : cephadm [INF] Reconfiguring osd.29 (monmap changed)...
>> 
>> Yet…
>> 
>> ceph osd ls
>> 0
>> 1
>> 2
>> 3
>> 4
>> 5
>> 6
>> 7
>> 8
>> 9
>> 10
>> 11
>> 12
>> 13
>> 14
>> 16
>> 17
>> 18
>> 20
>> 22
>> 23
>> 24
>> 26
>> 27
>> 31
>> 33
>> 34
>> 
>> So how would I approach fixing this?
>> 
>>> On Jun 7, 2021, at 1:10 AM, 赵贺东 <zhaohedong@xxxxxxxxx> wrote:
>>> 
>>> Hello Jeremy Hansen,
>>> 
>>> try:
>>> ceph log last cephadm
>>> 
>>> or see the file below:
>>> /var/log/ceph/cephadm.log
>>> 
>>> 
>>> 
>>>> On Jun 7, 2021, at 15:49, Jeremy Hansen <jeremy@xxxxxxxxxx> wrote:
>>>> 
>>>> What’s the proper way to track down where this error is coming from?  Thanks.
>>>> 
>>>> 
>>>> 6/7/21 12:40:00 AM
>>>> [WRN]
>>>> [WRN] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)
>>>> 
>>>> 6/7/21 12:40:00 AM
>>>> [WRN]
>>>> Health detail: HEALTH_WARN 1 failed cephadm daemon(s)
>>>> 
>>>> _______________________________________________
>>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>> 

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx