Did you fail the mgr? Otherwise it can take up to 15 minutes to
refresh, I believe. The history keys don't hurt anything; it's up to you
whether you want to keep them.
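If you haven't yet, a simple

   ceph mgr fail

should fail over the active mgr and trigger a refresh of the cephadm state.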
Quoting Dominique Ramaekers <dominique.ramaekers@xxxxxxxxxx>:
I've removed two keys:
#ceph config-key rm mgr/cephadm/host.hvs004.devices.0
#ceph config-key rm mgr/telemetry/host-id/hvs004
Now I only have 'history' keys left.
#ceph config-key ls | grep hvs004 | head gives me:
"config-history/1027/+osd/host:hvs004/osd_memory_target",
"config-history/1027/-osd/host:hvs004/osd_memory_target",
"config-history/1028/+osd/host:hvs004/osd_memory_target",
"config-history/1028/-osd/host:hvs004/osd_memory_target",
"config-history/1037/+mgr.hvs004.fspaok/container_image",
"config-history/1043/-mgr.hvs004.fspaok/container_image",
"config-history/1054/+client.crash.hvs004/container_image",
"config-history/1059/-client.crash.hvs004/container_image",
"config-history/1274/+mgr.hvs004.fspaok/container_image",
"config-history/1279/-mgr.hvs004.fspaok/container_image",
#ceph config-key ls | grep -v history | grep hvs004 no longer lists any keys.
I assume I should remove these 'history' keys too?
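If so, I suppose a small loop would do it (untested, just a sketch; I'd
double-check the matched keys before removing anything):

for key in $(ceph config-key ls | grep -o '"config-history/[^"]*hvs004[^"]*"' | tr -d '"'); do
    ceph config-key rm "$key"
done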
Note: for now the removed host still shows up in the warning...
#ceph health detail
HEALTH_WARN 1 host is in maintenance mode
[WRN] HOST_IN_MAINTENANCE: 1 host is in maintenance mode
hvs004 is in maintenance
-----Original Message-----
From: Eugen Block <eblock@xxxxxx>
Sent: Friday, 17 January 2025 12:17
To: ceph-users@xxxxxxx
Subject: Re: Non existing host in maintenance
Hi,
there's no need to wipe the OSDs from a failed host. Just reinstall the OS,
configure it to your needs, install cephadm and podman/docker, add the
cluster's cephadm pub key, and then reactivate the OSDs:
ceph cephadm osd activate <host>
I just did that yesterday.
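Roughly, something like this (just a sketch, assuming cephadm defaults;
adapt package names and the key path to your environment):

On the reinstalled host:
   apt install cephadm podman

From an admin node, push the cluster's cephadm SSH key:
   ssh-copy-id -f -i /etc/ceph/ceph.pub root@<host>

Then reactivate the existing OSDs:
   ceph cephadm osd activate <host>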
To clear the warning, I would check the config-key ls output:
ceph config-key ls | grep hvs004
And then try removing the host and devices keys:
ceph config-key rm mgr/cephadm/host.hvs004
ceph config-key rm mgr/cephadm/host.hvs004.devices.0
Your output might look different. I'm not entirely sure this will be
sufficient, but let's try this first.
Quoting Dominique Ramaekers <dominique.ramaekers@xxxxxxxxxx>:
> Hi,
>
> I have removed a host (hvs004) that was in maintenance.
>
> The system disk of this host had failed, so I removed the host hvs004
> from Ceph, replaced the system disk, erased all the OSD disks and
> reinstalled the host as hvs005.
>
> This results in a cluster health warning that doesn't go away:
> health: HEALTH_WARN
> 1 host is in maintenance mode
>
> Removal was done with "ceph orch host rm hvs004 --offline --force" in the
> cephadm shell.
>
> How can I correct this false warning?
>
> Some more info:
>
> root@hvs001:/# ceph orch host ls
> HOST ADDR LABELS STATUS
> hvs001 xxx.xxx.xxx.xxx _admin
> hvs002 xxx.xxx.xxx.xxx _admin
> hvs003 xxx.xxx.xxx.xxx _admin
> hvs005 xxx.xxx.xxx.xxx _admin
> 4 hosts in cluster
>
> root@hvs001:/# ceph health detail
> HEALTH_WARN 1 host is in maintenance mode
> [WRN] HOST_IN_MAINTENANCE: 1 host is in maintenance mode
>     hvs004 is in maintenance :-/
>
> Help is greatly appreciated…
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx