Re: Non existing host in maintenance

I've removed two keys:
#ceph config-key rm mgr/cephadm/host.hvs004.devices.0
#ceph config-key rm mgr/telemetry/host-id/hvs004

Now I only have 'history' keys.
#ceph config-key ls | grep hvs004 | head gives me:
    "config-history/1027/+osd/host:hvs004/osd_memory_target",
    "config-history/1027/-osd/host:hvs004/osd_memory_target",
    "config-history/1028/+osd/host:hvs004/osd_memory_target",
    "config-history/1028/-osd/host:hvs004/osd_memory_target",
    "config-history/1037/+mgr.hvs004.fspaok/container_image",
    "config-history/1043/-mgr.hvs004.fspaok/container_image",
    "config-history/1054/+client.crash.hvs004/container_image",
    "config-history/1059/-client.crash.hvs004/container_image",
    "config-history/1274/+mgr.hvs004.fspaok/container_image",
    "config-history/1279/-mgr.hvs004.fspaok/container_image",

#ceph config-key ls | grep -v history | grep hvs004 doesn't list any keys anymore

I assume I should remove these 'history' keys too?
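If they do have to go, a bulk removal could look roughly like this (untested sketch; it assumes 'ceph config-key ls' prints the JSON list shown above and that jq is installed, and I haven't verified that dropping config-history entries is harmless):

#ceph config-key ls | jq -r '.[]' | grep '^config-history/' | grep hvs004 \
    | while read -r key; do ceph config-key rm "$key"; done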

Note => for now, the removed host still shows up in the warning...
#ceph health detail
HEALTH_WARN 1 host is in maintenance mode
[WRN] HOST_IN_MAINTENANCE: 1 host is in maintenance mode
    hvs004 is in maintenance
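
In case it's useful: as far as I understand, cephadm keeps its host list, including the maintenance status flag, in a single config-key whose value is JSON. Assuming that key is called mgr/cephadm/inventory (the name is my assumption, not something confirmed in this thread), a read-only check for leftovers would be:

#ceph config-key get mgr/cephadm/inventory | grep hvs004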


> -----Original Message-----
> From: Eugen Block <eblock@xxxxxx>
> Sent: Friday, January 17, 2025 12:17
> To: ceph-users@xxxxxxx
> Subject: Re: Non existing host in maintenance
>
> Hi,
>
> there's no need to wipe OSDs from a failed host. Just reinstall the OS,
> configure it to your needs, install cephadm and podman/docker, add the
> cephadm pub key, and then reactivate the OSDs:
>
> ceph cephadm osd activate <host>
>
> I just did that yesterday.
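>
> Roughly, as commands (just a sketch; the key path and host name are
> placeholders, and fetching the key with 'ceph cephadm get-pub-key' is one
> way to do it):
>
> ceph cephadm get-pub-key > ~/ceph.pub
> ssh-copy-id -f -i ~/ceph.pub root@<host>
> ceph cephadm osd activate <host>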
>
> To clear the warning, I would check the config-key ls output:
>
> ceph config-key ls | grep hvs004
>
> And then try removing the host and devices keys:
>
> ceph config-key rm mgr/cephadm/host.hvs004
> ceph config-key rm mgr/cephadm/host.hvs004.devices.0
>
> Your output might look different. I'm not entirely sure if this will be
> sufficient, but let's try this first.
>
> Quoting Dominique Ramaekers <dominique.ramaekers@xxxxxxxxxx>:
>
> > Hi,
> >
> > I have removed a host (hvs004) that was in maintenance.
> >
> > The system disk of this host had failed, so removed the host hvs004 in
> > ceph; replaced the system disk; erased all the osd-disks and
> > reinstalled the host as hvs005.
> >
> > Resulting in a cluster status warning that doesn't go away:
> > health: HEALTH_WARN
> >             1 host is in maintenance mode
> >
> > Removal was done with "ceph orch host rm hvs004 --offline --force" in
> > the cephadm shell.
> >
> > How can I correct this false warning?
> >
> > Some more info:
> >
> > root@hvs001:/# ceph orch host ls
> > HOST    ADDR       LABELS  STATUS
> > hvs001  xxx.xxx.xxx.xxx  _admin
> > hvs002  xxx.xxx.xxx.xxx  _admin
> > hvs003  xxx.xxx.xxx.xxx  _admin
> > hvs005  xxx.xxx.xxx.xxx  _admin
> > 4 hosts in cluster
> >
> > root@hvs001:/# ceph health detail
> > HEALTH_WARN 1 host is in maintenance mode
> > [WRN] HOST_IN_MAINTENANCE: 1 host is in maintenance mode
> >     hvs004 is in maintenance        :-/
> >
> > Help is greatly appreciated...
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



