I admit I don't follow what the exact problem is, but I wanted to point
out that as long as there are ANY OSD metadata files on a machine, some
(but not always all) ceph commands will consider there to be an OSD there.
To completely eradicate an OSD, I believe that (per Eugen Block) you
also have to set the OSD's CRUSH weight to 0.
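For illustration, zeroing the weight is a one-liner (a sketch; osd.224
is just the example ID borrowed from the thread below):

# ceph osd crush reweight osd.224 0

That stops data from mapping to it without removing the entry from the
CRUSH map itself.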
I had a heck of a time flushing out crud a few months back. I learned a
lot, whether I wanted to or not.
Tim
On 11/7/24 16:57, Frédéric Nass wrote:
Hi Tim,
I see what you're referring to, but it doesn't apply here, since there are actually **no** stray daemons, that is, **no** ghost processes on any host trying to start.
Here we're talking about unexpected behavior, most likely a bug.
Regards,
Frédéric.
----- On 7 Nov 24, at 21:08, Tim Holloway timh@xxxxxxxxxxxxx wrote:
You can get this sort of behaviour because different Ceph subsystems get
their information from different places instead of from a single
authoritative source.
Specifically, Ceph may look directly at:
A) Its configuration database
B) Systemd units running on the OSD host
C) Containers running ceph modules.
The problem is especially likely if you've managed to end up running the
same OSD number both as a legacy OSD (/var/lib/ceph/osd.x) and as a
managed one (/var/lib/ceph/{fsid}/ceph).
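A quick way to spot a dual definition is to list both locations (a
sketch following the path conventions above; exact layouts vary by
release):

# ls -d /var/lib/ceph/osd*
# ls -d /var/lib/ceph/*/osd*

If the same OSD number shows up in both listings, it's defined twice.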
If you have a dual-defined OSD, the cleanest approach seems to be to
stop ceph on the bad machine and manually delete the /var/lib/ceph/osd.x
directory. You may need to delete a systemd unit file for that OSD.
You cannot delete the systemd unit for managed OSDs: it's dynamically
created when the system comes up and will simply re-create itself, which
is why it's easier to purge the artefacts of a legacy OSD.
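As a sketch of that legacy cleanup (assuming a package-based install
with classic ceph-osd@ units; osd.224 is again just the example ID, and
double-check the data path before the rm - on many installs the legacy
data dir sits under /var/lib/ceph/osd/ rather than directly in
/var/lib/ceph/):

# systemctl stop ceph-osd@224
# systemctl disable ceph-osd@224
# rm -rf /var/lib/ceph/osd.224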
Tim
On 11/7/24 10:28, Frédéric Nass wrote:
Hi,
We're encountering this unexpected behavior as well. This tracker [1] was
created 4 months ago.
Regards,
Frédéric.
[1] https://tracker.ceph.com/issues/67018
----- On 6 Dec 22, at 8:41, Holger Naundorf naundorf@xxxxxxxxxxxxxx wrote:
Hello,
a mgr failover did not change the situation - the osd still shows up in
'ceph node ls'. I assume this is more or less 'working as intended',
since I did ask for the OSD to be kept in the CRUSH map to be replaced
later - but as we are still not so experienced with Ceph here, I wanted
to get some input from other sites.
Regards,
Holger
On 30.11.22 16:28, Adam King wrote:
I typically don't see this when I do OSD replacement. If you do a mgr
failover ("ceph mgr fail") and wait a few minutes, does this still show
up? The stray daemon/host warning is roughly equivalent to comparing the
daemons in `ceph node ls` and `ceph orch ps` and seeing if there's
anything in the former but not the latter. Sometimes I have seen that
the mgr will have some out-of-date info in the node ls and a failover
will refresh it.
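To reproduce that comparison by hand (a sketch; plain invocations,
nothing version-specific):

# ceph mgr fail
# ceph node ls
# ceph orch ps

Anything that appears in the node ls output but not in orch ps is what
gets reported as stray.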
On Fri, Nov 25, 2022 at 6:07 AM Holger Naundorf <naundorf@xxxxxxxxxxxxxx> wrote:
Hello,
I have a question about osd removal/replacement:
I just removed an osd where the disk was still running but had read
errors, leading to failed deep scrubs. As the intent is to replace it
as soon as we manage to get a spare, I removed it with the '--replace'
flag:
# ceph orch osd rm 224 --replace
After all placement groups were evacuated, I now have 1 osd down/out
and showing as 'destroyed':
# ceph osd tree
ID   CLASS  WEIGHT    TYPE NAME  STATUS     REWEIGHT  PRI-AFF
(...)
214  hdd    14.55269  osd.214    up         1.00000   1.00000
224  hdd    14.55269  osd.224    destroyed  0         1.00000
234  hdd    14.55269  osd.234    up         1.00000   1.00000
(...)
All as expected - but now the health check complains that the
(destroyed) osd is not managed:
# ceph health detail
HEALTH_WARN 1 stray daemon(s) not managed by cephadm
[WRN] CEPHADM_STRAY_DAEMON: 1 stray daemon(s) not managed by cephadm
stray daemon osd.224 on host ceph19 not managed by cephadm
Is this expected behaviour - do I have to live with the yellow check
until we get a replacement disk and recreate the osd - or did something
not finish correctly?
Regards,
Holger
--
Dr. Holger Naundorf
Christian-Albrechts-Universität zu Kiel
Rechenzentrum / HPC / Server und Storage
Tel: +49 431 880-1990
Fax: +49 431 880-1523
naundorf@xxxxxxxxxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx