Hi,
Containerized daemons usually have the fsid in the systemd unit name, like
ceph-{fsid}@osd.5.
Is it possible that you have those confused? Check the
/var/lib/ceph/osd/ directory to find possible orphaned daemons and
clean them up.
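A quick sketch of what I would check (adjust to your cluster; the fsid below is just a placeholder):

systemctl list-units 'ceph*osd*'   # legacy units show up as ceph-osd@N, containerized ones as ceph-{fsid}@osd.N
ls /var/lib/ceph/osd/              # leftover ceph-N directories here usually point at legacy/orphaned daemons
cephadm ls                         # lists every daemon cephadm sees on the host; legacy ones are flagged with style "legacy"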
And as previously stated, it would help to see your osd tree and which
OSDs you’re talking about.
Quoting Tim Holloway <timh@xxxxxxxxxxxxx>:
Incidentally, I just noticed that my phantom host isn't completely
gone. It's not in the host list, either command-line or dashboard, but
it still shows up (with no assets) as a host in "ceph osd tree".
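(Presumably, since it's just an empty bucket, something along these lines would clear it out of the CRUSH map, PHANTOMHOST being whatever name shows in the tree; I haven't verified that:)

ceph osd crush rm PHANTOMHOST   # removes an empty host bucket from the CRUSH map; refuses if any OSDs still sit under it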
---
More seriously, I've been having problems with OSDs that report as
being both up and down at the same time.
This is on 2 new hosts. One host saw this when I made it the _admin
host. The other caught it because it's running in a VM with the OSD
mapped out as an imported disk, and the host OS managed to flip which
drive was sda and which was sdb, resulting in having to delete and
re-define the OSD in the VM.
But now the OSD on this VM reports as "UP/IN" on the dashboard while
it's "error" in "ceph orch ps", and on the actual vbox the osd
container fails on startup, viz.:
Jul 12 20:06:48 ceph05.internal.mousetech.com ceph-278fcd86-0861-11ee-a7df-9c5c8e86cf8f-osd-4[4017]: debug 2024-07-12T20:06:48.056+0000 7fc17dfb9380 -1 bdev(0x55e4853c4800 /var/lib/ceph/osd/ceph-4/block) open open got: (16) De>
Jul 12 20:06:48 ceph05.internal.mousetech.com ceph-278fcd86-0861-11ee-a7df-9c5c8e86cf8f-osd-4[4017]: debug 2024-07-12T20:06:48.056+0000 7fc17dfb9380 -1 osd.4 0 OSD:init: unable to mount object store
Jul 12 20:06:48 ceph05.internal.mousetech.com ceph-278fcd86-0861-11ee-a7df-9c5c8e86cf8f-osd-4[4017]: debug 2024-07-12T20:06:48.056+0000 7fc17dfb9380 -1 ** ERROR: osd init failed: (16) Device or resource busy
Note that the truncated message above reads:
bdev(0x55e4853c4800 /var/lib/ceph/osd/ceph-4/block) open open got: (16) Device or resource busy
Rebooting doesn't help. Nor does freeing up resources and
stopping/starting processes manually. The problem eventually cleared up
spontaneously on the admin box, but I have no idea why.
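(For what it's worth, these are the sorts of checks that seem relevant to a "Device or resource busy" on an OSD's block device; the LV glob is a guess and would need adjusting to whatever ceph-volume actually reports:)

ceph-volume lvm list                    # maps osd.4 to its LV and underlying device
lsof /dev/mapper/ceph--*osd--block--*   # shows whatever still has the OSD's LV open
systemctl list-units 'ceph*'            # look for two units (legacy ceph-osd@4 and ceph-{fsid}@osd.4) fighting over the same device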
---
I also noticed that the OSD on the admin box now shows as "stopped" in
ceph orch ps, though again, the dashboard lists it as "UP/IN".
Here's what systemctl thinks about it:
systemctl status ceph-osd@5.service
● ceph-osd@5.service - Ceph object storage daemon osd.5
     Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled-runtime; preset: disabled)
     Active: active (running) since Fri 2024-07-12 16:45:51 EDT; 1min 40s ago
    Process: 8511 ExecStartPre=/usr/libexec/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id 5 (code=exited, status=0/SUCCESS)
   Main PID: 8517 (ceph-osd)
      Tasks: 70
     Memory: 478.6M
        CPU: 3.405s
     CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@5.service
             └─8517 /usr/bin/ceph-osd -f --cluster ceph --id 5 --setuser ceph --setgroup ceph

Jul 12 16:45:51 dell02.mousetech.com systemd[1]: Starting Ceph object storage daemon osd.5...
Jul 12 16:45:51 dell02.mousetech.com systemd[1]: Started Ceph object storage daemon osd.5.
Jul 12 16:45:51 dell02.mousetech.com ceph-osd[8517]: 2024-07-12T16:45:51.642-0400 7f2bd440c140 -1 Falling back to public interface
Jul 12 16:45:58 dell02.mousetech.com ceph-osd[8517]: 2024-07-12T16:45:58.352-0400 7f2bd440c140 -1 osd.5 34161 log_to_monitors {default=true}
Jul 12 16:45:59 dell02.mousetech.com ceph-osd[8517]: 2024-07-12T16:45:59.206-0400 7f2bcbbf0640 -1 osd.5 34161 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
The actual container is not running.
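(If a containerized unit exists on this box at all, it should be named something like the line below, assuming dell02 shares the fsid seen in the ceph05 container names earlier; the legacy ceph-osd@5 unit above may be what's keeping the container from starting:)

systemctl status ceph-278fcd86-0861-11ee-a7df-9c5c8e86cf8f@osd.5.service   # the cephadm-managed unit; name guessed from the fsid above
cephadm ls                                                                  # should list osd.5 twice if both a legacy and a cephadm copy are present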
Ceph version, incidentally, is 16.2.15, except for that one node that
apparently didn't upgrade from Octopus (I'll be nuking that one
shortly).
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx