Schrödinger's OSD

Incidentally, I just noticed that my phantom host isn't completely
gone. It's not in the host list, either on the command line or in the
dashboard, but it still shows up (with nothing under it) as a host in
"ceph osd tree".
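
(For anyone wanting to reproduce the comparison, I'm just checking the
orchestrator's view against the CRUSH tree, roughly like this; host
names are of course whatever yours are:

ceph orch host ls      # phantom host is absent here
ceph osd tree          # ...but still appears as an empty host bucket

If the stale bucket really is empty, I believe it can be removed from
the CRUSH map with "ceph osd crush remove <hostname>", though I
haven't tried that on this cluster yet.)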

---

More seriously, I've been having problems with OSDs that report as
being both up and down at the same time.

This is on 2 new hosts. One host saw this when I made it the _admin
host. The other caught it because it's running in a VM with the OSD
mapped out as an imported disk, and the host OS managed to flip which
drive was sda and which was sdb, resulting in having to delete and
redefine the OSD in the VM.
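
(Side note in case someone else hits the same sda/sdb flip: my
understanding is that LVM-backed OSDs locate their data by LV tags
rather than by the /dev/sdX name, so the rename by itself shouldn't
matter. Something like this shows what the OSD is actually bound to on
the host; adjust for your own layout:

cephadm ceph-volume lvm list    # which LV/device each OSD on this host maps to
ls -l /dev/disk/by-id/          # stable device names, independent of sda/sdb ordering
)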

But now the OSD on this VM reports as "UP/IN" on the dashboard, while
it's "error" in "ceph orch ps", and on the actual vbox the OSD
container fails on startup, viz:

Jul 12 20:06:48 ceph05.internal.mousetech.com ceph-278fcd86-0861-11ee-a7df-9c5c8e86cf8f-osd-4[4017]: debug 2024-07-12T20:06:48.056+0000 7fc17dfb9380 -1 bdev(0x55e4853c4800 /var/lib/ceph/osd/ceph-4/block) open open got: (16) De>
Jul 12 20:06:48 ceph05.internal.mousetech.com ceph-278fcd86-0861-11ee-a7df-9c5c8e86cf8f-osd-4[4017]: debug 2024-07-12T20:06:48.056+0000 7fc17dfb9380 -1 osd.4 0 OSD:init: unable to mount object store
Jul 12 20:06:48 ceph05.internal.mousetech.com ceph-278fcd86-0861-11ee-a7df-9c5c8e86cf8f-osd-4[4017]: debug 2024-07-12T20:06:48.056+0000 7fc17dfb9380 -1  ** ERROR: osd init failed: (16) Device or resource busy

Note that the truncated message above reads:

bdev(0x55e4853c4800 /var/lib/ceph/osd/ceph-4/block) open open got: (16) Device or resource busy
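
For anyone else chasing the same EBUSY, these are the sorts of checks
that seem relevant on the host (standard tools; nothing conclusive here
so far):

lsblk                      # is the OSD's LV still mapped?
dmsetup ls | grep ceph     # leftover ceph device-mapper entries
ps aux | grep ceph-osd     # a stray ceph-osd process still holding the LV
cephadm ls                 # what cephadm believes is deployed on this host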

Rebooting doesn't help. Nor does freeing up resources and
stopping/starting processes manually. The problem eventually cleared up
spontaneously on the admin box, but I have no idea why.
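
(For completeness, the cephadm-side ways I know of to bounce the
daemon; I can't say whether any of these is what eventually made the
admin box recover:

ceph orch daemon restart osd.4      # restart the container via the orchestrator
ceph orch daemon redeploy osd.4     # recreate the daemon's container/unit files
systemctl reset-failed              # on the host, clear failed unit state first
)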

---

Also noted that the OSD on the admin box now shows in "ceph orch ps" as
"stopped", though again, the dashboard lists it as "UP/IN".

Here's what systemctl thinks about it:

systemctl status ceph-osd@5.service
● ceph-osd@5.service - Ceph object storage daemon osd.5
     Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled-runtime; preset: disabled)
     Active: active (running) since Fri 2024-07-12 16:45:51 EDT; 1min 40s ago
    Process: 8511 ExecStartPre=/usr/libexec/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id 5 (code=exited, status=0/SUCCESS)
   Main PID: 8517 (ceph-osd)
      Tasks: 70
     Memory: 478.6M
        CPU: 3.405s
     CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@5.service
             └─8517 /usr/bin/ceph-osd -f --cluster ceph --id 5 --setuser ceph --setgroup ceph

Jul 12 16:45:51 dell02.mousetech.com systemd[1]: Starting Ceph object storage daemon osd.5...
Jul 12 16:45:51 dell02.mousetech.com systemd[1]: Started Ceph object storage daemon osd.5.
Jul 12 16:45:51 dell02.mousetech.com ceph-osd[8517]: 2024-07-12T16:45:51.642-0400 7f2bd440c140 -1 Falling back to public interface
Jul 12 16:45:58 dell02.mousetech.com ceph-osd[8517]: 2024-07-12T16:45:58.352-0400 7f2bd440c140 -1 osd.5 34161 log_to_monitors {default=true}
Jul 12 16:45:59 dell02.mousetech.com ceph-osd[8517]: 2024-07-12T16:45:59.206-0400 7f2bcbbf0640 -1 osd.5 34161 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory

The actual container is not running.
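
Which makes me wonder whether the unit above is the old packaged
ceph-osd@5 service rather than the cephadm-managed one; as I understand
it, cephadm OSDs run under a unit named like ceph-<fsid>@osd.5.service,
not ceph-osd@5.service. Listing both side by side should tell:

systemctl list-units 'ceph*' --all     # packaged ceph-osd@5 vs ceph-<fsid>@osd.5 units
cephadm ls                             # what cephadm itself has deployed on this host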

Ceph version, incidentally, is 16.2.15. Except for that one node that
apparently didn't move up from Octopus (I'll be nuking that one
shortly).