Hi,
did you check the MON logs? They should contain some information about
why the OSD was marked down and out. You could also just try to mark it
in yourself and see if anything changes:

$ ceph osd in 34
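For the MON logs, something like this should work (mon.balin is just an
example, pick any of your MONs from 'ceph mon dump'):

$ cephadm logs --name mon.balin | grep osd.34

The cluster log should also show when and why osd.34 was marked down, e.g.:

$ ceph log last 1000 info cluster | grep osd.34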
I would also take another look at the OSD logs:

$ cephadm logs --name osd.34
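The cluster-side view might help to narrow it down as well, e.g. to confirm
how osd.34 is currently flagged and whether anything else stands out:

$ ceph osd tree down
$ ceph osd dump | grep '^osd.34 '
$ ceph health detail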
Quoting Nicola Mori <mori@xxxxxxxxxx>:
Dear Ceph users,
after a host reboot one of the OSDs is now stuck down (and out). I
tried several times to restart it and even to reboot the host, but
it still remains down.
# ceph -s
  cluster:
    id:     b1029256-7bb3-11ec-a8ce-ac1f6b627b45
    health: HEALTH_WARN
            4 OSD(s) have spurious read errors
            (muted: OSD_SLOW_PING_TIME_BACK OSD_SLOW_PING_TIME_FRONT)

  services:
    mon: 5 daemons, quorum bofur,balin,aka,romolo,dwalin (age 16h)
    mgr: bofur.tklnrn(active, since 16h), standbys: aka.wzystq, balin.hvunfe
    mds: 2/2 daemons up, 1 standby
    osd: 104 osds: 103 up (since 16h), 103 in (since 13h); 4 remapped pgs

  data:
    volumes: 1/1 healthy
    pools:   3 pools, 529 pgs
    objects: 18.85M objects, 41 TiB
    usage:   56 TiB used, 139 TiB / 195 TiB avail
    pgs:     68130/150150628 objects misplaced (0.045%)
             522 active+clean
             4   active+remapped+backfilling
             3   active+clean+scrubbing+deep

  io:
    recovery: 46 MiB/s, 21 objects/s
The host is reachable (its other OSDs are in), and I don't see anything
wrong in the systemd logs of the OSD:
$ sudo systemctl status ceph-b1029256-7bb3-11ec-a8ce-ac1f6b627b45@osd.34
● ceph-b1029256-7bb3-11ec-a8ce-ac1f6b627b45@osd.34.service - Ceph osd.34 for b1029256-7bb3-11ec-a8ce-ac1f6b627b45
   Loaded: loaded (/etc/systemd/system/ceph-b1029256-7bb3-11ec-a8ce-ac1f6b627b45@.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2023-06-12 17:00:25 CEST; 15h ago
 Main PID: 36286 (bash)
    Tasks: 11 (limit: 152154)
   Memory: 20.0M
   CGroup: /system.slice/system-ceph\x2db1029256\x2d7bb3\x2d11ec\x2da8ce\x2dac1f6b627b45.slice/ceph-b1029256-7bb3-11ec-a8ce-ac1f6b627b45@osd.34.service
           ├─36286 /bin/bash /var/lib/ceph/b1029256-7bb3-11ec-a8ce-ac1f6b627b45/osd.34/unit.run
           └─36657 /usr/bin/docker run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint /usr/bin/ceph-osd --privileged --group-add=disk --init --name ceph-b1029256-7bb3-11ec-a8ce-ac1f6b627b45-osd-34 --pids-limit=0 -e CONTAINER_IMAGE=snack14/ceph-wizard@sha>

Jun 12 17:00:25 balin systemd[1]: Started Ceph osd.34 for b1029256-7bb3-11ec-a8ce-ac1f6b627b45.
Jun 12 17:00:27 balin bash[36306]: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-34
Jun 12 17:00:27 balin bash[36306]: Running command: /usr/bin/ceph-bluestore-tool prime-osd-dir --path /var/lib/ceph/osd/ceph-34 --no-mon-config --dev /dev/mapper/ceph--9a4c3927--d3da--4b49--80fe--6cdc00c7897c-osd--block--36d2f793--e5c7--4247--a314--bcc40389d50d
Jun 12 17:00:27 balin bash[36306]: Running command: /usr/bin/chown -h ceph:ceph /dev/mapper/ceph--9a4c3927--d3da--4b49--80fe--6cdc00c7897c-osd--block--36d2f793--e5c7--4247--a314--bcc40389d50d
Jun 12 17:00:27 balin bash[36306]: Running command: /usr/bin/chown -R ceph:ceph /dev/dm-6
Jun 12 17:00:27 balin bash[36306]: Running command: /usr/bin/ln -s /dev/mapper/ceph--9a4c3927--d3da--4b49--80fe--6cdc00c7897c-osd--block--36d2f793--e5c7--4247--a314--bcc40389d50d /var/lib/ceph/osd/ceph-34/block
Jun 12 17:00:27 balin bash[36306]: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-34
Jun 12 17:00:27 balin bash[36306]: --> ceph-volume raw activate successful for osd ID: 34
Jun 12 17:00:29 balin bash[36657]: debug 2023-06-12T15:00:29.066+0000 7f818e356540 -1 Falling back to public interface
I'd appreciate some help understanding how to fix this.
Thank you,
Nicola
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx