Hi,

I have seen this behaviour when the OSD host's cluster interface was down
but the public interface was up. I suggest checking the network interfaces
and the connectivity; a few concrete checks are sketched at the bottom of
this mail.

Regards!

On Thu, Jun 15, 2023 at 11:08 AM Nicola Mori <mori@xxxxxxxxxx> wrote:
> I have restarted all the monitors and managers, but still the OSD
> remains down. But I found that cephadm actually sees it running:
>
> # ceph orch ps | grep osd.34
> osd.34  balin  running (14m)  108s ago  8M  75.3M  793M  17.2.6  b1a23658afad  5b9dbea262c7
>
> # ceph osd tree | grep 34
>  34    hdd   1.81940          osd.34        down         0  1.00000
>
> I really need help with this since I don't know what more to look at.
> Thanks in advance,
>
> Nicola
>
> On 13/06/23 08:35, Nicola Mori wrote:
> > Dear Ceph users,
> >
> > after a host reboot one of the OSDs is now stuck down (and out). I tried
> > several times to restart it and even to reboot the host, but it still
> > remains down.
> >
> > # ceph -s
> >   cluster:
> >     id:     b1029256-7bb3-11ec-a8ce-ac1f6b627b45
> >     health: HEALTH_WARN
> >             4 OSD(s) have spurious read errors
> >             (muted: OSD_SLOW_PING_TIME_BACK OSD_SLOW_PING_TIME_FRONT)
> >
> >   services:
> >     mon: 5 daemons, quorum bofur,balin,aka,romolo,dwalin (age 16h)
> >     mgr: bofur.tklnrn(active, since 16h), standbys: aka.wzystq, balin.hvunfe
> >     mds: 2/2 daemons up, 1 standby
> >     osd: 104 osds: 103 up (since 16h), 103 in (since 13h); 4 remapped pgs
> >
> >   data:
> >     volumes: 1/1 healthy
> >     pools:   3 pools, 529 pgs
> >     objects: 18.85M objects, 41 TiB
> >     usage:   56 TiB used, 139 TiB / 195 TiB avail
> >     pgs:     68130/150150628 objects misplaced (0.045%)
> >              522 active+clean
> >              4   active+remapped+backfilling
> >              3   active+clean+scrubbing+deep
> >
> >   io:
> >     recovery: 46 MiB/s, 21 objects/s
> >
> > The host is reachable (its other OSDs are in) and from the systemd logs
> > of the OSD I don't see anything wrong:
> >
> > $ sudo systemctl status ceph-b1029256-7bb3-11ec-a8ce-ac1f6b627b45@osd.34
> > ● ceph-b1029256-7bb3-11ec-a8ce-ac1f6b627b45@osd.34.service - Ceph osd.34 for b1029256-7bb3-11ec-a8ce-ac1f6b627b45
> >      Loaded: loaded (/etc/systemd/system/ceph-b1029256-7bb3-11ec-a8ce-ac1f6b627b45@.service; enabled; vendor preset: disabled)
> >      Active: active (running) since Mon 2023-06-12 17:00:25 CEST; 15h ago
> >    Main PID: 36286 (bash)
> >       Tasks: 11 (limit: 152154)
> >      Memory: 20.0M
> >      CGroup: /system.slice/system-ceph\x2db1029256\x2d7bb3\x2d11ec\x2da8ce\x2dac1f6b627b45.slice/ceph-b1029256-7bb3-11ec-a8ce-ac1f6b627b45@osd.34.service
> >              ├─36286 /bin/bash /var/lib/ceph/b1029256-7bb3-11ec-a8ce-ac1f6b627b45/osd.34/unit.run
> >              └─36657 /usr/bin/docker run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint /usr/bin/ceph-osd --privileged --group-add=disk --init --name ceph-b1029256-7bb3-11ec-a8ce-ac1f6b627b45-osd-34 --pids-limit=0 -e CONTAINER_IMAGE=snack14/ceph-wizard@sha>
> >
> > Jun 12 17:00:25 balin systemd[1]: Started Ceph osd.34 for b1029256-7bb3-11ec-a8ce-ac1f6b627b45.
> > Jun 12 17:00:27 balin bash[36306]: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-34
> > Jun 12 17:00:27 balin bash[36306]: Running command: /usr/bin/ceph-bluestore-tool prime-osd-dir --path /var/lib/ceph/osd/ceph-34 --no-mon-config --dev /dev/mapper/ceph--9a4c3927--d3da--4b49--80fe--6cdc00c7897c-osd--block--36d2f793--e5c7--4247--a314--bcc40389d50d
> > Jun 12 17:00:27 balin bash[36306]: Running command: /usr/bin/chown -h ceph:ceph /dev/mapper/ceph--9a4c3927--d3da--4b49--80fe--6cdc00c7897c-osd--block--36d2f793--e5c7--4247--a314--bcc40389d50d
> > Jun 12 17:00:27 balin bash[36306]: Running command: /usr/bin/chown -R ceph:ceph /dev/dm-6
> > Jun 12 17:00:27 balin bash[36306]: Running command: /usr/bin/ln -s /dev/mapper/ceph--9a4c3927--d3da--4b49--80fe--6cdc00c7897c-osd--block--36d2f793--e5c7--4247--a314--bcc40389d50d /var/lib/ceph/osd/ceph-34/block
> > Jun 12 17:00:27 balin bash[36306]: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-34
> > Jun 12 17:00:27 balin bash[36306]: --> ceph-volume raw activate successful for osd ID: 34
> > Jun 12 17:00:29 balin bash[36657]: debug 2023-06-12T15:00:29.066+0000 7f818e356540 -1 Falling back to public interface
> >
> > I'd need some help to understand how to fix this.
> > Thank you,
> >
> > Nicola
>
> --
> Nicola Mori, Ph.D.
> INFN sezione di Firenze
> Via Bruno Rossi 1, 50019 Sesto F.no (Italy)
> +390554572660
> mori@xxxxxxxxxx
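
P.S. Note the last OSD log line above: "Falling back to public interface"
suggests the OSD could not find a usable address on the cluster network at
start-up, which matches the behaviour I described. As a rough sketch of what
I would check (the commands are standard Ceph/Linux tooling, but the peer
address in the last one is a placeholder you need to fill in for your own
cluster network):

Which networks does the cluster define?

# ceph config get osd cluster_network
# ceph config get osd public_network

Which addresses and interfaces does osd.34 report?

# ceph osd metadata 34 | grep -E '(back|front)_(addr|iface)'

On the OSD host: is the cluster-network interface actually up, and can it
reach another OSD host over that subnet?

# ip -br addr show
# ping -c 3 <cluster-network address of another OSD host>

If the back_addr/back_iface entries are missing or point at the public
network, that would fit the "interface down at boot" scenario.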