Re: OSD stuck down


 



I have restarted all the monitors and managers, but the OSD still remains down. However, I found that cephadm actually sees it running:

# ceph orch ps | grep osd.34
osd.34 balin running (14m) 108s ago 8M 75.3M 793M 17.2.6 b1a23658afad 5b9dbea262c7

# ceph osd tree | grep 34
 34    hdd    1.81940          osd.34      down         0  1.00000
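
In case it helps, these are the extra checks I can run on the cluster side, mainly to see whether a noup flag is set or whether the monitors ever registered the daemon booting (the metadata command might just return stale data while the OSD is down, so take that one with a grain of salt):

# ceph health detail
# ceph osd dump | grep flags
# ceph osd dump | grep osd.34
# ceph osd metadata 34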


I really need help with this since I don't know what more to look at.
Thanks in advance,

Nicola


On 13/06/23 08:35, Nicola Mori wrote:
Dear Ceph users,

After a host reboot, one of the OSDs is now stuck down (and out). I tried several times to restart it, and even to reboot the host, but it still remains down.
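
(The restart attempts were done with the usual commands, more or less like this, via the orchestrator and the systemd unit on the host:)

# ceph orch daemon restart osd.34
# systemctl restart ceph-b1029256-7bb3-11ec-a8ce-ac1f6b627b45@osd.34.service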

# ceph -s
  cluster:
    id:     b1029256-7bb3-11ec-a8ce-ac1f6b627b45
    health: HEALTH_WARN
            4 OSD(s) have spurious read errors
            (muted: OSD_SLOW_PING_TIME_BACK OSD_SLOW_PING_TIME_FRONT)

  services:
    mon: 5 daemons, quorum bofur,balin,aka,romolo,dwalin (age 16h)
    mgr: bofur.tklnrn(active, since 16h), standbys: aka.wzystq, balin.hvunfe
    mds: 2/2 daemons up, 1 standby
    osd: 104 osds: 103 up (since 16h), 103 in (since 13h); 4 remapped pgs

  data:
    volumes: 1/1 healthy
    pools:   3 pools, 529 pgs
    objects: 18.85M objects, 41 TiB
    usage:   56 TiB used, 139 TiB / 195 TiB avail
    pgs:     68130/150150628 objects misplaced (0.045%)
             522 active+clean
             4   active+remapped+backfilling
             3   active+clean+scrubbing+deep

  io:
    recovery: 46 MiB/s, 21 objects/s



The host is reachable (its other OSDs are in), and from the systemd status and logs of the OSD I don't see anything wrong:

$ sudo systemctl status ceph-b1029256-7bb3-11ec-a8ce-ac1f6b627b45@osd.34
● ceph-b1029256-7bb3-11ec-a8ce-ac1f6b627b45@osd.34.service - Ceph osd.34 for b1029256-7bb3-11ec-a8ce-ac1f6b627b45
   Loaded: loaded (/etc/systemd/system/ceph-b1029256-7bb3-11ec-a8ce-ac1f6b627b45@.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2023-06-12 17:00:25 CEST; 15h ago
 Main PID: 36286 (bash)
    Tasks: 11 (limit: 152154)
   Memory: 20.0M
   CGroup: /system.slice/system-ceph\x2db1029256\x2d7bb3\x2d11ec\x2da8ce\x2dac1f6b627b45.slice/ceph-b1029256-7bb3-11ec-a8ce-ac1f6b627b45@osd.34.service
           ├─36286 /bin/bash /var/lib/ceph/b1029256-7bb3-11ec-a8ce-ac1f6b627b45/osd.34/unit.run
           └─36657 /usr/bin/docker run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint /usr/bin/ceph-osd --privileged --group-add=disk --init --name ceph-b1029256-7bb3-11ec-a8ce-ac1f6b627b45-osd-34 --pids-limit=0 -e CONTAINER_IMAGE=snack14/ceph-wizard@sha>

Jun 12 17:00:25 balin systemd[1]: Started Ceph osd.34 for b1029256-7bb3-11ec-a8ce-ac1f6b627b45.
Jun 12 17:00:27 balin bash[36306]: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-34
Jun 12 17:00:27 balin bash[36306]: Running command: /usr/bin/ceph-bluestore-tool prime-osd-dir --path /var/lib/ceph/osd/ceph-34 --no-mon-config --dev /dev/mapper/ceph--9a4c3927--d3da--4b49--80fe--6cdc00c7897c-osd--block--36d2f793--e5c7--4247--a314--bcc40389d50d
Jun 12 17:00:27 balin bash[36306]: Running command: /usr/bin/chown -h ceph:ceph /dev/mapper/ceph--9a4c3927--d3da--4b49--80fe--6cdc00c7897c-osd--block--36d2f793--e5c7--4247--a314--bcc40389d50d
Jun 12 17:00:27 balin bash[36306]: Running command: /usr/bin/chown -R ceph:ceph /dev/dm-6
Jun 12 17:00:27 balin bash[36306]: Running command: /usr/bin/ln -s /dev/mapper/ceph--9a4c3927--d3da--4b49--80fe--6cdc00c7897c-osd--block--36d2f793--e5c7--4247--a314--bcc40389d50d /var/lib/ceph/osd/ceph-34/block
Jun 12 17:00:27 balin bash[36306]: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-34
Jun 12 17:00:27 balin bash[36306]: --> ceph-volume raw activate successful for osd ID: 34
Jun 12 17:00:29 balin bash[36657]: debug 2023-06-12T15:00:29.066+0000 7f818e356540 -1 Falling back to public interface
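
The above is only what systemd shows for the wrapper script; if it helps I can also pull the daemon's own log from the container, e.g. (the container name is the one from the docker command above):

# cephadm logs --name osd.34
# docker logs ceph-b1029256-7bb3-11ec-a8ce-ac1f6b627b45-osd-34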


I'd need some help to understand how to fix this.
Thank you,

Nicola

--
Nicola Mori, Ph.D.
INFN sezione di Firenze
Via Bruno Rossi 1, 50019 Sesto F.no (Italy)
+390554572660
mori@xxxxxxxxxx


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
