Re: OSD stuck down


 



I have restarted all the monitors and managers, but the OSD still remains down. However, I found that cephadm actually sees it running:

# ceph orch ps | grep osd.34
osd.34 balin running (14m) 108s ago 8M 75.3M 793M 17.2.6 b1a23658afad 5b9dbea262c7

# ceph osd tree | grep 34
 34    hdd    1.81940          osd.34      down         0  1.00000
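
In case it helps, these are the extra checks I can run on the cluster side, mainly to see whether a noup flag is set or whether the monitors ever registered the daemon booting (the metadata command might just return stale data while the OSD is down, so take that one with a grain of salt):

# ceph health detail
# ceph osd dump | grep flags
# ceph osd dump | grep osd.34
# ceph osd metadata 34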


I really need help with this since I don't know what more to look at.
Thanks in advance,

Nicola


On 13/06/23 08:35, Nicola Mori wrote:
Dear Ceph users,

After a host reboot, one of the OSDs is now stuck down (and out). I tried several times to restart it, and even to reboot the host, but it still remains down.
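
(The restart attempts were done with the usual commands, more or less like this, via the orchestrator and the systemd unit on the host:)

# ceph orch daemon restart osd.34
# systemctl restart ceph-b1029256-7bb3-11ec-a8ce-ac1f6b627b45@osd.34.service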

# ceph -s
  cluster:
    id:     b1029256-7bb3-11ec-a8ce-ac1f6b627b45
    health: HEALTH_WARN
            4 OSD(s) have spurious read errors
            (muted: OSD_SLOW_PING_TIME_BACK OSD_SLOW_PING_TIME_FRONT)

  services:
    mon: 5 daemons, quorum bofur,balin,aka,romolo,dwalin (age 16h)
    mgr: bofur.tklnrn(active, since 16h), standbys: aka.wzystq, balin.hvunfe
    mds: 2/2 daemons up, 1 standby
    osd: 104 osds: 103 up (since 16h), 103 in (since 13h); 4 remapped pgs

  data:
    volumes: 1/1 healthy
    pools:   3 pools, 529 pgs
    objects: 18.85M objects, 41 TiB
    usage:   56 TiB used, 139 TiB / 195 TiB avail
    pgs:     68130/150150628 objects misplaced (0.045%)
             522 active+clean
             4   active+remapped+backfilling
             3   active+clean+scrubbing+deep

  io:
    recovery: 46 MiB/s, 21 objects/s



The host is reachable (its other OSDs are in), and from the systemd status and logs of the OSD I don't see anything wrong:

$ sudo systemctl status ceph-b1029256-7bb3-11ec-a8ce-ac1f6b627b45@osd.34
● ceph-b1029256-7bb3-11ec-a8ce-ac1f6b627b45@osd.34.service - Ceph osd.34 for b1029256-7bb3-11ec-a8ce-ac1f6b627b45
   Loaded: loaded (/etc/systemd/system/ceph-b1029256-7bb3-11ec-a8ce-ac1f6b627b45@.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2023-06-12 17:00:25 CEST; 15h ago
 Main PID: 36286 (bash)
    Tasks: 11 (limit: 152154)
   Memory: 20.0M
   CGroup: /system.slice/system-ceph\x2db1029256\x2d7bb3\x2d11ec\x2da8ce\x2dac1f6b627b45.slice/ceph-b1029256-7bb3-11ec-a8ce-ac1f6b627b45@osd.34.service
           ├─36286 /bin/bash /var/lib/ceph/b1029256-7bb3-11ec-a8ce-ac1f6b627b45/osd.34/unit.run
           └─36657 /usr/bin/docker run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint /usr/bin/ceph-osd --privileged --group-add=disk --init --name ceph-b1029256-7bb3-11ec-a8ce-ac1f6b627b45-osd-34 --pids-limit=0 -e CONTAINER_IMAGE=snack14/ceph-wizard@sha>

Jun 12 17:00:25 balin systemd[1]: Started Ceph osd.34 for b1029256-7bb3-11ec-a8ce-ac1f6b627b45.
Jun 12 17:00:27 balin bash[36306]: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-34
Jun 12 17:00:27 balin bash[36306]: Running command: /usr/bin/ceph-bluestore-tool prime-osd-dir --path /var/lib/ceph/osd/ceph-34 --no-mon-config --dev /dev/mapper/ceph--9a4c3927--d3da--4b49--80fe--6cdc00c7897c-osd--block--36d2f793--e5c7--4247--a314--bcc40389d50d
Jun 12 17:00:27 balin bash[36306]: Running command: /usr/bin/chown -h ceph:ceph /dev/mapper/ceph--9a4c3927--d3da--4b49--80fe--6cdc00c7897c-osd--block--36d2f793--e5c7--4247--a314--bcc40389d50d
Jun 12 17:00:27 balin bash[36306]: Running command: /usr/bin/chown -R ceph:ceph /dev/dm-6
Jun 12 17:00:27 balin bash[36306]: Running command: /usr/bin/ln -s /dev/mapper/ceph--9a4c3927--d3da--4b49--80fe--6cdc00c7897c-osd--block--36d2f793--e5c7--4247--a314--bcc40389d50d /var/lib/ceph/osd/ceph-34/block
Jun 12 17:00:27 balin bash[36306]: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-34
Jun 12 17:00:27 balin bash[36306]: --> ceph-volume raw activate successful for osd ID: 34
Jun 12 17:00:29 balin bash[36657]: debug 2023-06-12T15:00:29.066+0000 7f818e356540 -1 Falling back to public interface
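
The above is only what systemd shows for the wrapper script; if it helps I can also pull the daemon's own log from the container, e.g. (the container name is the one from the docker command above):

# cephadm logs --name osd.34
# docker logs ceph-b1029256-7bb3-11ec-a8ce-ac1f6b627b45-osd-34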


I'd need some help to understand how to fix this.
Thank you,

Nicola

--
Nicola Mori, Ph.D.
INFN sezione di Firenze
Via Bruno Rossi 1, 50019 Sesto F.no (Italy)
+390554572660
mori@xxxxxxxxxx


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
