OSD stuck down

Dear Ceph users,

after a host reboot, one of the OSDs is now stuck down (and out). I have tried restarting it several times, and even rebooting the host again, but it remains down.
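
For reference, the restart attempts were along these lines (unit and daemon names as in the status output further below; the exact invocations may have differed slightly):

$ sudo systemctl restart ceph-b1029256-7bb3-11ec-a8ce-ac1f6b627b45@osd.34.service
$ sudo ceph orch daemon restart osd.34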

# ceph -s
  cluster:
    id:     b1029256-7bb3-11ec-a8ce-ac1f6b627b45
    health: HEALTH_WARN
            4 OSD(s) have spurious read errors
            (muted: OSD_SLOW_PING_TIME_BACK OSD_SLOW_PING_TIME_FRONT)

  services:
    mon: 5 daemons, quorum bofur,balin,aka,romolo,dwalin (age 16h)
    mgr: bofur.tklnrn(active, since 16h), standbys: aka.wzystq, balin.hvunfe
    mds: 2/2 daemons up, 1 standby
    osd: 104 osds: 103 up (since 16h), 103 in (since 13h); 4 remapped pgs

  data:
    volumes: 1/1 healthy
    pools:   3 pools, 529 pgs
    objects: 18.85M objects, 41 TiB
    usage:   56 TiB used, 139 TiB / 195 TiB avail
    pgs:     68130/150150628 objects misplaced (0.045%)
             522 active+clean
             4   active+remapped+backfilling
             3   active+clean+scrubbing+deep

  io:
    recovery: 46 MiB/s, 21 objects/s
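
The affected OSD is osd.34, the only one showing down/out in the CRUSH tree, e.g. via:

# ceph osd tree down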



The host is reachable (its other OSDs are in), and I don't see anything wrong in the systemd logs of the OSD:

$ sudo systemctl status ceph-b1029256-7bb3-11ec-a8ce-ac1f6b627b45@osd.34
● ceph-b1029256-7bb3-11ec-a8ce-ac1f6b627b45@osd.34.service - Ceph osd.34 for b1029256-7bb3-11ec-a8ce-ac1f6b627b45
   Loaded: loaded (/etc/systemd/system/ceph-b1029256-7bb3-11ec-a8ce-ac1f6b627b45@.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2023-06-12 17:00:25 CEST; 15h ago
 Main PID: 36286 (bash)
    Tasks: 11 (limit: 152154)
   Memory: 20.0M
   CGroup: /system.slice/system-ceph\x2db1029256\x2d7bb3\x2d11ec\x2da8ce\x2dac1f6b627b45.slice/ceph-b1029256-7bb3-11ec-a8ce-ac1f6b627b45@osd.34.service
           ├─36286 /bin/bash /var/lib/ceph/b1029256-7bb3-11ec-a8ce-ac1f6b627b45/osd.34/unit.run
           └─36657 /usr/bin/docker run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint /usr/bin/ceph-osd --privileged --group-add=disk --init --name ceph-b1029256-7bb3-11ec-a8ce-ac1f6b627b45-osd-34 --pids-limit=0 -e CONTAINER_IMAGE=snack14/ceph-wizard@sha>

Jun 12 17:00:25 balin systemd[1]: Started Ceph osd.34 for b1029256-7bb3-11ec-a8ce-ac1f6b627b45.
Jun 12 17:00:27 balin bash[36306]: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-34
Jun 12 17:00:27 balin bash[36306]: Running command: /usr/bin/ceph-bluestore-tool prime-osd-dir --path /var/lib/ceph/osd/ceph-34 --no-mon-config --dev /dev/mapper/ceph--9a4c3927--d3da--4b49--80fe--6cdc00c7897c-osd--block--36d2f793--e5c7--4247--a314--bcc40389d50d
Jun 12 17:00:27 balin bash[36306]: Running command: /usr/bin/chown -h ceph:ceph /dev/mapper/ceph--9a4c3927--d3da--4b49--80fe--6cdc00c7897c-osd--block--36d2f793--e5c7--4247--a314--bcc40389d50d
Jun 12 17:00:27 balin bash[36306]: Running command: /usr/bin/chown -R ceph:ceph /dev/dm-6
Jun 12 17:00:27 balin bash[36306]: Running command: /usr/bin/ln -s /dev/mapper/ceph--9a4c3927--d3da--4b49--80fe--6cdc00c7897c-osd--block--36d2f793--e5c7--4247--a314--bcc40389d50d /var/lib/ceph/osd/ceph-34/block
Jun 12 17:00:27 balin bash[36306]: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-34
Jun 12 17:00:27 balin bash[36306]: --> ceph-volume raw activate successful for osd ID: 34
Jun 12 17:00:29 balin bash[36657]: debug 2023-06-12T15:00:29.066+0000 7f818e356540 -1 Falling back to public interface
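
If it helps, I can also pull the daemon's own output directly from the container and share it, for example with:

$ sudo cephadm logs --name osd.34
$ sudo docker logs ceph-b1029256-7bb3-11ec-a8ce-ac1f6b627b45-osd-34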


I could use some help understanding how to fix this.
Thank you,

Nicola


