Dear cephers,

we are sometimes observing stalled IO on our Ceph 17.2.6 cluster when the backing device of a PG's primary OSD fails: read IO to objects in that PG appears to block. If I set the OSD with the broken device to down, IO continues. Setting the OSD to out is not sufficient.
The cluster runs on Debian 11, and the pool is an erasure-coded CephFS data pool. The OSD has an HDD data device and an SSD DB device. The data device is the one that failed and was blocking IO.
The OSD was reporting slow ops, and shortly afterwards smartd reported unreadable sectors.
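The workaround described above, marking the OSD down, corresponds to `ceph osd down <id>`, while marking it out is `ceph osd out <id>`. The Python sketch below only builds and optionally runs those two commands. The helper names and the OSD id 42 are hypothetical, and dry run is the default so nothing touches the cluster unless explicitly requested. Keep in mind that marking an OSD down does not stop a still-running daemon, which may mark itself up again.

```python
import subprocess
from typing import List

# The two actions discussed in the report: "down" (which let IO continue)
# and "out". Anything else is rejected.
ALLOWED_ACTIONS = ("down", "out")


def osd_command(action: str, osd_id: int) -> List[str]:
    """Build the ceph CLI invocation for marking an OSD down or out."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    if osd_id < 0:
        raise ValueError("OSD ids are non-negative integers")
    return ["ceph", "osd", action, str(osd_id)]


def mark_osd(action: str, osd_id: int, dry_run: bool = True) -> List[str]:
    """Print the command (dry run) or execute it against the cluster."""
    cmd = osd_command(action, osd_id)
    if dry_run:
        print(" ".join(cmd))
    else:
        # Requires the ceph CLI and admin credentials on this host.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Hypothetical OSD id 42 standing in for the OSD whose HDD failed.
    mark_osd("down", 42)
    mark_osd("out", 42)
```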
Has anyone seen such behaviour? Are there any tweaks I have missed?

Kind regards,
Daniel

--
Daniel Schreiber
Facharbeitsgruppe Systemsoftware
Universitaetsrechenzentrum

Technische Universität Chemnitz
Straße der Nationen 62 (Raum B303)
09111 Chemnitz
Germany

Tel: +49 371 531 35444
Fax: +49 371 531 835444
_______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx