Hi,
We've been running into a mysterious issue on Ceph 16.2.7. Every few
weeks (anywhere from two weeks to a month and a half), we get
input/output errors on a random OSD. Here are the logs:
2022-09-22T15:54:11.600Z syslog debug -6>
2022-09-22T15:41:05.678+0000 7fec2ebaa080 -1 bdev(0x55bf3305a800
/var/lib/ceph/osd/ceph-31/block) _sync_write sync_file_range error: (5)
Input/output error
2022-09-22T15:54:11.600Z syslog debug -2>
2022-09-22T15:54:09.918+0000 7fec1fac5700 -1 bdev(0x55bf3305a800
/var/lib/ceph/osd/ceph-31/block) _aio_thread got r=-5 ((5) Input/output
error)
2022-09-22T15:54:11.600Z syslog debug -3>
2022-09-22T15:50:54.170+0000 7fec2ebaa080 -1 bdev(0x55bf3305a800
/var/lib/ceph/osd/ceph-31/block) _sync_write sync_file_range error: (5)
Input/output error
2022-09-22T15:54:11.600Z syslog debug -4>
2022-09-22T15:47:36.678+0000 7fec2ebaa080 -1 bdev(0x55bf3305a800
/var/lib/ceph/osd/ceph-31/block) _sync_write sync_file_range error: (5)
Input/output error
2022-09-22T15:54:11.600Z syslog debug -5>
2022-09-22T15:44:22.178+0000 7fec2ebaa080 -1 bdev(0x55bf3305a800
/var/lib/ceph/osd/ceph-31/block) _sync_write sync_file_range error: (5)
Input/output error
2022-09-22T15:54:10.804Z syslog debug -3>
2022-09-22T15:50:54.170+0000 7fec2ebaa080 -1 bdev(0x55bf3305a800
/var/lib/ceph/osd/ceph-31/block) _sync_write sync_file_range error: (5)
Input/output error
2022-09-22T15:54:10.804Z syslog debug -2>
2022-09-22T15:54:09.918+0000 7fec1fac5700 -1 bdev(0x55bf3305a800
/var/lib/ceph/osd/ceph-31/block) _aio_thread got r=-5 ((5) Input/output
error)
2022-09-22T15:54:10.803Z syslog debug -4>
2022-09-22T15:47:36.678+0000 7fec2ebaa080 -1 bdev(0x55bf3305a800
/var/lib/ceph/osd/ceph-31/block) _sync_write sync_file_range error: (5)
Input/output error
2022-09-22T15:54:10.803Z syslog debug -6>
2022-09-22T15:41:05.678+0000 7fec2ebaa080 -1 bdev(0x55bf3305a800
/var/lib/ceph/osd/ceph-31/block) _sync_write sync_file_range error: (5)
Input/output error
2022-09-22T15:54:10.803Z syslog debug -5>
2022-09-22T15:44:22.178+0000 7fec2ebaa080 -1 bdev(0x55bf3305a800
/var/lib/ceph/osd/ceph-31/block) _sync_write sync_file_range error: (5)
Input/output error
2022-09-22T15:54:09.995Z syslog [19067764.996820]
blk_update_request: I/O error, dev sdf, sector 520168880 op 0x1:(WRITE)
flags 0x8800 phys_seg 1 prio class 0
2022-09-22T15:54:09.995Z syslog debug 2022-09-22T15:54:09.918+0000
7fec1fac5700 -1 bdev(0x55bf3305a800 /var/lib/ceph/osd/ceph-31/block)
_aio_thread got r=-5 ((5) Input/output error)
2022-09-22T15:54:09.977Z syslog [19067764.996688] sd 0:0:5:0:
Power-on or device reset occurred
2022-09-22T15:53:37.229Z syslog [19067732.246603] sd 0:0:5:0:
Power-on or device reset occurred
2022-09-22T15:53:04.477Z syslog [19067699.496476] sd 0:0:5:0:
Power-on or device reset occurred
2022-09-22T15:52:31.725Z syslog [19067666.746368] sd 0:0:5:0:
Power-on or device reset occurred
2022-09-22T15:51:59.080Z syslog [19067633.996243] sd 0:0:5:0:
Power-on or device reset occurred
2022-09-22T15:51:25.725Z syslog [19067600.746160] sd 0:0:5:0:
Power-on or device reset occurred
2022-09-22T15:50:54.327Z syslog debug 2022-09-22T15:50:54.170+0000
7fec2ebaa080 -1 bdev(0x55bf3305a800 /var/lib/ceph/osd/ceph-31/block)
_sync_write sync_file_range error: (5) Input/output error
2022-09-22T15:50:54.226Z syslog [19067569.246060] sd 0:0:5:0:
Power-on or device reset occurred
2022-09-22T15:50:54.226Z syslog [19067569.246209]
blk_update_request: I/O error, dev sdf, sector 461504 op 0x1:(WRITE)
flags 0x800 phys_seg 3 prio class 0
2022-09-22T15:50:18.477Z syslog [19067533.495929] sd 0:0:5:0:
Power-on or device reset occurred
2022-09-22T15:49:45.725Z syslog [19067500.745820] sd 0:0:5:0:
Power-on or device reset occurred
2022-09-22T15:49:12.977Z syslog [19067467.995714] sd 0:0:5:0:
Power-on or device reset occurred
2022-09-22T15:48:39.977Z syslog [19067434.995608] sd 0:0:5:0:
Power-on or device reset occurred
2022-09-22T15:48:08.977Z syslog [19067403.995482] sd 0:0:5:0:
Power-on or device reset occurred
2022-09-22T15:47:36.826Z syslog debug 2022-09-22T15:47:36.678+0000
7fec2ebaa080 -1 bdev(0x55bf3305a800 /var/lib/ceph/osd/ceph-31/block)
_sync_write sync_file_range error: (5) Input/output error
2022-09-22T15:47:36.725Z syslog [19067371.745553]
blk_update_request: I/O error, dev sdf, sector 460544 op 0x1:(WRITE)
flags 0x800 phys_seg 121 prio class 0
This never happens on the same OSD twice. When we check the drive
afterwards, there's no issue to report. When this happens, the cluster
either freezes momentarily or glitches and marks the OSD as out. What
could be the source of this issue? We're thinking it could be related
to either the drive model or the Ceph version. Here's some info
regarding our hardware/software:
Drives: All Intel DCS 4510 or 4610
Controller: HBA330
OS: Ubuntu 20.04 LTS
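
To compare incidents across OSDs, one option is to tally the kernel
messages shown above per device and measure the time between resets.
The following is a minimal Python sketch, assuming the log is available
as plain text in the same format as the excerpt. The function name
summarize_kernel_log and both regular expressions are illustrative
assumptions written against the lines quoted above; they are not part
of Ceph or of any existing tool.

```python
import re
from collections import Counter, defaultdict

# Patterns written against the kernel lines quoted in this message (assumption:
# other kernels or drivers may word these messages differently).
RESET_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+sd (\d+:\d+:\d+:\d+): Power-on or device reset occurred"
)
IOERR_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+blk_update_request: I/O error, dev (\w+), sector (\d+)"
)


def summarize_kernel_log(log_text):
    """Count resets and I/O errors per device and list the gaps between resets."""
    # Collapse all whitespace so messages wrapped across lines still match.
    flat = " ".join(log_text.split())

    # Kernel uptime stamps (seconds) of each reset, grouped by SCSI address.
    reset_times = defaultdict(list)
    for ts, scsi_addr in RESET_RE.findall(flat):
        reset_times[scsi_addr].append(float(ts))

    # Number of block-layer I/O errors per block device name.
    io_errors = Counter(dev for _ts, dev, _sector in IOERR_RE.findall(flat))

    # Seconds between consecutive resets of the same device.
    reset_gaps = {}
    for scsi_addr, times in reset_times.items():
        times.sort()
        reset_gaps[scsi_addr] = [round(b - a, 2) for a, b in zip(times, times[1:])]

    return {
        "resets": {addr: len(times) for addr, times in reset_times.items()},
        "io_errors": dict(io_errors),
        "reset_gaps": reset_gaps,
    }
```

Fed the excerpt above, a tally like this would group the resets under
SCSI address 0:0:5:0 and the write errors under sdf, with consecutive
resets roughly 31 to 36 seconds apart. Running the same summary on logs
from the other affected OSDs would show whether each incident follows
the same reset pattern.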
--
Jean-Philippe Méthot
Senior Openstack system administrator
Administrateur système Openstack sénior
PlanetHoster inc.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx