On 9/22/22 19:55, J-P Methot wrote:
Hi,
We've been running into a mysterious issue on Ceph 16.2.7. Every few
weeks or so (anywhere from 2 weeks to a month and a half), we get
input/output errors on a random OSD. Here are the logs:
2022-09-22T15:54:11.600Z syslog debug -6>
2022-09-22T15:41:05.678+0000 7fec2ebaa080 -1 bdev(0x55bf3305a800
/var/lib/ceph/osd/ceph-31/block) _sync_write sync_file_range error: (5)
Input/output error
2022-09-22T15:54:11.600Z syslog debug -2>
2022-09-22T15:54:09.918+0000 7fec1fac5700 -1 bdev(0x55bf3305a800
/var/lib/ceph/osd/ceph-31/block) _aio_thread got r=-5 ((5) Input/output
error)
2022-09-22T15:54:11.600Z syslog debug -3>
2022-09-22T15:50:54.170+0000 7fec2ebaa080 -1 bdev(0x55bf3305a800
/var/lib/ceph/osd/ceph-31/block) _sync_write sync_file_range error: (5)
Input/output error
2022-09-22T15:54:11.600Z syslog debug -4>
2022-09-22T15:47:36.678+0000 7fec2ebaa080 -1 bdev(0x55bf3305a800
/var/lib/ceph/osd/ceph-31/block) _sync_write sync_file_range error: (5)
Input/output error
2022-09-22T15:54:11.600Z syslog debug -5>
2022-09-22T15:44:22.178+0000 7fec2ebaa080 -1 bdev(0x55bf3305a800
/var/lib/ceph/osd/ceph-31/block) _sync_write sync_file_range error: (5)
Input/output error
2022-09-22T15:54:10.804Z syslog debug -3>
2022-09-22T15:50:54.170+0000 7fec2ebaa080 -1 bdev(0x55bf3305a800
/var/lib/ceph/osd/ceph-31/block) _sync_write sync_file_range error: (5)
Input/output error
2022-09-22T15:54:10.804Z syslog debug -2>
2022-09-22T15:54:09.918+0000 7fec1fac5700 -1 bdev(0x55bf3305a800
/var/lib/ceph/osd/ceph-31/block) _aio_thread got r=-5 ((5) Input/output
error)
2022-09-22T15:54:10.803Z syslog debug -4>
2022-09-22T15:47:36.678+0000 7fec2ebaa080 -1 bdev(0x55bf3305a800
/var/lib/ceph/osd/ceph-31/block) _sync_write sync_file_range error: (5)
Input/output error
2022-09-22T15:54:10.803Z syslog debug -6>
2022-09-22T15:41:05.678+0000 7fec2ebaa080 -1 bdev(0x55bf3305a800
/var/lib/ceph/osd/ceph-31/block) _sync_write sync_file_range error: (5)
Input/output error
2022-09-22T15:54:10.803Z syslog debug -5>
2022-09-22T15:44:22.178+0000 7fec2ebaa080 -1 bdev(0x55bf3305a800
/var/lib/ceph/osd/ceph-31/block) _sync_write sync_file_range error: (5)
Input/output error
2022-09-22T15:54:09.995Z syslog [19067764.996820]
blk_update_request: I/O error, dev sdf, sector 520168880 op 0x1:(WRITE)
flags 0x8800 phys_seg 1 prio class 0
2022-09-22T15:54:09.995Z syslog debug 2022-09-22T15:54:09.918+0000
7fec1fac5700 -1 bdev(0x55bf3305a800 /var/lib/ceph/osd/ceph-31/block)
_aio_thread got r=-5 ((5) Input/output error)
2022-09-22T15:54:09.977Z syslog [19067764.996688] sd 0:0:5:0:
Power-on or device reset occurred
2022-09-22T15:53:37.229Z syslog [19067732.246603] sd 0:0:5:0:
Power-on or device reset occurred
2022-09-22T15:53:04.477Z syslog [19067699.496476] sd 0:0:5:0:
Power-on or device reset occurred
2022-09-22T15:52:31.725Z syslog [19067666.746368] sd 0:0:5:0:
Power-on or device reset occurred
2022-09-22T15:51:59.080Z syslog [19067633.996243] sd 0:0:5:0:
Power-on or device reset occurred
2022-09-22T15:51:25.725Z syslog [19067600.746160] sd 0:0:5:0:
Power-on or device reset occurred
2022-09-22T15:50:54.327Z syslog debug 2022-09-22T15:50:54.170+0000
7fec2ebaa080 -1 bdev(0x55bf3305a800 /var/lib/ceph/osd/ceph-31/block)
_sync_write sync_file_range error: (5) Input/output error
2022-09-22T15:50:54.226Z syslog [19067569.246060] sd 0:0:5:0:
Power-on or device reset occurred
2022-09-22T15:50:54.226Z syslog [19067569.246209]
blk_update_request: I/O error, dev sdf, sector 461504 op 0x1:(WRITE)
flags 0x800 phys_seg 3 prio class 0
2022-09-22T15:50:18.477Z syslog [19067533.495929] sd 0:0:5:0:
Power-on or device reset occurred
2022-09-22T15:49:45.725Z syslog [19067500.745820] sd 0:0:5:0:
Power-on or device reset occurred
2022-09-22T15:49:12.977Z syslog [19067467.995714] sd 0:0:5:0:
Power-on or device reset occurred
2022-09-22T15:48:39.977Z syslog [19067434.995608] sd 0:0:5:0:
Power-on or device reset occurred
2022-09-22T15:48:08.977Z syslog [19067403.995482] sd 0:0:5:0:
Power-on or device reset occurred
2022-09-22T15:47:36.826Z syslog debug 2022-09-22T15:47:36.678+0000
7fec2ebaa080 -1 bdev(0x55bf3305a800 /var/lib/ceph/osd/ceph-31/block)
_sync_write sync_file_range error: (5) Input/output error
2022-09-22T15:47:36.725Z syslog [19067371.745553]
blk_update_request: I/O error, dev sdf, sector 460544 op 0x1:(WRITE)
flags 0x800 phys_seg 121 prio class 0
This never happens on the same OSD twice. When we check the drive, there's
no issue to report. When this happens, the cluster either freezes
momentarily or glitches and marks the OSD as out. What could be the
source of this issue?
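The kernel lines in the excerpt above can be lined up to see how the
device resets relate to the write errors. Below is a minimal Python
sketch, assuming the syslog entries have been unwrapped to one per line.
The sample text is copied from the excerpt; the function name summarize
and the two regular expressions are illustrative only and are not part
of any Ceph tooling.

```python
import re

# Kernel lines copied from the excerpt, unwrapped to one entry per line
# (syslog timestamps dropped, kernel timestamps kept).
SAMPLE = """\
[19067371.745553] blk_update_request: I/O error, dev sdf, sector 460544 op 0x1:(WRITE) flags 0x800 phys_seg 121 prio class 0
[19067403.995482] sd 0:0:5:0: Power-on or device reset occurred
[19067434.995608] sd 0:0:5:0: Power-on or device reset occurred
[19067467.995714] sd 0:0:5:0: Power-on or device reset occurred
[19067500.745820] sd 0:0:5:0: Power-on or device reset occurred
[19067533.495929] sd 0:0:5:0: Power-on or device reset occurred
[19067569.246060] sd 0:0:5:0: Power-on or device reset occurred
[19067569.246209] blk_update_request: I/O error, dev sdf, sector 461504 op 0x1:(WRITE) flags 0x800 phys_seg 3 prio class 0
"""

# Illustrative patterns for the two kernel message types seen in the excerpt.
RESET = re.compile(r"\[(\d+\.\d+)\] sd (\S+): Power-on or device reset occurred")
IO_ERR = re.compile(r"\[(\d+\.\d+)\] blk_update_request: I/O error, dev (\w+), sector (\d+)")


def summarize(text):
    """Return (resets, errors, gaps) parsed from kernel log text.

    resets: list of (kernel_time, scsi_address)
    errors: list of (kernel_time, device, sector)
    gaps:   seconds between consecutive resets, rounded to 0.1 s
    """
    resets, errors = [], []
    for line in text.splitlines():
        m = RESET.search(line)
        if m:
            resets.append((float(m.group(1)), m.group(2)))
            continue
        m = IO_ERR.search(line)
        if m:
            errors.append((float(m.group(1)), m.group(2), int(m.group(3))))
    resets.sort()
    errors.sort()
    gaps = [round(b[0] - a[0], 1) for a, b in zip(resets, resets[1:])]
    return resets, errors, gaps


resets, errors, gaps = summarize(SAMPLE)
print("resets:", len(resets), "gaps (s):", gaps)
for when, dev, sector in errors:
    print(f"I/O error on {dev} at kernel time {when}, sector {sector}")
```

In the excerpt, the resets on sd 0:0:5:0 recur roughly every 31 to 36
seconds, and the write error at 19067569.246209 is logged within a
millisecond of a reset. That might suggest looking at the drive's
controller path as well, though the logs alone do not prove it.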
Just guessing here: have you configured "discard":
bdev enable discard
bdev async discard
We've seen monitor slow ops when xfs was doing discard operations on the
fs. Not sure if this could result in what you are seeing on the OSDs.
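One way to check those two settings is to query the cluster
configuration for the affected OSD. Below is a minimal Python sketch,
assuming the options are spelled bdev_enable_discard and
bdev_async_discard (as in Pacific) and that the ceph CLI with an admin
keyring is available on the node. The helper names discard_commands and
show_discard_settings are made up for this example.

```python
import subprocess

# Option names as assumed for Ceph 16.2.x; later releases may differ.
OPTIONS = ("bdev_enable_discard", "bdev_async_discard")


def discard_commands(osd_id):
    """Build the `ceph config get` command line for each discard option."""
    who = f"osd.{osd_id}"
    return [["ceph", "config", "get", who, opt] for opt in OPTIONS]


def show_discard_settings(osd_id):
    """Run each command and print the value the cluster reports."""
    for cmd in discard_commands(osd_id):
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        print(f"{cmd[-1]} = {result.stdout.strip()}")


# Example (needs a reachable cluster, so it is left commented out):
# show_discard_settings(31)
```

Note that `ceph config get` reports the value from the monitors'
configuration database; an option set only in a local ceph.conf would
have to be checked on the OSD host itself.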
Gr. Stefan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx