Freak issue every few weeks

Hi,

We've been running into a mysterious issue on Ceph 16.2.7. Every few weeks (anywhere from two weeks to a month and a half), we get input/output errors on a random OSD. Here are the logs (newest first):

2022-09-22T15:54:11.600Z    syslog    debug     -6> 2022-09-22T15:41:05.678+0000 7fec2ebaa080 -1 bdev(0x55bf3305a800 /var/lib/ceph/osd/ceph-31/block) _sync_write sync_file_range error: (5) Input/output error
2022-09-22T15:54:11.600Z    syslog    debug     -2> 2022-09-22T15:54:09.918+0000 7fec1fac5700 -1 bdev(0x55bf3305a800 /var/lib/ceph/osd/ceph-31/block) _aio_thread got r=-5 ((5) Input/output error)
2022-09-22T15:54:11.600Z    syslog    debug     -3> 2022-09-22T15:50:54.170+0000 7fec2ebaa080 -1 bdev(0x55bf3305a800 /var/lib/ceph/osd/ceph-31/block) _sync_write sync_file_range error: (5) Input/output error
2022-09-22T15:54:11.600Z    syslog    debug     -4> 2022-09-22T15:47:36.678+0000 7fec2ebaa080 -1 bdev(0x55bf3305a800 /var/lib/ceph/osd/ceph-31/block) _sync_write sync_file_range error: (5) Input/output error
2022-09-22T15:54:11.600Z    syslog    debug     -5> 2022-09-22T15:44:22.178+0000 7fec2ebaa080 -1 bdev(0x55bf3305a800 /var/lib/ceph/osd/ceph-31/block) _sync_write sync_file_range error: (5) Input/output error
2022-09-22T15:54:10.804Z    syslog    debug     -3> 2022-09-22T15:50:54.170+0000 7fec2ebaa080 -1 bdev(0x55bf3305a800 /var/lib/ceph/osd/ceph-31/block) _sync_write sync_file_range error: (5) Input/output error
2022-09-22T15:54:10.804Z    syslog    debug     -2> 2022-09-22T15:54:09.918+0000 7fec1fac5700 -1 bdev(0x55bf3305a800 /var/lib/ceph/osd/ceph-31/block) _aio_thread got r=-5 ((5) Input/output error)
2022-09-22T15:54:10.803Z    syslog    debug     -4> 2022-09-22T15:47:36.678+0000 7fec2ebaa080 -1 bdev(0x55bf3305a800 /var/lib/ceph/osd/ceph-31/block) _sync_write sync_file_range error: (5) Input/output error
2022-09-22T15:54:10.803Z    syslog    debug     -6> 2022-09-22T15:41:05.678+0000 7fec2ebaa080 -1 bdev(0x55bf3305a800 /var/lib/ceph/osd/ceph-31/block) _sync_write sync_file_range error: (5) Input/output error
2022-09-22T15:54:10.803Z    syslog    debug     -5> 2022-09-22T15:44:22.178+0000 7fec2ebaa080 -1 bdev(0x55bf3305a800 /var/lib/ceph/osd/ceph-31/block) _sync_write sync_file_range error: (5) Input/output error
2022-09-22T15:54:09.995Z    syslog    [19067764.996820] blk_update_request: I/O error, dev sdf, sector 520168880 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
2022-09-22T15:54:09.995Z    syslog    debug 2022-09-22T15:54:09.918+0000 7fec1fac5700 -1 bdev(0x55bf3305a800 /var/lib/ceph/osd/ceph-31/block) _aio_thread got r=-5 ((5) Input/output error)
2022-09-22T15:54:09.977Z    syslog    [19067764.996688] sd 0:0:5:0: Power-on or device reset occurred
2022-09-22T15:53:37.229Z    syslog    [19067732.246603] sd 0:0:5:0: Power-on or device reset occurred
2022-09-22T15:53:04.477Z    syslog    [19067699.496476] sd 0:0:5:0: Power-on or device reset occurred
2022-09-22T15:52:31.725Z    syslog    [19067666.746368] sd 0:0:5:0: Power-on or device reset occurred
2022-09-22T15:51:59.080Z    syslog    [19067633.996243] sd 0:0:5:0: Power-on or device reset occurred
2022-09-22T15:51:25.725Z    syslog    [19067600.746160] sd 0:0:5:0: Power-on or device reset occurred
2022-09-22T15:50:54.327Z    syslog    debug 2022-09-22T15:50:54.170+0000 7fec2ebaa080 -1 bdev(0x55bf3305a800 /var/lib/ceph/osd/ceph-31/block) _sync_write sync_file_range error: (5) Input/output error
2022-09-22T15:50:54.226Z    syslog    [19067569.246060] sd 0:0:5:0: Power-on or device reset occurred
2022-09-22T15:50:54.226Z    syslog    [19067569.246209] blk_update_request: I/O error, dev sdf, sector 461504 op 0x1:(WRITE) flags 0x800 phys_seg 3 prio class 0
2022-09-22T15:50:18.477Z    syslog    [19067533.495929] sd 0:0:5:0: Power-on or device reset occurred
2022-09-22T15:49:45.725Z    syslog    [19067500.745820] sd 0:0:5:0: Power-on or device reset occurred
2022-09-22T15:49:12.977Z    syslog    [19067467.995714] sd 0:0:5:0: Power-on or device reset occurred
2022-09-22T15:48:39.977Z    syslog    [19067434.995608] sd 0:0:5:0: Power-on or device reset occurred
2022-09-22T15:48:08.977Z    syslog    [19067403.995482] sd 0:0:5:0: Power-on or device reset occurred
2022-09-22T15:47:36.826Z    syslog    debug 2022-09-22T15:47:36.678+0000 7fec2ebaa080 -1 bdev(0x55bf3305a800 /var/lib/ceph/osd/ceph-31/block) _sync_write sync_file_range error: (5) Input/output error
2022-09-22T15:47:36.725Z    syslog    [19067371.745553] blk_update_request: I/O error, dev sdf, sector 460544 op 0x1:(WRITE) flags 0x800 phys_seg 121 prio class 0
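
For anyone who wants to tally the kernel-side events in an excerpt like the one above, here is a minimal Python sketch. It is only an illustration, not part of any Ceph tooling: the function name `summarize` and both regular expressions are hypothetical and assume the kernel lines look exactly like the ones quoted here. It counts "Power-on or device reset occurred" messages per SCSI address, counts `blk_update_request` I/O errors per block device, and reports the gaps between consecutive resets using the kernel timestamps in brackets. It does not try to prove that SCSI address 0:0:5:0 and sdf are the same disk.

```python
import re
from collections import Counter

# Both patterns are assumptions modelled on the kernel lines quoted above.
RESET_RE = re.compile(
    r"\[\s*(\d+\.\d+)\] sd (\d+:\d+:\d+:\d+): Power-on or device reset occurred"
)
IOERR_RE = re.compile(
    r"\[\s*(\d+\.\d+)\] blk_update_request: I/O error, dev (\w+), sector (\d+)"
)


def summarize(log_text):
    """Tally device resets and block I/O errors found in a syslog excerpt."""
    # findall scans the whole text, so it works whether or not each
    # log entry sits on its own line.
    resets = sorted((float(ts), addr) for ts, addr in RESET_RE.findall(log_text))
    errors = [(float(ts), dev, int(sec)) for ts, dev, sec in IOERR_RE.findall(log_text)]

    # Seconds between consecutive resets, based on kernel uptime timestamps.
    gaps = [round(b[0] - a[0], 1) for a, b in zip(resets, resets[1:])]

    return {
        "resets": Counter(addr for _, addr in resets),
        "io_errors": Counter(dev for _, dev, _ in errors),
        "reset_gaps_s": gaps,
    }


SAMPLE = """
2022-09-22T15:48:08.977Z    syslog    [19067403.995482] sd 0:0:5:0: Power-on or device reset occurred
2022-09-22T15:48:39.977Z    syslog    [19067434.995608] sd 0:0:5:0: Power-on or device reset occurred
2022-09-22T15:50:54.226Z    syslog    [19067569.246209] blk_update_request: I/O error, dev sdf, sector 461504 op 0x1:(WRITE) flags 0x800 phys_seg 3 prio class 0
"""

if __name__ == "__main__":
    for key, value in summarize(SAMPLE).items():
        print(key, value)
```

Going by the kernel timestamps in the excerpt above, the resets on 0:0:5:0 are roughly 31 to 36 seconds apart, and all three block-layer write errors are on sdf.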

It is never the same OSD twice. When we check the affected drive, we find nothing wrong with it. When this happens, the cluster either freezes momentarily or glitches and marks the OSD as out. What could be the source of this issue? We think it could be related to either the drive model or the Ceph version. Here's some info on our hardware and software:


Drives: All Intel DC S4510 or S4610

Controller: HBA330

OS: Ubuntu 20.04 LTS

--
Jean-Philippe Méthot
Senior Openstack system administrator
Administrateur système Openstack sénior
PlanetHoster inc.