Hi Daniel,
translating EIO to the upper layers rather than crashing the OSD is
valid default behavior. You can change this by setting the
bluestore_fail_eio parameter to true.
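
A minimal sketch of how that could be applied through the cluster
configuration database. The OSD id osd.17 is taken from the log path in
the quoted message. Whether the option takes effect at runtime or only
after an OSD restart is not verified here, so the restart step is an
assumption, as is the use of cephadm for managing the daemons.

```shell
# Set the option for all OSDs in the cluster configuration database.
ceph config set osd bluestore_fail_eio true

# Check the value stored for one OSD (osd.17 is the OSD from the quoted log).
ceph config get osd.17 bluestore_fail_eio

# Assumption: the OSD needs a restart to pick up the new value.
# On a cephadm-managed cluster that could be done with:
ceph orch daemon restart osd.17
```

To limit the change to a single OSD first, "osd" in the first command
can be replaced with a specific daemon name such as osd.17.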
Thanks,
Igor
On 3/19/2024 2:50 PM, Daniel Schreiber wrote:
Hi,
in our cluster (17.2.6) disks fail from time to time. Block devices
are HDDs, DB devices are NVMe. However, the OSD process does not
reliably die when a disk fails. That leads to blocked client IO for all requests for
which the OSD with the broken disk is the primary OSD. All pools on
these OSDs are EC pools (cephfs data or rbd data). Client IO recovers
if I manually stop the OSD.
It seems like the error was triggered during deep scrub, because the
cluster reported scrub errors afterwards.
OSD Log:
2024-03-11T20:12:43+01:00 urzceph1-osd05 bash[9695]: debug 2024-03-11T19:12:43.392+0000 7fe4cad3f700 4 rocksdb: (Original Log Time 2024/03/11-19:12:43.395747) [db/db_impl/db_impl_compaction_flush.cc:2818] Compaction nothing to do
2024-03-11T20:15:58+01:00 urzceph1-osd05 bash[9575]: debug 2024-03-11T19:15:58.285+0000 7f9182765700 -1 bdev(0x55f72b8af800 /var/lib/ceph/osd/ceph-17/block) _aio_thread got r=-5 ((5) Input/output error)
2024-03-11T20:15:58+01:00 urzceph1-osd05 bash[9575]: debug 2024-03-11T19:15:58.289+0000 7f9182765700 -1 bdev(0x55f72b8af800 /var/lib/ceph/osd/ceph-17/block) _aio_thread translating the error to EIO for upper layer
2024-03-11T20:15:58+01:00 urzceph1-osd05 bash[9575]: debug 2024-03-11T19:15:58.289+0000 7f9182765700 -1 bdev(0x55f72b8af800 /var/lib/ceph/osd/ceph-17/block) _aio_thread got r=-5 ((5) Input/output error)
2024-03-11T20:15:58+01:00 urzceph1-osd05 bash[9575]: debug 2024-03-11T19:15:58.289+0000 7f9182765700 -1 bdev(0x55f72b8af800 /var/lib/ceph/osd/ceph-17/block) _aio_thread translating the error to EIO for upper layer
2024-03-11T20:17:02+01:00 urzceph1-osd05 bash[10152]: debug 2024-03-11T19:17:02.357+0000 7fcffadf4700 4 rocksdb: [db/db_impl/db_impl_write.cc:1736] [L] New memtable created with log file: #73918. Immutable memtables: 0.
Kernel Log:
[Mon Mar 11 20:15:43 2024] ata9.00: exception Emask 0x0 SAct 0xffffffff SErr 0xc0000 action 0x0
[Mon Mar 11 20:15:43 2024] ata9.00: irq_stat 0x40000008
[Mon Mar 11 20:15:43 2024] ata9: SError: { CommWake 10B8B }
[Mon Mar 11 20:15:43 2024] ata9.00: failed command: READ FPDMA QUEUED
[Mon Mar 11 20:15:43 2024] ata9.00: cmd 60/f8:38:60:b2:8e/00:00:37:00:00/40 tag 7 ncq dma 126976 in
                                    res 43/40:f0:68:b2:8e/00:00:37:00:00/40 Emask 0x409 (media error) <F>
[Mon Mar 11 20:15:43 2024] ata9.00: status: { DRDY SENSE ERR }
[Mon Mar 11 20:15:43 2024] ata9.00: error: { UNC }
[Mon Mar 11 20:15:43 2024] ata9: hard resetting link
[Mon Mar 11 20:15:43 2024] ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[Mon Mar 11 20:15:43 2024] ata9.00: configured for UDMA/133
[Mon Mar 11 20:15:43 2024] sd 8:0:0:0: [sdj] tag#7 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=3s
[Mon Mar 11 20:15:43 2024] sd 8:0:0:0: [sdj] tag#7 Sense Key : Medium Error [current]
[Mon Mar 11 20:15:43 2024] sd 8:0:0:0: [sdj] tag#7 Add. Sense: Unrecovered read error - auto reallocate failed
[Mon Mar 11 20:15:43 2024] sd 8:0:0:0: [sdj] tag#7 CDB: Read(16) 88 00 00 00 00 00 37 8e b2 60 00 00 00 f8 00 00
[Mon Mar 11 20:15:43 2024] blk_update_request: I/O error, dev sdj, sector 932098664 op 0x0:(READ) flags 0x0 phys_seg 29 prio class 0
[Mon Mar 11 20:15:43 2024] ata9: EH complete
Is this expected behavior or a bug? If it is expected, how can we keep
client IO flowing?
Kind regards,
Daniel
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
--
Igor Fedotov
Ceph Lead Developer