Hello Ceph Users,
* Problem: we get the errors below when using krbd; we use RBD for
VMs.
* Workaround: after switching to librbd, the errors disappear.
* Software:
** Kernel: 6.8.8-2 (parameters: intel_iommu=on iommu=pt
pcie_aspm.policy=performance)
** Ceph: 18.2.2
Description/Details 1: kernel errors when using krbd with Ceph:
[Wed Aug 21 03:04:17 2024] libceph: read_partial_message
0000000015af2284 data crc 1221767919 != exp. 282251377
[Wed Aug 21 03:04:17 2024] libceph: read_partial_message
0000000066b200ab data crc 3817026135 != exp. 3925662391
[Wed Aug 21 03:04:17 2024] libceph: osd15 (1)10.1.4.13:6836 bad
crc/signature
[Wed Aug 21 03:04:17 2024] libceph: osd13 (1)10.1.4.13:6809 bad
crc/signature
[Wed Aug 21 03:04:21 2024] libceph: read_partial_message
000000008a131738 data crc 2612835980 != exp. 917302924
[Wed Aug 21 03:04:21 2024] libceph: read_partial_message
000000005160776b data crc 2965872045 != exp. 563323792
[Wed Aug 21 03:04:21 2024] libceph: osd15 (1)10.1.4.13:6836 bad
crc/signature
[Wed Aug 21 03:04:21 2024] libceph: osd6 (1)10.1.4.12:6843 bad
crc/signature
[Wed Aug 21 03:06:44 2024] libceph: read_partial_message
000000007e548354 data crc 1265032637 != exp. 2426281931
[Wed Aug 21 03:06:44 2024] libceph: osd0 (1)10.1.4.11:6835 bad
crc/signature
[Wed Aug 21 03:06:44 2024] libceph: read_partial_message
000000009214d802 data crc 2596010853 != exp. 1221875667
[Wed Aug 21 03:06:44 2024] libceph: osd10 (1)10.1.4.12:6809 bad
crc/signature
[Wed Aug 21 03:06:47 2024] libceph: read_partial_message
000000000f9edc73 data crc 1326019705 != exp. 3079604517
[Wed Aug 21 03:06:47 2024] libceph: osd3 (1)10.1.4.11:6803 bad
crc/signature
[Wed Aug 21 03:06:50 2024] libceph: read_partial_message
000000004769da61 data crc 3421275194 != exp. 4183754554
[Wed Aug 21 03:06:50 2024] libceph: osd8 (1)10.1.4.12:6835 bad
crc/signature
[Wed Aug 21 03:06:51 2024] libceph: read_partial_message
0000000044db9a59 data crc 2603270708 != exp. 4150529351
[Wed Aug 21 03:06:51 2024] libceph: osd14 (1)10.1.4.13:6806 bad
crc/signature
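A minimal sketch for tallying the "bad crc/signature" lines per OSD and
per OSD host, to see whether they cluster on one node. It assumes the
dmesg output was saved as text and is wrapped as in this mail; the
function names (unwrap, tally_bad_crc) are illustrative only and not
part of any Ceph tooling. In the excerpt above the errors are spread
over OSDs on all three hosts (10.1.4.11 to 10.1.4.13).

```python
import re
from collections import Counter

# Matches e.g. "libceph: osd15 (1)10.1.4.13:6836 bad crc/signature"
BAD_CRC = re.compile(
    r"libceph: (osd\d+) \(\d+\)(\d+\.\d+\.\d+\.\d+):\d+ bad\s+crc/signature"
)


def unwrap(text):
    """Rejoin mail-wrapped dmesg lines.

    A line that does not start with '[' (the timestamp bracket) is
    treated as the continuation of the previous entry.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("[") or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def tally_bad_crc(text):
    """Count 'bad crc/signature' entries per OSD and per OSD host IP."""
    per_osd, per_host = Counter(), Counter()
    for entry in unwrap(text):
        match = BAD_CRC.search(entry)
        if match:
            per_osd[match.group(1)] += 1
            per_host[match.group(2)] += 1
    return per_osd, per_host


# Three entries copied from the log above, including the mail wrapping.
sample = """\
[Wed Aug 21 03:04:17 2024] libceph: osd15 (1)10.1.4.13:6836 bad
crc/signature
[Wed Aug 21 03:04:17 2024] libceph: osd13 (1)10.1.4.13:6809 bad
crc/signature
[Wed Aug 21 03:04:21 2024] libceph: osd6 (1)10.1.4.12:6843 bad
crc/signature
"""

per_osd, per_host = tally_bad_crc(sample)
print(dict(per_osd))
print(dict(per_host))
```

Running it over the full dmesg output rather than the short sample
gives the distribution for the whole incident.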
Description/Details 2 (side effects): VMs hit buffer I/O errors on
RBD-backed virtual disks:
Aug 24 03:16:01 de-vlix-dbix-01 kernel: sd 0:0:0:1: [sdb] tag#211 timing
out command, waited 180s
Aug 24 03:16:01 de-vlix-dbix-01 kernel: sd 0:0:0:1: [sdb] tag#39 timing
out command, waited 180s
Aug 24 03:16:01 de-vlix-dbix-01 kernel: sd 0:0:0:1: [sdb] tag#211 FAILED
Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=885s
Aug 24 03:16:01 de-vlix-dbix-01 kernel: sd 0:0:0:1: [sdb] tag#39 FAILED
Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=847s
Aug 24 03:16:01 de-vlix-dbix-01 kernel: sd 0:0:0:1: [sdb] tag#211 Sense
Key: Aborted Command [current]
Aug 24 03:16:01 de-vlix-dbix-01 kernel: sd 0:0:0:1: [sdb] tag#211 Add.
Sense: I/O process terminated
Aug 24 03:16:01 de-vlix-dbix-01 kernel: sd 0:0:0:1: [sdb] tag#39 Sense
Key: Aborted Command [current]
Aug 24 03:16:01 de-vlix-dbix-01 kernel: sd 0:0:0:1: [sdb] tag#39 Add.
Sense: I/O process terminated
Aug 24 03:16:01 de-vlix-dbix-01 kernel: sd 0:0:0:1: [sdb] tag#211 CDB:
Write(10) 2a 00 34 87 48 08 00 00 08 00
Aug 24 03:16:01 de-vlix-dbix-01 kernel: sd 0:0:0:1: [sdb] tag#39 CDB:
Write(10) 2a 00 34 81 52 70 00 00 58 00
Aug 24 03:16:01 de-vlix-dbix-01 kernel: I/O error, dev sdb, sector
881281032 op 0x1: (WRITE) flags 0x800 phys_seg 1 prio class 0
Aug 24 03:16:01 de-vlix-dbix-01 kernel: I/O error, dev sdb, sector
880890480 op 0x1: (WRITE) flags 0x103000 phys_seg 11 prio class 0
Aug 24 03:16:01 de-vlix-dbix-01 kernel: EXT4-fs warning (device sdb1):
ext4_end_bio:343: I/O error 10 writing to inode 27525908 starting
Aug 24 03:16:01 de-vlix-dbix-01 kernel: Buffer I/O error on dev sdb1,
logical block 110111054, lost async page write
Aug 24 03:16:01 de-vlix-dbix-01 kernel: sd 0:0:0:1: [sdb] tag#212 timing
out command, waited 180s
Aug 24 03:16:01 de-vlix-dbix-01 kernel: buffer_io_error: 21 callbacks
suppressed
Aug 24 03:16:01 de-vlix-dbix-01 kernel: Buffer I/O error on device sdb1,
logical block 110159873
Aug 24 03:16:01 de-vlix-dbix-01 kernel: Buffer I/O error on dev sdb1,
logical block 110111055, lost async page write
Aug 24 03:16:01 de-vlix-dbix-01 kernel: sd 0:0:0:1: [sdb] tag#212 FAILED
Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=875s
Aug 24 03:16:01 de-vlix-dbix-01 kernel: Buffer I/O error on dev sdb1,
logical block 110111056, lost async page write
Aug 24 03:16:01 de-vlix-dbix-01 kernel: sd 0:0:0:1: [sdb] tag#212 Sense
Key: Aborted Command [current]
Aug 24 03:16:01 de-vlix-dbix-01 kernel: Buffer I/O error on dev sdb1,
logical block 110111057, lost async page write
Aug 24 03:16:01 de-vlix-dbix-01 kernel: sd 0:0:0:1: [sdb] tag#212 Add.
Sense: I/O process terminated
Aug 24 03:16:01 de-vlix-dbix-01 kernel: Buffer I/O error on dev sdb1,
logical block 110111058, lost async page write
Aug 24 03:16:01 de-vlix-dbix-01 kernel: sd 0:0:0:1: [sdb] tag#212 CDB:
Write(10) 2a 00 34 87 48 10 00 00 08 00
Aug 24 03:16:01 de-vlix-dbix-01 kernel: Buffer I/O error on dev sdb1,
logical block 110111059, lost async page write
Aug 24 03:16:01 de-vlix-dbix-01 kernel: I/O error, dev sdb, sector
881281040 op 0x1: (WRITE) flags 0x800 phys_seg 1 prio class 0
Aug 24 03:16:01 de-vlix-dbix-01 kernel: Buffer I/O error on dev sdb1,
logical block 110111060, lost async page write
Thanks for helping out.
Greetings Jonas