Spurious Read Errors: 0x6706be76

In the week since upgrading one of our clusters from Nautilus 14.2.21 to Pacific 16.2.4 I've seen four spurious read errors that always have the same bad checksum of 0x6706be76. I've never seen this in any of our clusters before. Here's an example of what I'm seeing in the logs:

ceph-osd.132.log:2021-06-20T22:53:20.584-0400 7fde2e4fc700 -1 bluestore(/var/lib/ceph/osd/ceph-132) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0x6706be76, expected 0xee74a56a, device location [0x18c81b40000~1000], logical extent 0x200000~1000, object #29:2d8210bf:::rbd_data.94f4232ae8944a.0000000000026c57:head#

I'm not seeing any indication of inconsistent PGs, only the spurious read errors. I also don't see any explicit indication of a retry in the logs following the message above. BlueStore code to retry reads up to three times was introduced in 2018, following a similar issue with the same checksum: https://tracker.ceph.com/issues/22464

Here's an example of what my health detail looks like:

HEALTH_WARN 1 OSD(s) have spurious read errors
[WRN] BLUESTORE_SPURIOUS_READ_ERRORS: 1 OSD(s) have spurious read errors
     osd.117  reads with retries: 1
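For anyone else chasing this: my understanding (not verified against the source) is that this warning is driven by the bluestore_reads_with_retries perf counter, which you can pull with `ceph daemon osd.117 perf dump`. A sketch of digging the counter out of that JSON — the sample dump below is made up, and the counter name itself is my assumption:

```python
import json

# Hypothetical, heavily abbreviated output of `ceph daemon osd.117 perf dump`;
# the only thing I'm actually relying on is the counter name.
perf_dump = json.loads("""
{
  "bluestore": {
    "bluestore_reads_with_retries": 1
  }
}
""")

retries = perf_dump["bluestore"]["bluestore_reads_with_retries"]
print(f"reads with retries: {retries}")
```

That would at least let me watch the counter over time rather than waiting for the next HEALTH_WARN.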

I followed this (unresolved) thread, too: https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/DRBVFQLZ5ZYMNPKLAWS5AR4Z2MJQYLLC/

I do have swap enabled, but I don't think memory pressure is an issue, with 30GB available out of 96GB (and no sign I've been close to summoning the OOM killer). The OSDs that have thrown the cluster into HEALTH_WARN with the spurious read errors are busy 12TB rotational HDDs, and I _think_ it's only happening during a deep scrub. We're on Ubuntu 18.04; uname: 5.4.0-74-generic #83~18.04.1-Ubuntu SMP Tue May 11 16:01:00 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux.

Does Pacific retry three times on a spurious read error? Would I see an indication of a retry in the logs?

Thanks!

~Jay
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


