Hi everyone,

We're tracking down a hard-to-reproduce failure in Ceph BlueStore where rocksdb reads biggish chunks (e.g., 600k) and we get zeros in the resulting buffer, leading to a CRC failure and a crash. The same data has been read several times before without problems, and after the crash the correct data is on disk as well--it is a transient problem with the read result.

Our main questions are whether this is a known issue, and whether anyone with a better understanding of the O_DIRECT block device MM interactions here has any theories as to what might be going wrong...

Details below:

- Kernel version is 4.10.0-42-generic (ubuntu 16.04)
- pread on block-aligned extent, reading into page-aligned buffer
- pread always returns the full number of bytes--it's not a short read.
- O_DIRECT
- Usually the zeroed bytes are at the end of the buffer (e.g., last 1/3), but not always--the most recent time we reproduced it there were 3 distinct zeroed regions in the buffer.
- The zeroed regions are always 4k aligned.
- We always have several other threads (3-5) also doing similar reads at different offsets of the same file/device. AFAICS they are non-overlapping extents.

The one other curious thing is that we tried doing a memset on the buffer with a non-zero value before the read, to see whether pread was skipping the pages or filling them with zeros...and we weren't able to reproduce the failure. It's a bit hard to trigger at baseline (it takes anywhere from hours to days), so we may not have waited long enough. We're kicking off another run with memset to try again.

Any theories or suggestions?

Thanks!
sage
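
P.S. In case it helps anyone trying to reproduce this, here is a rough sketch of the memset experiment described above. It is not the BlueStore code: the helper names (open_direct, classify_page, poisoned_pread_check) and the 0xA5 poison value are made up for illustration, and it assumes 4k pages. The idea is to fill the buffer with a non-zero pattern, do the pread, and then classify each 4k page as real data, all zeros (something wrote zeros), or still poisoned (the page was skipped). Note that a page that is legitimately all zeros (or all 0xA5) on disk would be flagged too, so a hit still has to be checked against the CRC.

```c
#define _GNU_SOURCE /* for O_DIRECT on Linux */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define PAGE_SZ 4096u /* assumed page size */
#define POISON  0xA5  /* arbitrary non-zero fill value (hypothetical) */

enum page_state { PAGE_DATA = 0, PAGE_ZERO = 1, PAGE_POISON = 2 };

/* Open a file or block device read-only with O_DIRECT. */
int open_direct(const char *path)
{
    return open(path, O_RDONLY | O_DIRECT);
}

/* Return 1 if every byte of the 4k page equals `value`. */
int page_is_filled(const unsigned char *page, unsigned char value)
{
    for (size_t i = 0; i < PAGE_SZ; i++)
        if (page[i] != value)
            return 0;
    return 1;
}

/* Classify one 4k page of the buffer after the read. */
enum page_state classify_page(const unsigned char *page)
{
    if (page_is_filled(page, 0x00))
        return PAGE_ZERO;   /* something filled it with zeros */
    if (page_is_filled(page, POISON))
        return PAGE_POISON; /* the read never touched it */
    return PAGE_DATA;
}

/*
 * Poison `buf`, pread `len` bytes at `off`, then scan page by page.
 * `buf` must be page-aligned and `len` a multiple of PAGE_SZ.
 * Returns the number of suspicious pages, or -1 on error or short read.
 */
long poisoned_pread_check(int fd, unsigned char *buf, size_t len, off_t off)
{
    assert(len % PAGE_SZ == 0);
    memset(buf, POISON, len);

    ssize_t r = pread(fd, buf, len, off);
    if (r < 0 || (size_t)r != len)
        return -1;

    long bad = 0;
    for (size_t p = 0; p < len; p += PAGE_SZ) {
        enum page_state s = classify_page(buf + p);
        if (s != PAGE_DATA) {
            fprintf(stderr, "%s page at buffer offset %zu (file offset %lld)\n",
                    s == PAGE_ZERO ? "zeroed" : "untouched",
                    p, (long long)off + (long long)p);
            bad++;
        }
    }
    return bad;
}
```

To use it, allocate the buffer with posix_memalign(&buf, PAGE_SZ, len), open the device with open_direct(), and call poisoned_pread_check() in place of the plain pread. Whether the extra memset changes the timing enough to hide the problem is exactly what we don't know yet.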