On Tue, Apr 10, 2018 at 06:54:58PM +0000, Sage Weil wrote:
> The one other curious thing is that we tried doing a memset on the buffer
> with a non-zero value before the read to see whether pread was skipping
> the pages or filling them with zeros...and weren't able to reproduce the
> failure. It's a bit hard to trigger at baseline (it takes anywhere from
> hours to days) so we may not have waited long enough. We're kicking off
> another run with memset to try again.
>
> Any theories or suggestions?

That makes me wonder if the user pages were paged out, and there's some
kind of bug getting those pages back into memory ...

I assume you're reading into anonymous memory and not something weird like
device memory. But maybe you're reading into a MAP_SHARED of a file?
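
For reference, a minimal sketch of the poison-then-read check discussed above might look like the following. The names (POISON, classify_page, poisoned_pread) are hypothetical and are not taken from your test code; the sketch also assumes that the real data is never a full page of 0xAA bytes and is not legitimately all zeros, otherwise a page would be misclassified.

```c
#define _XOPEN_SOURCE 700
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical poison byte; any non-zero value unlikely to fill a page works. */
#define POISON 0xAA

enum page_state {
    PAGE_DATA,       /* page holds something other than pure poison or zeros */
    PAGE_UNTOUCHED,  /* page still holds only the poison: the read skipped it */
    PAGE_ZEROED      /* page holds only zeros: the read zero-filled it */
};

/* Classify one chunk of the buffer (typically one page) after the read. */
static enum page_state classify_page(const unsigned char *p, size_t len)
{
    size_t poison = 0, zero = 0;

    for (size_t i = 0; i < len; i++) {
        if (p[i] == POISON)
            poison++;
        else if (p[i] == 0)
            zero++;
    }
    if (poison == len)
        return PAGE_UNTOUCHED;
    if (zero == len)
        return PAGE_ZEROED;
    return PAGE_DATA;
}

/* Fill the buffer with the poison value, then issue the pread.
 * Returns whatever pread returns (bytes read, or -1 on error). */
static ssize_t poisoned_pread(int fd, unsigned char *buf, size_t len, off_t off)
{
    memset(buf, POISON, len);
    return pread(fd, buf, len, off);
}
```

After each poisoned_pread, walking the buffer in page-sized steps with classify_page would tell the two failure modes apart: PAGE_UNTOUCHED means pread never wrote to that page, PAGE_ZEROED means it was filled with zeros.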