On Tue, 10 Apr 2018, Matthew Wilcox wrote:
> On Tue, Apr 10, 2018 at 06:54:58PM +0000, Sage Weil wrote:
> > The one other curious thing is that we tried doing a memset on the buffer
> > with a non-zero value before the read to see whether pread was skipping
> > the pages or filling them with zeros...and weren't able to reproduce the
> > failure. It's a bit hard to trigger at baseline (it takes anywhere from
> > hours to days) so we may not have waited long enough. We're kicking off
> > another run with memset to try again.
> >
> > Any theories or suggestions?
>
> That makes me wonder if the user pages were paged out, and there's some
> kind of bug getting those pages back into memory ...
>
> I assume you're reading into anonymous memory and not something weird
> like device memory. But maybe you're reading into a MAP_SHARED of a file?

The buffer has just been allocated with posix_memalign(3); I'm not
actually sure what it's doing under the hood. We can try reproducing
with swap off to rule that out...

sage
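
For reference, the memset experiment described in the quoted text could
be sketched roughly as below. This is only an illustration, not the code
used in the actual reproduction: the helper name poisoned_pread, the
fill value 0xA5, and the byte-counting check are all assumptions. The
idea is that, if the file range is known to contain neither the fill
value nor zeros, leftover fill bytes inside the returned length would
suggest pread skipped those bytes, while zero bytes would suggest it
zero-filled them.

```c
#define _POSIX_C_SOURCE 200809L  /* posix_memalign(3), pread(2) */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define POISON 0xA5  /* hypothetical fill value; any non-zero byte works */

/* Count how many of the first n bytes of buf equal v. */
static size_t count_bytes(const unsigned char *buf, size_t n, unsigned char v)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (buf[i] == v)
            hits++;
    return hits;
}

/*
 * Hypothetical helper: allocate an aligned buffer, fill it with POISON,
 * pread() into it, then report how many bytes within the returned length
 * still hold POISON and how many are zero.
 *
 * Returns the pread() result (bytes read), or -1 with errno set.
 * The counts are only written when the read succeeds.
 */
static ssize_t poisoned_pread(int fd, off_t off, size_t len, size_t align,
                              size_t *still_poison, size_t *zeros)
{
    void *p = NULL;
    ssize_t n;
    int rc;

    assert(still_poison != NULL && zeros != NULL);

    /* posix_memalign returns an error number instead of setting errno. */
    rc = posix_memalign(&p, align, len);
    if (rc != 0) {
        errno = rc;
        return -1;
    }

    /* Pre-fill so untouched bytes can be told apart from zero-filled ones. */
    memset(p, POISON, len);

    n = pread(fd, p, len, off);
    if (n >= 0) {
        *still_poison = count_bytes(p, (size_t)n, POISON);
        *zeros = count_bytes(p, (size_t)n, 0);
    }

    free(p);
    return n;
}
```

A caller would open the file, call poisoned_pread(fd, offset, length,
4096, &poison, &zeros) over a range whose contents are known, and log
any non-zero counts. The 4096 alignment is likewise just an example
value.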