Re: zeroed pages in pread result

On Tue, 10 Apr 2018, Jeff Moyer wrote:
> Sage Weil <sweil@xxxxxxxxxx> writes:
> 
> > Hi everyone,
> >
> > We're tracking down a hard-to-reproduce failure in Ceph BlueStore where 
> > rocksdb is reading biggish chunks (e.g., 600k) and we're getting zeros in 
> > the resulting buffer, leading to a CRC failure and crash.  The data has 
> > been read several times before without problems, and after the crash the 
> > correct data is on disk as well--it is a transient problem with the read 
> > result.
> >
> > Our main questions are whether this is a known issue, and whether anyone 
> > with a better understanding of the O_DIRECT/block device/MM interactions 
> > here has any theories about what might be going wrong...
> >
> > Details below:
> >
> > - Kernel version is 4.10.0-42-generic (Ubuntu 16.04)
> >
> > - pread on block-aligned extent, reading into page-aligned buffer
> >
> > - pread always returns the full number of bytes--it's not a short read.
> >
> > - O_DIRECT
> 
> Is there any other I/O going on to these files?  Are there concurrent
> readers and writers?  Is there a mix of buffered and direct I/O?  Does
> the code that performs the I/O fork()?

Several (3-5) other threads are doing concurrent O_DIRECT reads from the 
same fd at different offsets.

No writers to this region of the device (but there are O_DIRECT writes 
going on elsewhere).  I've already verified they aren't touching these 
ranges (and if they did we would expect to see the on-disk state change, 
but after the failure the device's data is all correct).

All IO is O_DIRECT--nothing buffered (at least not in this process).

No fork(), just multiple pthreads.
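For anyone trying to set up a similar workload, here is a minimal C sketch (not Ceph's code) of the access pattern described: several pthreads share one fd, and each issues a single pread() of a ~600k, 4096-aligned extent into a posix_memalign()'d buffer at its own non-overlapping offset. It uses a scratch regular file rather than a raw block device, and the thread count, helper names and fill pattern are hypothetical. It falls back to buffered reads if open() rejects O_DIRECT with EINVAL, so it only exercises the direct-I/O path on a filesystem or device that supports it, and it is not expected to reproduce the bug on its own.

```c
#define _GNU_SOURCE            /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0             /* platform without O_DIRECT: plain buffered reads */
#endif

#define NTHREADS 4             /* hypothetical; the report mentions 3-5 readers */
#define EXTENT   (600 * 1024)  /* ~600k read, a multiple of 4096 */
#define ALIGN    4096          /* buffer, offset and length alignment */

struct reader {
    int   fd;                  /* shared by all threads */
    off_t off;                 /* aligned, non-overlapping per thread */
    int   ok;                  /* 1 if a full read returned the expected bytes */
};

static void *reader_main(void *arg) {
    struct reader *r = arg;
    void *buf;
    r->ok = 0;
    if (posix_memalign(&buf, ALIGN, EXTENT) != 0)
        return NULL;
    /* pread() carries its own offset and leaves the fd's file position
       alone, so concurrent callers on one fd do not interfere. */
    ssize_t n = pread(r->fd, buf, EXTENT, r->off);
    if (n == EXTENT) {
        /* extent t was written filled with the byte value t + 1 */
        unsigned char want = (unsigned char)(r->off / EXTENT + 1);
        const unsigned char *p = buf;
        r->ok = 1;
        for (size_t i = 0; i < EXTENT; i++)
            if (p[i] != want) { r->ok = 0; break; }
    }
    free(buf);
    return NULL;
}

/* Writes NTHREADS patterned extents to `path` (a scratch file that is
   truncated), then reads them back concurrently.  Returns the number of
   extents that came back intact, or -1 on setup failure. */
int concurrent_direct_reads(const char *path) {
    int wfd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);
    if (wfd < 0)
        return -1;
    unsigned char *pat = malloc(EXTENT);
    if (pat == NULL) { close(wfd); return -1; }
    for (int t = 0; t < NTHREADS; t++) {
        memset(pat, t + 1, EXTENT);
        if (pwrite(wfd, pat, EXTENT, (off_t)t * EXTENT) != EXTENT) {
            free(pat); close(wfd); return -1;
        }
    }
    free(pat);
    fsync(wfd);
    close(wfd);

    /* Some filesystems (e.g. tmpfs on older kernels) reject O_DIRECT. */
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0 && errno == EINVAL)
        fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    pthread_t tid[NTHREADS];
    struct reader r[NTHREADS];
    int started[NTHREADS];
    for (int t = 0; t < NTHREADS; t++) {
        r[t] = (struct reader){ .fd = fd, .off = (off_t)t * EXTENT, .ok = 0 };
        started[t] = pthread_create(&tid[t], NULL, reader_main, &r[t]) == 0;
    }
    int good = 0;
    for (int t = 0; t < NTHREADS; t++) {
        if (started[t]) {
            pthread_join(tid[t], NULL);
            good += r[t].ok;
        }
    }
    close(fd);
    return good;
}
```

Build with `cc -pthread`; concurrent_direct_reads() returns how many of the NTHREADS extents came back intact. pread() is used rather than lseek()+read() because each call supplies its own offset, so the threads never race on the shared file position.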

sage


> 
> -Jeff
> 
> >
> > - Usually the zeroed bytes are at the end of the buffer (e.g., last 1/3), 
> > but not always--the most recent time we reproduced it there were 3 
> > distinct zeroed regions in the buffer.
> >
> > - The zeroed regions are always 4k aligned.
> >
> > - We always have several other threads (3-5) also doing similar reads at 
> > different offsets of the same file/device.  AFAICS they are 
> > non-overlapping extents.
> >
> > The one other curious thing is that we tried doing a memset on the buffer 
> > with a non-zero value before the read to see whether pread was skipping 
> > the pages or filling them with zeros...and weren't able to reproduce the 
> > failure.  It's a bit hard to trigger at baseline (it takes anywhere from 
> > hours to days) so we may not have waited long enough.  We're kicking off 
> > another run with memset to try again.
> >
> > Any theories or suggestions?
> >
> > Thanks!
> > sage
> 
> 
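The memset experiment above can be extended to tell the two cases apart directly. A minimal sketch, assuming 4 KiB pages (matching the 4k-aligned zeroed regions reported): poison the buffer with a non-zero byte before the pread(), then classify each page afterwards. A page still holding the poison was never written by the read; a page of zeros was actively overwritten. The fill byte 0xA5 and the function names are hypothetical, and a page of genuine on-disk zeros looks the same as a bad one here, so the CRC remains the real check.

```c
#include <stddef.h>
#include <string.h>

#define PAGE   4096
#define POISON 0xA5            /* hypothetical fill byte, unlikely as real data */

enum page_state { PAGE_DATA = 0, PAGE_ZERO = 1, PAGE_POISON = 2 };

/* Call just before pread(): fill the destination with a non-zero pattern. */
void poison_buffer(void *buf, size_t len) {
    memset(buf, POISON, len);
}

/* Returns 1 if every byte of the page equals v. */
static int page_all(const unsigned char *page, unsigned char v) {
    for (size_t i = 0; i < PAGE; i++)
        if (page[i] != v)
            return 0;
    return 1;
}

/* Call after pread(): classify each whole 4 KiB page of buf into
   states[0 .. len/PAGE - 1] and return how many pages are suspicious
   (still poisoned, i.e. skipped, or all zeros, i.e. zero-filled). */
size_t classify_pages(const void *buf, size_t len, enum page_state *states) {
    const unsigned char *p = buf;
    size_t suspicious = 0;
    for (size_t pg = 0; pg < len / PAGE; pg++) {
        const unsigned char *page = p + pg * PAGE;
        if (page_all(page, POISON))
            states[pg] = PAGE_POISON;
        else if (page_all(page, 0))
            states[pg] = PAGE_ZERO;
        else
            states[pg] = PAGE_DATA;
        if (states[pg] != PAGE_DATA)
            suspicious++;
    }
    return suspicious;
}
```

In the read path this would mean calling poison_buffer() before each pread() and, when the CRC check fails, running classify_pages() and logging which page indexes came back as PAGE_ZERO versus PAGE_POISON.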


