Re: zeroed pages in pread result

Sage Weil <sweil@xxxxxxxxxx> writes:

> Hi everyone,
>
> We're tracking down a hard-to-reproduce failure in Ceph BlueStore where
> RocksDB is reading biggish chunks (e.g., 600k) and we're getting zeros in
> the resulting buffer, leading to a CRC failure and crash.  The data has 
> been read several times before without problems, and after the crash the 
> correct data is on disk as well--it is a transient problem with the read 
> result.
>
> Our main questions are whether this is a known issue, and whether anyone
> with a better understanding of the O_DIRECT / block device / MM
> interactions here has any theories as to what might be going wrong...
>
> Details below:
>
> - Kernel version is 4.10.0-42-generic (ubuntu 16.04)
>
> - pread on block-aligned extent, reading into page-aligned buffer
>
> - pread always returns the full number of bytes--it's not a short read.
>
> - O_DIRECT

Is there any other I/O going on to these files?  Are there concurrent
readers and writers?  Is there a mix of buffered and direct I/O?  Does
the code that performs the I/O fork()?

-Jeff

>
> - Usually the zeroed bytes are at the end of the buffer (e.g., last 1/3), 
> but not always--the most recent time we reproduced it there were 3 
> distinct zeroed regions in the buffer.
>
> - The zeroed regions are always 4k aligned.
>
> - We always have several other threads (3-5) also doing similar reads at 
> different offsets of the same file/device.  AFAICS they are 
> non-overlapping extents.
>
> The one other curious thing is that we tried doing a memset on the buffer 
> with a non-zero value before the read to see whether pread was skipping 
> the pages or filling them with zeros... and weren't able to reproduce the
> failure.  It's a bit hard to trigger at baseline (it takes anywhere from 
> hours to days) so we may not have waited long enough.  We're kicking off 
> another run with memset to try again.
>
> Any theories or suggestions?
>
> Thanks!
> sage
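A minimal C sketch of the read pattern described in the bullets above (block-aligned offset, page-aligned buffer, full-length pread). Assumptions: 4096-byte alignment is used throughout as a conservative choice (the real O_DIRECT requirement is the device's logical block size, often 512), and `alloc_dio_buffer`, `dio_aligned` and `direct_pread_full` are hypothetical helper names, not Ceph or RocksDB code. The fd is expected to come from `open(path, O_RDONLY | O_DIRECT)`, which needs `_GNU_SOURCE` on glibc.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed alignment for offset, length and buffer address. 4096 is a
 * conservative choice; the real O_DIRECT requirement is the device's
 * logical block size. */
#define DIO_ALIGN 4096u

/* Allocate len bytes aligned to DIO_ALIGN; release with free(). */
static void *alloc_dio_buffer(size_t len)
{
    void *p = NULL;
    if (posix_memalign(&p, DIO_ALIGN, len) != 0)
        return NULL;
    return p;
}

/* True if off, len and the buffer address are all multiples of align. */
static bool dio_aligned(off_t off, size_t len, const void *buf, size_t align)
{
    return off >= 0 &&
           (uint64_t)off % align == 0 &&
           len % align == 0 &&
           (uintptr_t)buf % align == 0;
}

/* Read exactly len bytes at off from fd (opened with O_DIRECT) into buf.
 * Returns 0 on success; -1 with errno = EINVAL on misalignment, EIO on a
 * short read, or pread's own errno if pread fails. */
static int direct_pread_full(int fd, void *buf, size_t len, off_t off)
{
    if (!dio_aligned(off, len, buf, DIO_ALIGN)) {
        errno = EINVAL;
        return -1;
    }
    ssize_t n = pread(fd, buf, len, off);
    if (n < 0)
        return -1;              /* errno already set by pread */
    if ((size_t)n != len) {
        errno = EIO;            /* treat a short read as an error */
        return -1;
    }
    return 0;
}
```

Checking alignment up front keeps a misaligned request from being confused with the zero-fill symptom: a misaligned O_DIRECT request fails with EINVAL rather than returning a full-length result.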

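The memset experiment can also be made diagnostic rather than preventive. Fill the buffer with a non-zero poison byte before the read, then classify each 4 KiB page afterwards. A page that still holds the poison was never written by the read, and an all-zero page was actively filled with zeros. This is a sketch under assumptions: the 0xA5 poison value, the 4 KiB page size, and the names `poison_buffer` and `classify_pages` are hypothetical. A page whose real on-disk contents happen to be all zeros (or all 0xA5) would be misclassified.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 4096u
#define POISON  0xA5            /* hypothetical non-zero fill byte */

enum page_state { PAGE_DATA, PAGE_ZERO, PAGE_POISON };

/* True if every byte of p[0..n) equals v. */
static int all_bytes(const unsigned char *p, size_t n, unsigned char v)
{
    for (size_t i = 0; i < n; i++)
        if (p[i] != v)
            return 0;
    return 1;
}

/* Call before pread: fill the whole buffer with POISON. */
static void poison_buffer(void *buf, size_t len)
{
    memset(buf, POISON, len);
}

/* Call after pread: classify each full 4 KiB page of buf into out[],
 * which must hold len / PAGE_SZ entries (a trailing partial page is
 * ignored). Returns the number of pages that are not PAGE_DATA. */
static size_t classify_pages(const void *buf, size_t len, enum page_state *out)
{
    const unsigned char *p = buf;
    size_t bad = 0;
    for (size_t i = 0; i < len / PAGE_SZ; i++) {
        const unsigned char *pg = p + i * PAGE_SZ;
        if (all_bytes(pg, PAGE_SZ, 0))
            out[i] = PAGE_ZERO;
        else if (all_bytes(pg, PAGE_SZ, POISON))
            out[i] = PAGE_POISON;
        else
            out[i] = PAGE_DATA;
        if (out[i] != PAGE_DATA)
            bad++;
    }
    return bad;
}
```

Since the zeroed regions are reported to be 4k aligned, per-page classification loses no information, and the count of PAGE_ZERO versus PAGE_POISON pages distinguishes "skipped" from "zero-filled".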

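To back up the "AFAICS they are non-overlapping extents" observation about the 3-5 concurrent reader threads, each thread could record its in-flight extent and the collected set could be checked for overlap. A minimal sketch, assuming a hypothetical `struct extent` record and `extents_disjoint` helper. The same check can be applied to the destination buffers' address ranges, which should also be disjoint.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record of one in-flight read: bytes [off, off + len). */
struct extent {
    uint64_t off;
    uint64_t len;
};

/* qsort comparator: order extents by starting offset. */
static int cmp_extent(const void *a, const void *b)
{
    const struct extent *x = a, *y = b;
    return (x->off > y->off) - (x->off < y->off);
}

/* Sorts ext[] by offset in place, then returns 1 if no two extents
 * overlap (extents that merely touch are treated as disjoint), else 0. */
static int extents_disjoint(struct extent *ext, size_t n)
{
    qsort(ext, n, sizeof *ext, cmp_extent);
    for (size_t i = 1; i < n; i++)
        if (ext[i - 1].off + ext[i - 1].len > ext[i].off)
            return 0;
    return 1;
}
```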
