Re: [PATCH 6/5 V2] iomap: readpages doesn't zero page tail beyond EOF properly

On Wed, Nov 21, 2018 at 07:27:36PM +1100, Dave Chinner wrote:
> 
> From: Dave Chinner <dchinner@xxxxxxxxxx>
> 
> When we read the EOF page of the file via readpages, we need
> to zero the region beyond EOF that we either do not read or that
> should not contain data so that mmap does not expose stale data to
> user applications.
> 
> However, iomap_adjust_read_range() fails to detect EOF correctly,
> and so fsx on 1k block size filesystems fails very quickly with
> mapreads exposing data beyond EOF. There are two problems here.
> 
> Firstly, when calculating the end block of the EOF byte, we have
> to subtract one from the size so that a block-aligned EOF does not
> report an end block that is one too large, i.e. a size of 1024
> bytes is 1 block, which in index terms is block 0. Therefore we
> have to calculate the end block from (isize - 1), not isize.
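
The off-by-one described above can be checked in userspace with a minimal sketch. It assumes 4k pages; every sketch_* name is hypothetical and only stands in for the kernel's offset_in_page() arithmetic, so this is an illustration of the calculation rather than kernel code.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096LL	/* assumption: 4k pages */

/* Userspace stand-in for the kernel's offset_in_page(). */
static unsigned sketch_offset_in_page(long long pos)
{
	return (unsigned)(pos & (SKETCH_PAGE_SIZE - 1));
}

/* Old calculation: block index in the page taken from isize itself. */
static unsigned sketch_end_block_old(long long isize, unsigned block_bits)
{
	return sketch_offset_in_page(isize) >> block_bits;
}

/* Patched calculation: block index of the last byte of the file. */
static unsigned sketch_end_block_new(long long isize, unsigned block_bits)
{
	assert(isize > 0);	/* the sketch only handles non-empty files */
	return sketch_offset_in_page(isize - 1) >> block_bits;
}
```

With 1k blocks (block_bits = 10) and isize = 1024, the old helper returns block 1 while the patched one returns block 0. For a size that is not block aligned, such as 1500, both return block 1.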
> 
> The second bug is determining if the current page spans EOF, and so
> whether we need to split it into two halves, one for the IO, and the
> other for zeroing. Unfortunately, the code that checks whether
> we should split the block doesn't actually check if we span EOF, it
> just checks if the read spans the /offset in the page/ that EOF
> sits on. So it splits every read into two if EOF is not page
> aligned, regardless of whether we are reading the EOF block or not.
> 
> Hence we need to restrict the "does the read span EOF" check to
> just the page that spans EOF, not every page we read.
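
A rough userspace sketch of the difference between the two checks follows. It assumes 4k pages and a caller that passes a single page worth of length. The file size used in the note below (14788 bytes) is made up for illustration, and the sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PAGE_SIZE 4096LL	/* assumption: 4k pages */

/* Pre-patch test: compares block indices within the page only. */
static bool sketch_old_split(long long pos, long long length,
			     long long isize, unsigned block_bits)
{
	unsigned poff = (unsigned)(pos & (SKETCH_PAGE_SIZE - 1));
	long long room = SKETCH_PAGE_SIZE - poff;
	unsigned plen = (unsigned)(length < room ? length : room);
	unsigned first, last, end;

	assert(length > 0);	/* the sketch needs a non-empty range */
	first = poff >> block_bits;
	last = (poff + plen - 1) >> block_bits;
	end = (unsigned)(isize & (SKETCH_PAGE_SIZE - 1)) >> block_bits;

	return first <= end && last > end;
}

/* Patched gate: only a range that actually contains EOF may be split. */
static bool sketch_range_spans_eof(long long orig_pos, long long length,
				   long long isize)
{
	return orig_pos <= isize && orig_pos + length > isize;
}
```

For a 14788 byte file on 1k blocks, the old test fires for the page at offset 0 even though EOF is three pages away, because it only looks at where EOF falls within a page. The patched gate rejects that page and accepts the page at offset 12288, which is the one that contains EOF.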
> 
> This patch results in correct EOF detection through readpages:
> 
> xfs_vm_readpages:     dev 259:0 ino 0x43 nr_pages 24
> xfs_iomap_found:      dev 259:0 ino 0x43 size 0x66c00 offset 0x4f000 count 98304 type hole startoff 0x13c startblock 1368 blockcount 0x4
> iomap_readpage_actor: orig pos 323584 pos 323584, length 4096, poff 0 plen 4096, isize 420864
> xfs_iomap_found:      dev 259:0 ino 0x43 size 0x66c00 offset 0x50000 count 94208 type hole startoff 0x140 startblock 1497 blockcount 0x5c
> iomap_readpage_actor: orig pos 327680 pos 327680, length 94208, poff 0 plen 4096, isize 420864
> iomap_readpage_actor: orig pos 331776 pos 331776, length 90112, poff 0 plen 4096, isize 420864
> iomap_readpage_actor: orig pos 335872 pos 335872, length 86016, poff 0 plen 4096, isize 420864
> iomap_readpage_actor: orig pos 339968 pos 339968, length 81920, poff 0 plen 4096, isize 420864
> iomap_readpage_actor: orig pos 344064 pos 344064, length 77824, poff 0 plen 4096, isize 420864
> iomap_readpage_actor: orig pos 348160 pos 348160, length 73728, poff 0 plen 4096, isize 420864
> iomap_readpage_actor: orig pos 352256 pos 352256, length 69632, poff 0 plen 4096, isize 420864
> iomap_readpage_actor: orig pos 356352 pos 356352, length 65536, poff 0 plen 4096, isize 420864
> iomap_readpage_actor: orig pos 360448 pos 360448, length 61440, poff 0 plen 4096, isize 420864
> iomap_readpage_actor: orig pos 364544 pos 364544, length 57344, poff 0 plen 4096, isize 420864
> iomap_readpage_actor: orig pos 368640 pos 368640, length 53248, poff 0 plen 4096, isize 420864
> iomap_readpage_actor: orig pos 372736 pos 372736, length 49152, poff 0 plen 4096, isize 420864
> iomap_readpage_actor: orig pos 376832 pos 376832, length 45056, poff 0 plen 4096, isize 420864
> iomap_readpage_actor: orig pos 380928 pos 380928, length 40960, poff 0 plen 4096, isize 420864
> iomap_readpage_actor: orig pos 385024 pos 385024, length 36864, poff 0 plen 4096, isize 420864
> iomap_readpage_actor: orig pos 389120 pos 389120, length 32768, poff 0 plen 4096, isize 420864
> iomap_readpage_actor: orig pos 393216 pos 393216, length 28672, poff 0 plen 4096, isize 420864
> iomap_readpage_actor: orig pos 397312 pos 397312, length 24576, poff 0 plen 4096, isize 420864
> iomap_readpage_actor: orig pos 401408 pos 401408, length 20480, poff 0 plen 4096, isize 420864
> iomap_readpage_actor: orig pos 405504 pos 405504, length 16384, poff 0 plen 4096, isize 420864
> iomap_readpage_actor: orig pos 409600 pos 409600, length 12288, poff 0 plen 4096, isize 420864
> iomap_readpage_actor: orig pos 413696 pos 413696, length 8192, poff 0 plen 4096, isize 420864
> iomap_readpage_actor: orig pos 417792 pos 417792, length 4096, poff 0 plen 3072, isize 420864
> iomap_readpage_actor: orig pos 420864 pos 420864, length 1024, poff 3072 plen 1024, isize 420864
> 
> As you can see, it now does full page reads until the last one which
> is split correctly at the block aligned EOF, reading 3072 bytes and
> zeroing the last 1024 bytes. The original version of the patch got
> this right, but it got another case wrong.
> 
> The EOF crossing detection really needs to use the original length
> as plen, while it starts at the end of the block, will be shortened
> as up-to-date blocks are found on the page. This means "orig_pos +
> plen" no longer points to the end of the page, and so will not
> correctly detect EOF crossing. Hence we have to use the length
> passed in to detect this partial page case:
> 
> xfs_filemap_fault:    dev 259:1 ino 0x43  write_fault 0
> xfs_vm_readpage:      dev 259:1 ino 0x43 nr_pages 1
> xfs_iomap_found:      dev 259:1 ino 0x43 size 0x2cc00 offset 0x2c000 count 4096 type hole startoff 0xb0 startblock 282 blockcount 0x4
> iomap_readpage_actor: orig pos 180224 pos 181248, length 4096, poff 1024 plen 2048, isize 183296
> xfs_iomap_found:      dev 259:1 ino 0x43 size 0x2cc00 offset 0x2cc00 count 1024 type hole startoff 0xb3 startblock 285 blockcount 0x1
> iomap_readpage_actor: orig pos 183296 pos 183296, length 1024, poff 3072 plen 1024, isize 183296
> 
> Here we see a trace where the first block on the EOF page is up to
> date, hence poff = 1024 bytes. The offset into the page of EOF is
> 3072, so the range we want to read is 1024 - 3071, and the range we
> want to zero is 3072 - 4095. You can see this is split correctly
> now.
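
The numbers in this trace can be reproduced with a simplified userspace model of the patched function. This is a sketch under stated assumptions: 4k pages, a hypothetical uptodate bitmask in place of struct iomap_page, and only the leading up-to-date blocks are skipped. The use_plen switch models gating on the shortened plen, which is what the commit message implies the first version of the patch did.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096LL	/* assumption: 4k pages */

struct sketch_range {
	long long pos;	/* file position after skipping up-to-date blocks */
	unsigned poff;	/* offset in the page where the read starts */
	unsigned plen;	/* number of bytes to read */
};

/*
 * 'uptodate' is a hypothetical bitmask: bit i set means block i of the
 * page is already valid. A non-zero 'use_plen' gates the EOF check on
 * the shortened plen instead of the length passed in.
 */
static struct sketch_range
sketch_adjust(long long pos, long long length, long long isize,
	      unsigned block_bits, unsigned uptodate, int use_plen)
{
	long long orig_pos = pos;
	unsigned block_size = 1u << block_bits;
	unsigned poff = (unsigned)(pos & (SKETCH_PAGE_SIZE - 1));
	long long room = SKETCH_PAGE_SIZE - poff;
	unsigned plen = (unsigned)(length < room ? length : room);
	unsigned first = poff >> block_bits;
	unsigned last = (poff + plen - 1) >> block_bits;
	long long span;

	assert(isize > 0 && length > 0);

	/* Skip leading blocks that are already up to date. */
	while (first <= last && (uptodate & (1u << first))) {
		pos += block_size;
		poff += block_size;
		plen -= block_size;
		first++;
	}

	span = use_plen ? plen : length;
	if (orig_pos <= isize && orig_pos + span > isize) {
		unsigned end = (unsigned)((isize - 1) & (SKETCH_PAGE_SIZE - 1))
				>> block_bits;

		/* Trim the read so it stops at the block containing EOF. */
		if (first <= end && last > end)
			plen -= (last - end) * block_size;
	}

	return (struct sketch_range){ pos, poff, plen };
}
```

Calling sketch_adjust(180224, 4096, 183296, 10, 0x1, 0) gives pos 181248, poff 1024 and plen 2048, matching the trace. With use_plen set, orig_pos + plen is 180224 + 3072 = 183296, which is not greater than isize, so the gate never fires and plen stays at 3072, covering the block beyond EOF.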
>  
> This fixes the stale data beyond EOF problem that fsx quickly
> uncovers on 1k block size filesystems.
> 
> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>

Looks ok, will test...
Reviewed-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>

--D

> ---
> 
> Testing the first version of this patch failed the direct IO fsx
> test case I've been using at ~16.5 million ops when it tripped over
> a partially up to date EOF page. I used the wrong length to detect
> EOF being crossed; this update fixes that and it no longer fails at
> 16.5m ops. We'll see how long it lasts this time....
> 
>  fs/iomap.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/iomap.c b/fs/iomap.c
> index d51e7a2ae641..b95110867eee 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -142,13 +142,14 @@ static void
>  iomap_adjust_read_range(struct inode *inode, struct iomap_page *iop,
>  		loff_t *pos, loff_t length, unsigned *offp, unsigned *lenp)
>  {
> +	loff_t orig_pos = *pos;
> +	loff_t isize = i_size_read(inode);
>  	unsigned block_bits = inode->i_blkbits;
>  	unsigned block_size = (1 << block_bits);
>  	unsigned poff = offset_in_page(*pos);
>  	unsigned plen = min_t(loff_t, PAGE_SIZE - poff, length);
>  	unsigned first = poff >> block_bits;
>  	unsigned last = (poff + plen - 1) >> block_bits;
> -	unsigned end = offset_in_page(i_size_read(inode)) >> block_bits;
>  
>  	/*
>  	 * If the block size is smaller than the page size we need to check the
> @@ -183,8 +184,11 @@ iomap_adjust_read_range(struct inode *inode, struct iomap_page *iop,
>  	 * handle both halves separately so that we properly zero data in the
>  	 * page cache for blocks that are entirely outside of i_size.
>  	 */
> -	if (first <= end && last > end)
> -		plen -= (last - end) * block_size;
> +	if (orig_pos <= isize && orig_pos + length > isize) {
> +		unsigned end = offset_in_page(isize - 1) >> block_bits;
> +		if (first <= end && last > end)
> +			plen -= (last - end) * block_size;
> +	}
>  
>  	*offp = poff;
>  	*lenp = plen;


