+ direct-io-fix-direct-write-stale-data-exposure-from-concurrent-buffered-read.patch added to -mm tree

akpm@xxxxxxxxxxxxxxxxxxxx · Tue, 24 May 2016 12:24:58 -0700

The patch titled
     Subject: direct-io: fix direct write stale data exposure from concurrent buffered read
has been added to the -mm tree.  Its filename is
     direct-io-fix-direct-write-stale-data-exposure-from-concurrent-buffered-read.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/direct-io-fix-direct-write-stale-data-exposure-from-concurrent-buffered-read.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/direct-io-fix-direct-write-stale-data-exposure-from-concurrent-buffered-read.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/SubmitChecklist when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Eryu Guan <guaneryu@xxxxxxxxx>
Subject: direct-io: fix direct write stale data exposure from concurrent buffered read

Currently direct writes inside i_size on a DIO_SKIP_HOLES filesystem are
not allowed to allocate blocks(get_more_blocks() sets 'create' to 0 before
calling get_block() callback), if it's a sparse file, direct writes fall
back to buffered writes to avoid stale data exposure from concurrent
buffered read.  But there're two cases that can result in stale data
exposure are not correctly detected.

1. The detection for "writing inside i_size" is not sufficient, writes
   can be treated as "extending writes" wrongly.  For example, direct
   write 1FSB to a 1FSB sparse file on ext2/3/4, starting from offset 0,
   in this case it's writing inside i_size, but 'create' is non-zero,
   because 'block_in_file' and '(i_size_read(inode) >> blkbits' are both
   zero.

2. Direct writes starting from or beyong i_size (not inside i_size)
   also could trigger block allocation and expose stale data.  For
   example, consider a sparse file with i_size of 2k, and a write to
   offset 2k or 3k into the file, with a filesystem block size of 4k. 
   (Thanks to Jeff Moyer for pointing this case out in his review.)

The first problem can be demostrated by running ltp-aiodio test ADSP045
many times.  When testing on extN filesystems, I see test failures
occasionally, buffered read could read non-zero (stale) data.

ADSP045: dio_sparse -a 4k -w 4k -s 2k -n 1

dio_sparse    0  TINFO  :  Dirtying free blocks
dio_sparse    0  TINFO  :  Starting I/O tests
non zero buffer at buf[0] => 0xffffffaa,ffffffaa,ffffffaa,ffffffaa
non-zero read at offset 0
dio_sparse    0  TINFO  :  Killing childrens(s)
dio_sparse    1  TFAIL  :  dio_sparse.c:191: 1 children(s) exited abnormally

The second problem can also be reproduced easily by a hacked dio_sparse
program, which accepts an option to specify the write offset.

What we should really do is to disable block allocation for writes that
could result in filling holes inside i_size.

Link: http://lkml.kernel.org/r/1463156728-13357-1-git-send-email-guaneryu@xxxxxxxxx
Reviewed-by: Jan Kara <jack@xxxxxxx>
Signed-off-by: Eryu Guan <guaneryu@xxxxxxxxx>
Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 fs/direct-io.c |   14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff -puN fs/direct-io.c~direct-io-fix-direct-write-stale-data-exposure-from-concurrent-buffered-read fs/direct-io.c

--- a/fs/direct-io.c~direct-io-fix-direct-write-stale-data-exposure-from-concurrent-buffered-read
+++ a/fs/direct-io.c
@@ -628,11 +628,11 @@ static int get_more_blocks(struct dio *d
 		map_bh->b_size = fs_count << i_blkbits;
 
 		/*
-		 * For writes inside i_size on a DIO_SKIP_HOLES filesystem we
-		 * forbid block creations: only overwrites are permitted.
-		 * We will return early to the caller once we see an
-		 * unmapped buffer head returned, and the caller will fall
-		 * back to buffered I/O.
+		 * For writes that could fill holes inside i_size on a
+		 * DIO_SKIP_HOLES filesystem we forbid block creations: only
+		 * overwrites are permitted. We will return early to the caller
+		 * once we see an unmapped buffer head returned, and the caller
+		 * will fall back to buffered I/O.
 		 *
 		 * Otherwise the decision is left to the get_blocks method,
 		 * which may decide to handle it or also return an unmapped
@@ -640,8 +640,8 @@ static int get_more_blocks(struct dio *d
 		 */
 		create = dio->rw & WRITE;
 		if (dio->flags & DIO_SKIP_HOLES) {
-			if (sdio->block_in_file < (i_size_read(dio->inode) >>
-							sdio->blkbits))
+			if (fs_startblk <= ((i_size_read(dio->inode) - 1) >>
+							i_blkbits))
 				create = 0;
 		}
 
_

Patches currently in -mm which might be from guaneryu@xxxxxxxxx are

direct-io-fix-direct-write-stale-data-exposure-from-concurrent-buffered-read.patch

--
To unsubscribe from this list: send the line "unsubscribe mm-commits" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html