Re: [rfc][patch] mm: direct io less aggressive syncs and invalidates

Nick Piggin <npiggin@xxxxxxx> writes:

> On Wed, Oct 29, 2008 at 09:12:24AM -0400, Jeff Moyer wrote:
>> Nick Piggin <npiggin@xxxxxxx> writes:
>> 
>> > On Tue, Oct 28, 2008 at 05:11:02PM -0400, Jeff Moyer wrote:
>> >> Nick Piggin <npiggin@xxxxxxx> writes:
>> 
>> >> > Index: linux-2.6/mm/filemap.c
>> >> > ===================================================================
>> >> > --- linux-2.6.orig/mm/filemap.c	2008-10-03 11:21:31.000000000 +1000
>> >> > +++ linux-2.6/mm/filemap.c	2008-10-03 12:00:17.000000000 +1000
>> >> > @@ -1304,11 +1304,8 @@ generic_file_aio_read(struct kiocb *iocb
>> >> >  			goto out; /* skip atime */
>> >> >  		size = i_size_read(inode);
>> >> >  		if (pos < size) {
>> >> > -			retval = filemap_write_and_wait(mapping);
>> >> > -			if (!retval) {
>> >> > -				retval = mapping->a_ops->direct_IO(READ, iocb,
>> >> > +			retval = mapping->a_ops->direct_IO(READ, iocb,
>> >> >  							iov, pos, nr_segs);
>> >> > -			}
>> >> 
>> >> So why is it safe to get rid of this?  Can't this result in reading
>> >> stale data from disk?
>> >
>> > AFAIKS, __blockdev_direct_IO is doing the same thing for us, when it
>> > encounters a READ. I should have documented this change. This is one
>> > thing I'm not *quite* sure of: there might be a path to the block device
>> > that I haven't considered, and which does not do the sync...
>> 
>> Well, that's if dio_lock_type != DIO_NO_LOCKING.  cscope shows the
>> following callers of blockdev_direct_IO_no_locking:
>> 
>> gfs2_direct_IO
>> ocfs2_direct_IO
>> xfs_vm_direct_IO
>> 
>> and of course
>> 
>> blkdev_direct_IO
>> 
>> I can't say whether all of these callers are safe.  They certainly don't
>> appear to be safe to me.
>
> Ah OK of course you're right. I'll need to take another look at that
> and probably send any improvement as another patch.
>
> My test SMP system just started getting memory errors for some reason
> so I haven't been able to boot it :( Will try to resurrect it or find
> another before resending...

OK, I got a kernel running on an SMP system for testing.  I modified
your patch to do a filemap_write_and_wait_range in the read case.  The
aio-dio-regress test suite (with a few added programs to check buffered
vs. direct I/O coherency) passed without problems.  One of those programs
failed with your initial patch, since it opened the block device and
mixed buffered and direct I/O.
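
For illustration only, here is a minimal sketch of the kind of check such
a program performs.  This is not one of the actual aio-dio-regress
programs: the helper names, the 4096-byte block size, and the idea of
running it against a scratch file rather than a block device are all
assumptions on my part.  It does a buffered write with no fsync, then
reads the same range back with O_DIRECT; if the kernel skips the
writeback before the direct read, the read can return stale data.
last_byte() just mirrors the inclusive end offset that the patch passes
to filemap_write_and_wait_range().

```c
#define _GNU_SOURCE		/* for O_DIRECT */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLK 4096		/* assumed to satisfy O_DIRECT alignment */

/* Inclusive last byte of the range [pos, pos + len). */
static off_t last_byte(off_t pos, size_t len)
{
	assert(len > 0);	/* an empty range has no last byte */
	return pos + (off_t)len - 1;
}

/*
 * Write one block through the page cache, then read it back with
 * O_DIRECT.  Returns 0 if the direct read sees the buffered data,
 * 1 if it sees something else (stale data), and -1 on any error,
 * including a filesystem that rejects O_DIRECT.
 */
static int buffered_then_direct(const char *path)
{
	char wbuf[BLK];
	void *rbuf = NULL;
	int fd, ret = -1;

	memset(wbuf, 0xab, sizeof(wbuf));

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;
	if (write(fd, wbuf, BLK) != BLK) {
		close(fd);
		unlink(path);
		return -1;
	}
	close(fd);		/* no fsync: the page may still be dirty */

	if (posix_memalign(&rbuf, BLK, BLK)) {
		unlink(path);
		return -1;
	}

	fd = open(path, O_RDONLY | O_DIRECT);
	if (fd >= 0) {
		if (pread(fd, rbuf, BLK, 0) == BLK)
			ret = memcmp(wbuf, rbuf, BLK) ? 1 : 0;
		close(fd);
	}

	free(rbuf);
	unlink(path);		/* remove the scratch file */
	return ret;
}
```

Calling buffered_then_direct() on a scratch path and treating a return of
1 as failure gives the basic test.  A return of -1 should be reported as
"could not run" rather than as a pass, since some filesystems (tmpfs, for
one) refuse O_DIRECT opens.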

Cheers,

Jeff

diff --git a/mm/filemap.c b/mm/filemap.c
index ab85536..76de63e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1317,11 +1317,11 @@ generic_file_aio_read(struct kiocb *iocb, const struct iovec *iov,
 			goto out; /* skip atime */
 		size = i_size_read(inode);
 		if (pos < size) {
-			retval = filemap_write_and_wait(mapping);
-			if (!retval) {
+			retval = filemap_write_and_wait_range(mapping, pos,
+					pos + iov_length(iov, nr_segs) - 1);
+			if (!retval)
 				retval = mapping->a_ops->direct_IO(READ, iocb,
 							iov, pos, nr_segs);
-			}
 			if (retval > 0)
 				*ppos = pos + retval;
 			if (retval) {
@@ -2123,18 +2123,10 @@ generic_file_direct_write(struct kiocb *iocb, const struct iovec *iov,
 	if (count != ocount)
 		*nr_segs = iov_shorten((struct iovec *)iov, *nr_segs, count);
 
-	/*
-	 * Unmap all mmappings of the file up-front.
-	 *
-	 * This will cause any pte dirty bits to be propagated into the
-	 * pageframes for the subsequent filemap_write_and_wait().
-	 */
 	write_len = iov_length(iov, *nr_segs);
 	end = (pos + write_len - 1) >> PAGE_CACHE_SHIFT;
-	if (mapping_mapped(mapping))
-		unmap_mapping_range(mapping, pos, write_len, 0);
 
-	written = filemap_write_and_wait(mapping);
+	written = filemap_write_and_wait_range(mapping, pos, pos + write_len - 1);
 	if (written)
 		goto out;
 
@@ -2520,7 +2512,8 @@ generic_file_buffered_write(struct kiocb *iocb, const struct iovec *iov,
 	 * the file data here, to try to honour O_DIRECT expectations.
 	 */
 	if (unlikely(file->f_flags & O_DIRECT) && written)
-		status = filemap_write_and_wait(mapping);
+		status = filemap_write_and_wait_range(mapping,
+					pos, pos + written - 1);
 
 	return written ? written : status;
 }