Nick Piggin <npiggin@xxxxxxx> writes:

> On Wed, Oct 29, 2008 at 09:12:24AM -0400, Jeff Moyer wrote:
>> Nick Piggin <npiggin@xxxxxxx> writes:
>>
>> > On Tue, Oct 28, 2008 at 05:11:02PM -0400, Jeff Moyer wrote:
>> >> Nick Piggin <npiggin@xxxxxxx> writes:
>> >>
>> >> > Index: linux-2.6/mm/filemap.c
>> >> > ===================================================================
>> >> > --- linux-2.6.orig/mm/filemap.c	2008-10-03 11:21:31.000000000 +1000
>> >> > +++ linux-2.6/mm/filemap.c	2008-10-03 12:00:17.000000000 +1000
>> >> > @@ -1304,11 +1304,8 @@ generic_file_aio_read(struct kiocb *iocb
>> >> >  			goto out; /* skip atime */
>> >> >  		size = i_size_read(inode);
>> >> >  		if (pos < size) {
>> >> > -			retval = filemap_write_and_wait(mapping);
>> >> > -			if (!retval) {
>> >> > -				retval = mapping->a_ops->direct_IO(READ, iocb,
>> >> > +			retval = mapping->a_ops->direct_IO(READ, iocb,
>> >> >  						iov, pos, nr_segs);
>> >> > -			}
>> >>
>> >> So why is it safe to get rid of this? Can't this result in reading
>> >> stale data from disk?
>> >
>> > AFAIKS, __blockdev_direct_IO is doing the same thing for us, when it
>> > encounters a READ. I should have documented this change. This is one
>> > thing I'm not *quite* sure of there might be a path do the block device
>> > that I haven't considered, and which does not do the sync...
>>
>> Well, that's if dio_lock_type != DIO_NO_LOCKING. cscope shows the
>> following callers of blockdev_direct_IO_no_locking:
>>
>> gfs2_direct_IO
>> ocfs2_direct_IO
>> xfs_vm_direct_IO
>>
>> and of course
>>
>> blkdev_direct_IO
>>
>> I can't say whether all of these callers are safe. They certainly don't
>> appear to be safe to me.
>
> Ah OK of course you're right. I'll need to take another look at that
> and probably send any improvement as another patch.
>
> My test SMP system just started getting memory errors for some reason
> so I haven't been able to boot it :( Will try to resurrect it or find
> another before resending...

OK, I got a kernel running on an smp system for testing. I modified your
patch to do a filemap_write_and_wait_range in the read case. The
aio-dio-regress test suite (with a few added programs to check for
buffered vs. direct I/O) passed without problems. One of those programs
did not work with your initial patch, since it opened the block device
and mixed buffered and direct I/O.

Cheers,
Jeff

diff --git a/mm/filemap.c b/mm/filemap.c
index ab85536..76de63e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1317,11 +1317,11 @@ generic_file_aio_read(struct kiocb *iocb, const struct iovec *iov,
 			goto out; /* skip atime */
 		size = i_size_read(inode);
 		if (pos < size) {
-			retval = filemap_write_and_wait(mapping);
-			if (!retval) {
+			retval = filemap_write_and_wait_range(mapping, pos,
+					pos + iov_length(iov, nr_segs) - 1);
+			if (!retval)
 				retval = mapping->a_ops->direct_IO(READ, iocb,
 							iov, pos, nr_segs);
-			}
 			if (retval > 0)
 				*ppos = pos + retval;
 			if (retval) {
@@ -2123,18 +2123,10 @@ generic_file_direct_write(struct kiocb *iocb, const struct iovec *iov,
 	if (count != ocount)
 		*nr_segs = iov_shorten((struct iovec *)iov, *nr_segs, count);

-	/*
-	 * Unmap all mmappings of the file up-front.
-	 *
-	 * This will cause any pte dirty bits to be propagated into the
-	 * pageframes for the subsequent filemap_write_and_wait().
-	 */
 	write_len = iov_length(iov, *nr_segs);
 	end = (pos + write_len - 1) >> PAGE_CACHE_SHIFT;
-	if (mapping_mapped(mapping))
-		unmap_mapping_range(mapping, pos, write_len, 0);

-	written = filemap_write_and_wait(mapping);
+	written = filemap_write_and_wait_range(mapping, pos, pos + write_len - 1);
 	if (written)
 		goto out;

@@ -2520,7 +2512,8 @@ generic_file_buffered_write(struct kiocb *iocb, const struct iovec *iov,
 	 * the file data here, to try to honour O_DIRECT expectations.
 	 */
 	if (unlikely(file->f_flags & O_DIRECT) && written)
-		status = filemap_write_and_wait_range(mapping);
+		status = filemap_write_and_wait_range(mapping,
+					pos, pos + written - 1);

 	return written ? written : status;
 }
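
For anyone checking the range arithmetic in the patch above, here is a
minimal user-space sketch (not kernel code) of how the inclusive byte
range handed to filemap_write_and_wait_range is derived from pos and the
iovec, and which page indices it covers. All sketch_* names and
SKETCH_PAGE_SHIFT are made up for illustration; 4 KiB pages are assumed,
and sketch_iov_length only mimics what the kernel's iov_length does
(summing the segment lengths). Rejecting an empty request is a choice of
this sketch, not something the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>	/* struct iovec */

/* Assumption: 4 KiB pages. The kernel code uses PAGE_CACHE_SHIFT. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical stand-in for iov_length(): total bytes in all segments. */
static size_t sketch_iov_length(const struct iovec *iov, unsigned long nr_segs)
{
	size_t total = 0;
	unsigned long seg;

	for (seg = 0; seg < nr_segs; seg++)
		total += iov[seg].iov_len;
	return total;
}

/* Hypothetical container for the computed range. */
struct sketch_range {
	long long start;		/* first byte of the I/O */
	long long end;			/* last byte of the I/O, inclusive */
	unsigned long long first_page;	/* index of page holding start */
	unsigned long long last_page;	/* index of page holding end */
};

/*
 * Fill *r with the byte and page range for an I/O of the given iovec
 * at offset pos. Returns 0 on success, -1 for an empty request
 * (sketch-only convention; there is no range to flush in that case).
 */
static int sketch_io_range(long long pos, const struct iovec *iov,
			   unsigned long nr_segs, struct sketch_range *r)
{
	size_t len = sketch_iov_length(iov, nr_segs);

	assert(r != NULL);
	if (len == 0)
		return -1;

	r->start = pos;
	r->end = pos + (long long)len - 1;	/* inclusive, as in the patch */
	r->first_page = (unsigned long long)r->start >> SKETCH_PAGE_SHIFT;
	r->last_page = (unsigned long long)r->end >> SKETCH_PAGE_SHIFT;
	return 0;
}
```

With two segments of 100 and 4000 bytes at pos 4000, this gives the byte
range 4000..8099, which touches pages 0 and 1 only, so a ranged
writeback has far less to flush than a whole-mapping
filemap_write_and_wait on a large file.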