On Tue, Oct 28, 2008 at 05:11:02PM -0400, Jeff Moyer wrote:
> Nick Piggin <npiggin@xxxxxxx> writes:
>
> > Direct IO can invalidate and sync a lot of pagecache pages in the mapping. A
> > 4K direct IO will actually try to sync and/or invalidate the pagecache of the
> > entire file, for example (which might be many GB or TB large).
> >
> > Improve this by doing range syncs. Also, memory no longer has to be unmapped
> > to catch the dirty bits for syncing, as dirty bits would remain coherent due to
> > dirty mmap accounting.
> >
> > This should fix the immediate DM deadlocks when doing direct IO reads to
> > block device with a mounted filesystem, if only by papering over the problem
> > somewhat rather than addressing the fsync starvation cases. Not that the
> > patch itself is a hack, but for this particular problem it is not really
> > the correct solution IMO. But anyway, this might be more appropriate to go
> > into stable kernels if this DM deadlock is biting users.
> >
> > Yes, I still need to put more time into finishing my pagecache tag based
> > sync solution. Sorry :(
> >
> >
> > ---
> > Index: linux-2.6/mm/filemap.c
> > ===================================================================
> > --- linux-2.6.orig/mm/filemap.c 2008-10-03 11:21:31.000000000 +1000
> > +++ linux-2.6/mm/filemap.c 2008-10-03 12:00:17.000000000 +1000
> > @@ -1304,11 +1304,8 @@ generic_file_aio_read(struct kiocb *iocb
> >  			goto out; /* skip atime */
> >  		size = i_size_read(inode);
> >  		if (pos < size) {
> > -			retval = filemap_write_and_wait(mapping);
> > -			if (!retval) {
> > -				retval = mapping->a_ops->direct_IO(READ, iocb,
> > +			retval = mapping->a_ops->direct_IO(READ, iocb,
> >  						iov, pos, nr_segs);
> > -			}
>
> So why is it safe to get rid of this? Can't this result in reading
> stale data from disk?

AFAIKS, __blockdev_direct_IO is doing the same thing for us, when it encounters
a READ. I should have documented this change.
This is one thing I'm not *quite* sure of: there might be a path to the block
device that I haven't considered, and which does not do the sync...

> The rest looks good to me. I ran the aio-dio-regress tests against this
> kernel on a UP machine, and they all passed. The kernel didn't boot on
> my SMP box, though. Nick, any chance you could grab that test suite and
> run it on an smp system?
> http://git.kernel.org/?p=linux/kernel/git/zab/aio-dio-regress.git;a=summary

Yeah, I could give that a shot and repost the patch for Andrew in a day or two.

Thanks for looking at it.
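
To make the "range sync" point from the patch description concrete, here is a rough user-space sketch of the page-index arithmetic involved. It is only an illustration, not code from the patch: the `sketch_*` names are hypothetical, and a 4 KiB page size is assumed (the kernel uses the architecture's own page shift).

```c
#include <stdint.h>

/* Assumption: 4 KiB pages. Illustrative only. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical helper: index of the first pagecache page covered by an
 * IO that starts at byte offset pos. */
uint64_t sketch_first_page(uint64_t pos)
{
	return pos >> SKETCH_PAGE_SHIFT;
}

/* Hypothetical helper: index of the last pagecache page covered by an
 * IO of len bytes starting at pos. Caller must pass len > 0. */
uint64_t sketch_last_page(uint64_t pos, uint64_t len)
{
	return (pos + len - 1) >> SKETCH_PAGE_SHIFT;
}

/* Number of pages a range sync would have to look at for this IO,
 * as opposed to every page in the file. */
uint64_t sketch_pages_in_range(uint64_t pos, uint64_t len)
{
	if (len == 0)
		return 0;
	return sketch_last_page(pos, len) - sketch_first_page(pos) + 1;
}
```

Under these assumptions, an aligned 4K direct IO at offset 8192 covers only page index 2, and an unaligned 4K IO at offset 2048 straddles two pages. Either way the work is bounded by the IO size rather than by the file size.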