On Tue, May 10, 2011 at 02:41:51PM +0900, Utako Kusaka wrote:
> Hi,
>
> When I tested concurrent mmap write and direct IO to the same file,
> it was corrupted. Kernel version is 2.6.39-rc4.

This is a long-standing problem: the mmap_sem is held while
.page_mkwrite is called, which means we can't use the i_mutex or the
XFS inode iolock for serialisation against reads and writes, because
the mmap_sem can be taken on page faults during read or write. Hence
we've got the choice of deadlocks or no serialisation between direct
IO and mmap...

> I have two questions concerning xfs direct IO.
>
> The first is that dirty pages are released during a direct read. xfs
> direct IO uses xfs_flushinval_pages(), which writes out and releases
> dirty pages.

Yup - once you bypass the page cache, it is stale and needs to be
removed from memory so the data can be reread from disk when the next
buffered IO occurs.

> If pages are marked as dirty after filemap_write_and_wait_range(),
> they will be released in truncate_inode_pages_range() without being
> written out.

If .page_mkwrite could take either the iolock or the i_mutex, it would
be protected against this like all other operations are.

> sys_read()
>   vfs_read()
>     do_sync_read()
>       xfs_file_aio_read()
>         xfs_flushinval_pages()
>           filemap_write_and_wait_range()
>           truncate_inode_pages_range()    <---
>         generic_file_aio_read()
>           filemap_write_and_wait_range()
>           xfs_vm_direct_IO()
>
> ext3 calls generic_file_aio_read() only and does not call
> truncate_inode_pages_range().
>
> sys_read()
>   vfs_read()
>     do_sync_read()
>       generic_file_aio_read()
>         filemap_write_and_wait_range()
>         ext3_direct_IO()

ext3 is vastly different w.r.t. direct IO functionality, and so can't
be directly compared against XFS behaviour.

> xfs_file_aio_read() and xfs_file_dio_aio_write() call the generic
> functions, and both the xfs functions and the generic functions call
> filemap_write_and_wait_range(). So I wonder whether
> xfs_flushinval_pages() is necessary.

The data corruption it fixed long ago would probably return in some
form...

> Then, the write range in xfs_flushinval_pages() called from direct IO
> is from the start pos to -1, or LLONG_MAX, and is not the IO range.
> Is there any reason? In generic_file_aio_read() and
> generic_file_direct_write(), it is from the start pos to
> (pos + len - 1). I think xfs_flushinval_pages() should be called with
> the same range.

Probably should be, but it will need significant testing to ensure
that it doesn't introduce a new coherency/corruption corner case...
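For illustration, a minimal sketch of what a range-limited flush could
look like, built on the filemap_write_and_wait_range() /
truncate_inode_pages_range() pattern shown in the trace above. The
helper name and the page rounding are hypothetical - this is not the
actual XFS code, and any real patch would need the testing noted above:

	#include <linux/fs.h>
	#include <linux/mm.h>
	#include <linux/pagemap.h>

	/*
	 * Hypothetical helper: flush and invalidate only the pages
	 * covering the direct IO range, instead of from pos to -1.
	 */
	static int
	flushinval_range(struct address_space *mapping, loff_t pos,
			 size_t len)
	{
		/* Round out to page boundaries to cover partial pages. */
		loff_t first = pos & ~(loff_t)(PAGE_CACHE_SIZE - 1);
		loff_t last = (pos + len - 1) | (loff_t)(PAGE_CACHE_SIZE - 1);
		int error;

		/* Write back dirty cached data over the IO range... */
		error = filemap_write_and_wait_range(mapping, first, last);
		if (error)
			return error;

		/*
		 * ...then toss the now-clean pages so the next buffered
		 * IO rereads the range from disk. Note the mmap race
		 * described above still exists: pages dirtied through a
		 * mapping between these two calls are tossed unwritten.
		 */
		truncate_inode_pages_range(mapping, first, last);
		return 0;
	}

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx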