Direct IO can invalidate and sync a lot of pagecache pages in the mapping.
For example, a 4K direct IO will actually try to sync and/or invalidate the
pagecache of the entire file, which might be many GB or even TB in size.

Improve this by doing range syncs. Also, memory no longer has to be unmapped
to catch the dirty bits for syncing, because the dirty bits remain coherent
thanks to dirty mmap accounting.

This should fix the immediate device-mapper (DM) deadlocks when doing direct
IO reads to a block device with a mounted filesystem, if only by papering
over the problem somewhat rather than addressing the fsync starvation cases.

The patch itself is not a hack, but for this particular problem it is not
really the correct solution IMO. Anyway, this might be the more appropriate
fix to go into stable kernels if this DM deadlock is biting users.

Yes, I still need to put more time into finishing my pagecache tag based
sync solution. Sorry :(

---
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c	2008-10-03 11:21:31.000000000 +1000
+++ linux-2.6/mm/filemap.c	2008-10-03 12:00:17.000000000 +1000
@@ -1304,11 +1304,8 @@ generic_file_aio_read(struct kiocb *iocb
 			goto out; /* skip atime */
 		size = i_size_read(inode);
 		if (pos < size) {
-			retval = filemap_write_and_wait(mapping);
-			if (!retval) {
-				retval = mapping->a_ops->direct_IO(READ, iocb,
+			retval = mapping->a_ops->direct_IO(READ, iocb,
 							iov, pos, nr_segs);
-			}
 			if (retval > 0)
 				*ppos = pos + retval;
 			if (retval) {
@@ -2110,18 +2107,10 @@ generic_file_direct_write(struct kiocb *
 	if (count != ocount)
 		*nr_segs = iov_shorten((struct iovec *)iov, *nr_segs, count);
 
-	/*
-	 * Unmap all mmappings of the file up-front.
-	 *
-	 * This will cause any pte dirty bits to be propagated into the
-	 * pageframes for the subsequent filemap_write_and_wait().
-	 */
 	write_len = iov_length(iov, *nr_segs);
 	end = (pos + write_len - 1) >> PAGE_CACHE_SHIFT;
-	if (mapping_mapped(mapping))
-		unmap_mapping_range(mapping, pos, write_len, 0);
 
-	written = filemap_write_and_wait(mapping);
+	written = filemap_write_and_wait_range(mapping, pos, pos + write_len - 1);
 	if (written)
 		goto out;
 
@@ -2507,7 +2496,8 @@ generic_file_buffered_write(struct kiocb
 	 * the file data here, to try to honour O_DIRECT expectations.
 	 */
 	if (unlikely(file->f_flags & O_DIRECT) && written)
-		status = filemap_write_and_wait(mapping);
+		status = filemap_write_and_wait_range(mapping,
+					pos, pos + written - 1);
 
 	return written ? written : status;
 }