On Wed, Nov 28, 2012 at 12:03 PM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> mmap() is in *no* way special. The exact same thing happens for
> regular read/write. Yet somehow the mmap code is special-cased, while
> the normal read-write code is not.

I just double-checked, because it's been a long time since I actually
looked at the code.

But yeah, block device read/write uses the pure page cache functions.
IOW, it has the *exact* same IO engine as mmap() would have.

So here's my suggestion:

 - get rid of *all* the locking in aio_read/write and the splice paths

 - get rid of all the stupid mmap games

 - instead, add the locking to the functions that actually use
   "blkdev_get_block()" and "blkdev_get_blocks()" and nowhere else.

   That's a fairly limited number of functions:
   blkdev_{read,write}page(), blkdev_direct_IO() and
   blkdev_write_{begin,end}()

Doesn't that sound simpler? And more logical: it protects the actual
places that use the block size of the device.

I dunno. Maybe there is some fundamental reason why the above is
broken, but it seems to be a much simpler approach. Sure, you need to
guarantee that the people who get the write-lock cannot possibly cause
IO while holding it, but since the only reason to get the write lock
would be to change the block size, that should be pretty simple, no?

Yeah, yeah, I'm probably missing something fundamental, but the above
sounds like the simple approach to fixing things. Aiming for having
the block size read-lock be taken by the things that pass in the
block-size itself.

It would be nice for things to be logical and straightforward.

                  Linus