On 12/12/19 12:01 PM, Jens Axboe wrote:
> Recently someone asked me how io_uring buffered IO compares to mmapped
> IO in terms of performance. So I ran some tests with buffered IO, and
> found the experience to be somewhat painful. The test case is pretty
> basic: random reads over a dataset that's 10x the size of RAM.
> Performance starts out fine, and then the page cache fills up and we
> hit a throughput cliff. CPU usage of the IO threads goes up, and we
> have kswapd spending 100% of a core trying to keep up. Seeing that, I
> was reminded of the many complaints I hear about buffered IO, and the
> fact that most of the folks complaining will ultimately bite the
> bullet and move to O_DIRECT to just get the kernel out of the way.
>
> But I don't think it needs to be like that. Switching to O_DIRECT isn't
> always easily doable. The buffers have different lifetimes, size and
> alignment constraints, etc. On top of that, mixing buffered and
> O_DIRECT IO can be painful.
>
> Seems to me that we have an opportunity to provide something that sits
> somewhere in between buffered and O_DIRECT, and this is where
> RWF_UNCACHED enters the picture. If this flag is set on IO, we get the
> following behavior:
>
> - If the data is in cache, it remains in cache and the copy (in or out)
>   is served to/from that. This is true for both reads and writes.
>
> - For writes, if the data is NOT in cache, we add it while performing
>   the IO. When the IO is done, we remove it again.
>
> - For reads, if the data is NOT in the cache, we allocate a private
>   page and use that for the IO. When the IO is done, we free this
>   page. The page never sees the page cache.
>
> With this, I can do 100% smooth buffered reads or writes without
> pushing the kernel to the state where kswapd is sweating bullets. In
> fact it doesn't even register.
>
> Comments appreciated! This should work on any standard file system,
> using either the generic helpers or iomap. I have tested ext4 and xfs
> for the right read/write behavior, but no further validation has been
> done yet. This version contains the bigger prep patch of switching
> iomap_apply() and the actors to struct iomap_data, and I hope I didn't
> mess that up too badly. I'll try to exercise it all; I've done XFS
> mounts and reads+writes and it seems happy from that POV at least.
>
> The core of the changes is actually really small; the majority of the
> diff is just prep work to get there.
>
> Patches are against current git, and can also be found here:
>
> https://git.kernel.dk/cgit/linux-block/log/?h=buffered-uncached
>
>  fs/ceph/file.c          |   2 +-
>  fs/dax.c                |  25 +++--
>  fs/ext4/file.c          |   2 +-
>  fs/iomap/apply.c        |  50 ++++++---
>  fs/iomap/buffered-io.c  | 225 +++++++++++++++++++++++++---------------
>  fs/iomap/direct-io.c    |  57 +++++-----
>  fs/iomap/fiemap.c       |  48 +++++----
>  fs/iomap/seek.c         |  64 +++++++-----
>  fs/iomap/swapfile.c     |  27 ++---
>  fs/nfs/file.c           |   2 +-
>  include/linux/fs.h      |   7 +-
>  include/linux/iomap.h   |  20 +++-
>  include/uapi/linux/fs.h |   5 +-
>  mm/filemap.c            |  89 +++++++++++++--
>  14 files changed, 416 insertions(+), 207 deletions(-)
>
> Changes since v3:
>
> - Add iomap_actor_data to cut down on arguments
> - Fix bad flag drop in iomap_write_begin()
> - Remove unused IOMAP_WRITE_F_UNCACHED flag
> - Don't use the page cache at all for reads

Had the silly lru error in v4, and also an XFS flags error. I'm not
going to re-post just for those, but please use:

https://git.kernel.dk/cgit/linux-block/log/?h=buffered-uncached

if you're going to test this.
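In case it helps with testing, here's a minimal sketch of driving the
flag from userspace with preadv2(2). The RWF_UNCACHED fallback define
below is an assumption for headers that don't carry the patch; take the
real value from the patched include/uapi/linux/fs.h. An unpatched
kernel will just fail the read with EOPNOTSUPP.

/*
 * uncached-read.c: read 4k from a file with RWF_UNCACHED set.
 * Build: gcc -o uncached-read uncached-read.c
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_UNCACHED
#define RWF_UNCACHED	0x00000040	/* assumed; check the patched uapi header */
#endif

int main(int argc, char **argv)
{
	char buf[4096];
	struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
	ssize_t ret;
	int fd;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	/*
	 * If the page is cached, the copy is served from the cache; if
	 * not, the read goes through a private page that never enters
	 * the page cache.
	 */
	ret = preadv2(fd, &iov, 1, 0, RWF_UNCACHED);
	if (ret < 0)
		perror("preadv2");	/* EOPNOTSUPP without the patches */
	else
		printf("read %zd bytes uncached\n", ret);
	close(fd);
	return 0;
}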
You can pull it at:

git://git.kernel.dk/linux-block buffered-uncached

Those are the only two changes since v4. I'll throw a v5 out there a
bit later.

--
Jens Axboe