On Wed, Jun 8, 2022 at 6:48 AM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>
>         Rebased to -rc1 and reordered.  Sits in vfs.git #work.iov_iter,
> individual patches in followups
>
> 1/9: No need of likely/unlikely on calls of check_copy_size()
>         not just in uio.h; the thing is inlined and it has unlikely on
> all paths leading to return false
>
> 2/9: btrfs_direct_write(): cleaner way to handle generic_write_sync() suppression
>         new flag for iomap_dio_rw(), telling it to suppress generic_write_sync()
>
> 3/9: struct file: use anonymous union member for rcuhead and llist
>         "f_u" might have been an amusing name, but... we expect anon unions to
> work.
>
> 4/9: iocb: delay evaluation of IS_SYNC(...) until we want to check IOCB_DSYNC
>         makes iocb_flags() much cheaper, and it's easier to keep track of
> the places where it can change.
>
> 5/9: keep iocb_flags() result cached in struct file
>         that, along with the previous commit, reduces the overhead of
> new_sync_{read,write}().  struct file doesn't grow - we can keep that
> thing in the same anon union where rcuhead and llist live; that field
> gets used only before ->f_count reaches zero while the other two are
> used only after ->f_count has reached zero.
>
> 6/9: copy_page_{to,from}_iter(): switch iovec variants to generic
>         kmap_local_page() allows that.  And it kills quite a bit of
> code.
>
> 7/9: new iov_iter flavour - ITER_UBUF
>         iovec analogue, with single segment.  That case is fairly common and it
> can be handled with less overhead than full-blown iovec.
>
> 8/9: switch new_sync_{read,write}() to ITER_UBUF
>         ... and this is why it is so common.  Further reduction of overhead
> for new_sync_{read,write}().
>
> 9/9: iov_iter_bvec_advance(): don't bother with bvec_iter
>         AFAICS, variant similar to what we do for iovec/kvec generates better
> code.  Needs profiling, obviously.
>

I have pulled this on top of Linux v5.19-rc1... plus assorted patches
to fix issues with LLVM/Clang version 14.
No (new) warnings in my build-log.
Boots fine on bare metal on my Debian/unstable AMD64 system.

Any hints for testing - to see improvements?

-Sedat-

> Diffstat:
>  arch/powerpc/include/asm/uaccess.h |   2 +-
>  arch/s390/include/asm/uaccess.h    |   4 +-
>  block/fops.c                       |   8 +-
>  drivers/nvme/target/io-cmd-file.c  |   2 +-
>  fs/aio.c                           |   2 +-
>  fs/btrfs/file.c                    |  19 +--
>  fs/btrfs/inode.c                   |   2 +-
>  fs/ceph/file.c                     |   2 +-
>  fs/cifs/file.c                     |   2 +-
>  fs/direct-io.c                     |   4 +-
>  fs/fcntl.c                         |   1 +
>  fs/file_table.c                    |  17 +-
>  fs/fuse/dev.c                      |   4 +-
>  fs/fuse/file.c                     |   4 +-
>  fs/gfs2/file.c                     |   2 +-
>  fs/io_uring.c                      |   2 +-
>  fs/iomap/direct-io.c               |  24 +--
>  fs/nfs/direct.c                    |   2 +-
>  fs/open.c                          |   1 +
>  fs/read_write.c                    |   6 +-
>  fs/zonefs/super.c                  |   2 +-
>  include/linux/fs.h                 |  21 ++-
>  include/linux/iomap.h              |   2 +
>  include/linux/uaccess.h            |   4 +-
>  include/linux/uio.h                |  41 +++--
>  lib/iov_iter.c                     | 308 +++++++++++-------------------------------------
>  mm/shmem.c                         |   2 +-
>  27 files changed, 191 insertions(+), 299 deletions(-)
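
For readers following the description of 3/9 and 5/9 in the quoted cover letter, here is a minimal userspace sketch of the storage-sharing idea. It is not the kernel's struct file: the `*_model` types, the `file_before`/`file_after` structs, the `cached_flags` field name, and the helpers `live_flags()` and `put_file()` are all hypothetical stand-ins. It only illustrates why adding a field that is used solely while the reference count is non-zero to a union whose other members are used solely after the count hits zero does not grow the structure.

```c
#include <stddef.h>

/* Userspace stand-ins for the kernel types; layouts are illustrative only. */
struct callback_head_model {
    struct callback_head_model *next;
    void (*func)(struct callback_head_model *);
};

struct llist_node_model {
    struct llist_node_model *next;
};

/* Before: only the two teardown-time members share storage. */
struct file_before {
    union {
        struct llist_node_model    llist;
        struct callback_head_model rcuhead;
    };
    long refcount;
};

/* After: a cached flags word joins the same anonymous union.
 * "cached_flags" is a hypothetical name chosen for this sketch. */
struct file_after {
    union {
        struct llist_node_model    llist;
        struct callback_head_model rcuhead;
        unsigned int               cached_flags;
    };
    long refcount;
};

/* Valid only while the object is live (refcount > 0). */
static unsigned int live_flags(const struct file_after *f)
{
    return f->cached_flags;
}

/* Drop one reference. When the count reaches zero the cached flags are
 * dead, so their storage is reused to link the object onto a pending list.
 * Returns 1 if this call released the last reference, 0 otherwise. */
static int put_file(struct file_after *f, struct llist_node_model *pending)
{
    if (--f->refcount > 0)
        return 0;
    f->llist.next = pending;   /* overwrites cached_flags; no reader remains */
    return 1;
}
```

The anonymous union lets callers write `f->cached_flags` or `f->llist` directly, with no intermediate member name, and `sizeof(struct file_after)` stays equal to `sizeof(struct file_before)` because the new member is no larger than the ones already sharing the slot.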
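
The quoted description of 7/9 says a single-segment iterator can be handled with less overhead than a full iovec. The following is a rough userspace model of that contrast, not the code in lib/iov_iter.c: `struct seg`, `struct iter`, the `MODEL_*` constants, and every function name here are assumptions made for illustration, and `memcpy()` stands in for the user-copy primitives. The single-buffer flavour needs one copy and one offset update, while the segment-array flavour has to walk segments and track when each is exhausted.

```c
#include <stddef.h>
#include <string.h>

struct seg { char *base; size_t len; };      /* models one iovec entry */

enum iter_type { MODEL_IOVEC, MODEL_UBUF };  /* hypothetical flavour tags */

struct iter {
    enum iter_type type;
    size_t offset;               /* progress inside current segment/buffer */
    size_t count;                /* bytes still available in the iterator */
    union {
        const struct seg *segs;  /* MODEL_IOVEC: array of segments */
        char *ubuf;              /* MODEL_UBUF: the one and only buffer */
    };
    size_t nr_segs;              /* used by MODEL_IOVEC only */
};

static struct iter iter_ubuf(char *buf, size_t len)
{
    struct iter i;
    memset(&i, 0, sizeof(i));
    i.type = MODEL_UBUF;
    i.ubuf = buf;
    i.count = len;
    return i;
}

static struct iter iter_iovec(const struct seg *segs, size_t nr, size_t total)
{
    struct iter i;
    memset(&i, 0, sizeof(i));
    i.type = MODEL_IOVEC;
    i.segs = segs;
    i.nr_segs = nr;
    i.count = total;
    return i;
}

/* Copy up to `bytes` from src into the iterator; returns bytes copied. */
static size_t copy_to_iter_model(const char *src, size_t bytes, struct iter *i)
{
    size_t done = 0;

    if (bytes > i->count)
        bytes = i->count;

    if (i->type == MODEL_UBUF) {
        /* Single segment: one copy, no segment bookkeeping. */
        memcpy(i->ubuf + i->offset, src, bytes);
        i->offset += bytes;
        i->count -= bytes;
        return bytes;
    }

    /* Segment array: walk entries, advancing when one is used up. */
    while (done < bytes && i->nr_segs) {
        const struct seg *s = i->segs;
        size_t room = s->len - i->offset;
        size_t n = bytes - done < room ? bytes - done : room;

        memcpy(s->base + i->offset, src + done, n);
        done += n;
        i->offset += n;
        if (i->offset == s->len) {
            i->segs++;
            i->nr_segs--;
            i->offset = 0;
        }
    }
    i->count -= done;
    return done;
}
```

Both flavours produce the same bytes in the destination; the difference is only in how much iterator state has to be consulted and updated per copy, which is the overhead the cover letter refers to.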