Hi Matthew,

We have also observed the cost of `->read_iter` in our study on Ext4-DAX.
In single-threaded 4K reads, the `->read` version outperforms `->read_iter`
by 41.6% in terms of throughput. According to our observation and
evaluation, at least for Ext4-DAX, part of the cost comes from the
invocation of `->iomap_begin` (`ext4_iomap_begin`), which might not be
avoidable simply by adding a new iter_type. The slowdown is more
significant when multiple threads read different files concurrently,
due to a scalability issue in `ext4_iomap_begin` (it grabs a read lock to
check the status of the journal).

In our solution, we implemented the `->read` and `->write` interfaces for
Ext4-DAX. Thus, we also think it would be good if both `->read` and
`->read_iter` could coexist.

By the way, besides the implementation of `->read` and `->write`, we have
some other optimizations for Ext4-DAX and would like to share them once
our patches are prepared.

Thanks,
Mingkai

> On Jan 7, 2021, at 23:11, Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Thu, Jan 07, 2021 at 08:15:41AM -0500, Mikulas Patocka wrote:
>> I'd like to ask about this piece of code in __kernel_read:
>> 	if (unlikely(!file->f_op->read_iter || file->f_op->read))
>> 		return warn_unsupported...
>> and __kernel_write:
>> 	if (unlikely(!file->f_op->write_iter || file->f_op->write))
>> 		return warn_unsupported...
>>
>> - It exits with an error if both read_iter and read or write_iter and
>> write are present.
>>
>> I found out that on NVFS, reading a file with the read method has 10%
>> better performance than the read_iter method. The benchmark just reads the
>> same 4k page over and over again - and the cost of creating and parsing
>> the kiocb and iov_iter structures is just that high.
>
> Which part of it is so expensive?  Is it worth, eg adding an iov_iter
> type that points to a single buffer instead of a single-member iov?
>
> +++ b/include/linux/uio.h
> @@ -19,6 +19,7 @@ struct kvec {
>
>  enum iter_type {
>  	/* iter types */
> +	ITER_UBUF = 2,
>  	ITER_IOVEC = 4,
>  	ITER_KVEC = 8,
>  	ITER_BVEC = 16,
> @@ -36,6 +36,7 @@ struct iov_iter {
>  	size_t iov_offset;
>  	size_t count;
>  	union {
> +		void __user *buf;
>  		const struct iovec *iov;
>  		const struct kvec *kvec;
>  		const struct bio_vec *bvec;
>
> and then doing all the appropriate changes to make that work.
> _______________________________________________
> Linux-nvdimm mailing list -- linux-nvdimm@xxxxxxxxxxxx
> To unsubscribe send an email to linux-nvdimm-leave@xxxxxxxxxxxx
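
For illustration only, here is a minimal userspace sketch of the
single-buffer iterator idea from the quoted diff. It is not kernel code
and not the proposed patch: every `sketch_` name, the `nr_segs` field, and
both init helpers are assumptions made up for this example, and plain
`memcpy` stands in for the kernel's user-copy routines. It only shows why
a single-buffer type can skip the segment walk that a one-element iovec
still has to go through.

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>    /* struct iovec */

/* Hypothetical userspace model; values mirror the quoted diff only loosely. */
enum sketch_iter_type {
    SKETCH_ITER_UBUF  = 2,  /* one contiguous buffer */
    SKETCH_ITER_IOVEC = 4,  /* array of iovec segments */
};

struct sketch_iter {
    enum sketch_iter_type type;
    size_t iov_offset;      /* bytes already consumed in the current segment */
    size_t count;           /* bytes remaining in the whole iterator */
    size_t nr_segs;         /* segments left; used by SKETCH_ITER_IOVEC only */
    union {
        void *buf;                  /* SKETCH_ITER_UBUF */
        const struct iovec *iov;    /* SKETCH_ITER_IOVEC */
    };
};

/* Set up an iterator over a single buffer: no segment array is needed. */
static void sketch_iter_ubuf(struct sketch_iter *it, void *buf, size_t len)
{
    it->type = SKETCH_ITER_UBUF;
    it->iov_offset = 0;
    it->count = len;
    it->nr_segs = 0;
    it->buf = buf;
}

/* Set up an iterator over an iovec array; count is the sum of all lengths. */
static void sketch_iter_iovec(struct sketch_iter *it,
                              const struct iovec *iov, size_t nr_segs)
{
    size_t i, total = 0;

    for (i = 0; i < nr_segs; i++)
        total += iov[i].iov_len;

    it->type = SKETCH_ITER_IOVEC;
    it->iov_offset = 0;
    it->count = total;
    it->nr_segs = nr_segs;
    it->iov = iov;
}

/* Copy up to len bytes from src into the iterator; returns bytes copied. */
static size_t sketch_copy_to_iter(const void *src, size_t len,
                                  struct sketch_iter *it)
{
    const char *from = src;
    size_t copied = 0;

    if (len > it->count)
        len = it->count;

    /* Single-buffer fast path: one bounds check and one copy. */
    if (it->type == SKETCH_ITER_UBUF) {
        memcpy((char *)it->buf + it->iov_offset, from, len);
        it->iov_offset += len;
        it->count -= len;
        return len;
    }

    /* General path: walk the segments, even when there is only one. */
    while (copied < len && it->nr_segs > 0) {
        size_t room = it->iov->iov_len - it->iov_offset;
        size_t chunk = (len - copied < room) ? len - copied : room;

        memcpy((char *)it->iov->iov_base + it->iov_offset,
               from + copied, chunk);
        copied += chunk;
        it->iov_offset += chunk;
        if (it->iov_offset == it->iov->iov_len) {
            /* Current segment is full; advance to the next one. */
            it->iov++;
            it->nr_segs--;
            it->iov_offset = 0;
        }
    }
    it->count -= copied;
    return copied;
}
```

A caller would initialise the iterator with `sketch_iter_ubuf()` for a
plain read into one buffer and with `sketch_iter_iovec()` for a
readv-style call. As noted above, this only models the iterator-parsing
cost; it says nothing about the `->iomap_begin` overhead.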