From: Zhongwei Cai
> Sent: 12 January 2021 13:45
..
> The overhead mainly consists of two parts. The first is constructing
> struct iov_iter and iterating it (i.e., new_sync, _copy_mc_to_iter and
> iov_iter_init). The second is the dax io mechanism provided by VFS (i.e.,
> dax_iomap_rw, iomap_apply and ext4_iomap_begin).

Setting up an iov_iter with a single buffer ought to be relatively
cheap - compared to a file system read.

The iteration should be over the total length, calling
copy_from/to_iter() for 'chunks' whose size doesn't depend on
the size of the iov[] fragments.

So copy_to/from_iter() should directly replace the copy_to/from_user()
calls in the 'read' method.
For a single buffer this really ought to be noise as well.

Clearly, if the iov[] has a lot of short fragments the copy
will be more expensive.

Accesses to /dev/null and /dev/zero are much more likely to show
the additional costs of the iov_iter code than fs code.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
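
For illustration, the chunked iteration described in the reply can be sketched in userspace C. This is only a toy model under stated assumptions: struct toy_iter, toy_iter_init(), toy_copy_to_iter(), toy_read() and CHUNK are hypothetical stand-ins, not the kernel's struct iov_iter, iov_iter_init() or copy_to_iter(). The point is that the 'read' loop walks the total length in fixed-size chunks and leaves fragment boundaries to the copy helper, so a single-buffer iterator adds very little work per chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-ins, NOT the kernel API. */
struct frag {
	char *base;
	size_t len;
};

struct toy_iter {
	struct frag *frags;	/* destination fragments */
	size_t nfrags;
	size_t idx;		/* current fragment */
	size_t off;		/* offset inside the current fragment */
	size_t count;		/* bytes of space still available */
};

static void toy_iter_init(struct toy_iter *it, struct frag *frags, size_t nfrags)
{
	it->frags = frags;
	it->nfrags = nfrags;
	it->idx = 0;
	it->off = 0;
	it->count = 0;
	for (size_t i = 0; i < nfrags; i++)
		it->count += frags[i].len;
}

/* Copy up to len bytes into the iterator; returns the bytes copied. */
static size_t toy_copy_to_iter(const char *src, size_t len, struct toy_iter *it)
{
	size_t done = 0;

	if (len > it->count)
		len = it->count;

	while (done < len) {
		assert(it->idx < it->nfrags);
		struct frag *f = &it->frags[it->idx];
		size_t n = f->len - it->off;

		if (n > len - done)
			n = len - done;
		if (n)
			memcpy(f->base + it->off, src + done, n);
		done += n;
		it->off += n;
		if (it->off == f->len) {	/* fragment full, move on */
			it->idx++;
			it->off = 0;
		}
	}
	it->count -= done;
	return done;
}

#define CHUNK 8	/* deliberately tiny so several chunks are exercised */

/*
 * Toy 'read' method: iterate over the total length in CHUNK-sized
 * pieces. The chunk size is independent of the fragment sizes.
 */
static size_t toy_read(const char *data, size_t data_len, size_t pos,
		       struct toy_iter *to)
{
	size_t total = 0;

	while (to->count && pos < data_len) {
		size_t n = data_len - pos;

		if (n > CHUNK)
			n = CHUNK;
		size_t copied = toy_copy_to_iter(data + pos, n, to);
		if (!copied)
			break;
		pos += copied;
		total += copied;
	}
	return total;
}
```

With one fragment the inner loop of toy_copy_to_iter() runs once per chunk. With many short fragments it runs once per fragment crossed, which is where the extra copy cost mentioned in the reply would come from.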