On Fri, May 17, 2024 at 03:38:06PM +0100, Matthew Wilcox wrote:
> On Sat, May 18, 2024 at 04:14:07AM +0800, Xu Yang wrote:
> > Since commit 5d8edfb900d5 ("iomap: Copy larger chunks from userspace"),
> > iomap will try to copy in larger chunks than PAGE_SIZE. However, if the
> > mapping doesn't support large folios, only a single page of at most 4KB
> > will be created and 4KB of data will be written to the page cache each
> > time. The next 4KB is then handled in the next iteration.
> >
> > If the chunk is 2MB, a total of 512 pages need to be handled in the
> > end. During this period, fault_in_iov_iter_readable() is called to
> > check that the iov_iter is readable. Since only 4KB is handled each
> > time, the address ranges below are checked over and over again:
> >
> > start           end
> > -----           ---
> > buf,            buf+2MB
> > buf+4KB,        buf+2MB
> > buf+8KB,        buf+2MB
> > ...
> > buf+2044KB,     buf+2MB
> >
> > Obviously the checked size is wrong, since only 4KB will be handled
> > each time. So get the correct number of bytes before calling
> > fault_in_iov_iter_readable() to let iomap work well in the
> > non-large-folio case.
>
> You haven't talked at all about why this is important. Is it a
> performance problem? If so, numbers please. Particularly if you want
> this backported to stable.

Yes, it's indeed a performance problem. I hit the issue on all my ARM64
devices. Simply running the 'dd' command with varying block sizes results
in different write speeds. I write a total of 4GB of data to sda for each
test; the results are as below:

 - dd if=/dev/zero of=/dev/sda bs=400K  count=10485  (334 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=800K  count=5242   (278 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=1600K count=2621   (204 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=2200K count=1906   (170 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=3000K count=1398   (150 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=4500K count=932    (139 MB/s)

> I also think this is the wrong way to solve the problem. We should
> instead adjust 'chunk'. Given everything else going on, I think we
> want:
>
> (in filemap.h):
>
> static inline size_t mapping_max_folio_size(struct address_space *mapping)
> {
> 	if (mapping_large_folio_support(mapping))
> 		return PAGE_SIZE << MAX_PAGECACHE_ORDER;
> 	return PAGE_SIZE;
> }
>
> and then in iomap,
>
> -	size_t chunk = PAGE_SIZE << MAX_PAGECACHE_ORDER;
> +	size_t chunk = mapping_max_folio_size(mapping);
>
> (and move the initialisation of 'mapping' to before this)

Thanks for your suggestions. This should be a better way to improve the
logic. With the above changes, I can now see a stable and high write
speed:

 - dd if=/dev/zero of=/dev/sda bs=400K  count=10485  (339 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=800K  count=5242   (330 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=1600K count=2621   (332 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=2200K count=1906   (333 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=3000K count=1398   (333 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=4500K count=932    (333 MB/s)

I will send v3 later.

Thanks,
Xu Yang
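
As a reference for how the suggestion fits together, below is a heavily
abridged sketch of iomap_write_iter() in fs/iomap/buffered-io.c with the
proposed change applied, based on the shape of that function around v6.9.
This is not the literal v3 patch: the folio lookup, the actual copy,
short-copy retries and dirty-page balancing are all elided.

static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
{
	/* 'mapping' is now initialised before 'chunk', as suggested. */
	struct address_space *mapping = iter->inode->i_mapping;
	/*
	 * The fix: 'chunk' is capped at PAGE_SIZE when the mapping
	 * cannot use large folios, instead of always being
	 * PAGE_SIZE << MAX_PAGECACHE_ORDER.
	 */
	size_t chunk = mapping_max_folio_size(mapping);
	loff_t pos = iter->pos;
	long status = 0;

	do {
		/* Never ask for more than one folio's worth per pass. */
		size_t offset = pos & (chunk - 1);
		size_t bytes = min(chunk - offset, iov_iter_count(i));

		/*
		 * With chunk == PAGE_SIZE here, this faults in at most
		 * one page per iteration instead of re-walking the whole
		 * remaining user buffer every 4KB, which is what made the
		 * dd runs above slow down as bs grew.
		 */
		if (unlikely(fault_in_iov_iter_readable(i, bytes) == bytes)) {
			status = -EFAULT;
			break;
		}

		/* ... grab the folio, copy 'bytes' from userspace ... */
		pos += bytes;
	} while (iov_iter_count(i));

	return status;
}

The key point is that 'bytes', and therefore the range handed to
fault_in_iov_iter_readable(), is now bounded by what one iteration can
actually write, so the repeated re-checking of the shrinking buffer tail
disappears for mappings without large folio support.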