On Fri, May 17, 2024 at 03:38:06PM +0100, Matthew Wilcox wrote:
> On Sat, May 18, 2024 at 04:14:07AM +0800, Xu Yang wrote:
> > Since commit 5d8edfb900d5 ("iomap: Copy larger chunks from userspace"),
> > iomap will try to copy in larger chunks than PAGE_SIZE. However, if the
> > mapping doesn't support large folios, only a single page of at most 4KB
> > will be created and 4KB of data will be written to the page cache each
> > time. The next 4KB is then handled in the next iteration.
> >
> > If the chunk is 2MB, a total of 512 pages need to be handled in the
> > end. During this period, fault_in_iov_iter_readable() is called to
> > check that the iov_iter is readable. Since only 4KB is handled each
> > time, the address ranges below are checked over and over again:
> >
> > start           end
> > -----           ---
> > buf,            buf+2MB
> > buf+4KB,        buf+2MB
> > buf+8KB,        buf+2MB
> > ...
> > buf+2044KB,     buf+2MB
> >
> > Obviously the checked size is wrong, since only 4KB will be handled
> > each time. So get the correct number of bytes before calling
> > fault_in_iov_iter_readable() to let iomap work well in the
> > non-large-folio case.
>
> You haven't talked at all about why this is important. Is it a
> performance problem? If so, numbers please. Particularly if you want
> this backported to stable.

Yes, it's indeed a performance problem. I hit the issue on all my ARM64
devices. Simply running the 'dd' command with varying block sizes results
in different write speeds. I write a total of 4GB of data to sda for each
test; the results are as below:

 - dd if=/dev/zero of=/dev/sda bs=400K  count=10485  (334 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=800K  count=5242   (278 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=1600K count=2621   (204 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=2200K count=1906   (170 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=3000K count=1398   (150 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=4500K count=932    (139 MB/s)

> I also think this is the wrong way to solve the problem. We should
> instead adjust 'chunk'. Given everything else going on, I think we
> want:
>
> (in filemap.h):
>
> static inline size_t mapping_max_folio_size(struct address_space *mapping)
> {
> 	if (mapping_large_folio_support(mapping))
> 		return PAGE_SIZE << MAX_PAGECACHE_ORDER;
> 	return PAGE_SIZE;
> }
>
> and then in iomap,
>
> -	size_t chunk = PAGE_SIZE << MAX_PAGECACHE_ORDER;
> +	size_t chunk = mapping_max_folio_size(mapping);
>
> (and move the initialisation of 'mapping' to before this)

Thanks for your suggestions. This should be a better way to improve the
logic. With the above changes, I can now see a stable and high write
speed:

 - dd if=/dev/zero of=/dev/sda bs=400K  count=10485  (339 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=800K  count=5242   (330 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=1600K count=2621   (332 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=2200K count=1906   (333 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=3000K count=1398   (333 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=4500K count=932    (333 MB/s)

I will send v3 later.

Thanks,
Xu Yang
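
As a reference for how the suggestion fits together, below is a heavily
abridged sketch of iomap_write_iter() in fs/iomap/buffered-io.c with the
proposed change applied, based on the shape of that function around v6.9.
This is not the literal v3 patch: the folio lookup, the actual copy,
short-copy retries and dirty-page balancing are all elided.

static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
{
	/* 'mapping' is now initialised before 'chunk', as suggested. */
	struct address_space *mapping = iter->inode->i_mapping;
	/*
	 * The fix: 'chunk' is capped at PAGE_SIZE when the mapping
	 * cannot use large folios, instead of always being
	 * PAGE_SIZE << MAX_PAGECACHE_ORDER.
	 */
	size_t chunk = mapping_max_folio_size(mapping);
	loff_t pos = iter->pos;
	long status = 0;

	do {
		/* Never ask for more than one folio's worth per pass. */
		size_t offset = pos & (chunk - 1);
		size_t bytes = min(chunk - offset, iov_iter_count(i));

		/*
		 * With chunk == PAGE_SIZE here, this faults in at most
		 * one page per iteration instead of re-walking the whole
		 * remaining user buffer every 4KB, which is what made the
		 * dd runs above slow down as bs grew.
		 */
		if (unlikely(fault_in_iov_iter_readable(i, bytes) == bytes)) {
			status = -EFAULT;
			break;
		}

		/* ... grab the folio, copy 'bytes' from userspace ... */
		pos += bytes;
	} while (iov_iter_count(i));

	return status;
}

The key point is that 'bytes', and therefore the range handed to
fault_in_iov_iter_readable(), is now bounded by what one iteration can
actually write, so the repeated re-checking of the shrinking buffer tail
disappears for mappings without large folio support.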