Hi,

> On Wed, May 10, 2023 at 01:46:49PM +0800, Wang Yugui wrote:
> > > Ok, that is further back in time than I expected. In terms of XFS,
> > > there are only two commits between 5.16..5.17 that might impact
> > > performance:
> > >
> > > ebb7fb1557b1 ("xfs, iomap: limit individual ioend chain lengths in writeback")
> > >
> > > and
> > >
> > > 6795801366da ("xfs: Support large folios")
> > >
> > > To test whether ebb7fb1557b1 is the cause, go to
> > > fs/iomap/buffered-io.c and change:
> > >
> > > -#define IOEND_BATCH_SIZE 4096
> > > +#define IOEND_BATCH_SIZE 1048576
> > >
> > > This will increase the IO submission chain lengths to at least 4GB
> > > from the 16MB bound that was placed on 5.17 and newer kernels.
> > >
> > > To test whether 6795801366da is the cause, go to fs/xfs/xfs_icache.c
> > > and comment out both calls to mapping_set_large_folios(). This will
> > > ensure the page cache only instantiates single page folios the same
> > > as 5.16 would have.
> >
> > 6.1.x with 'mapping_set_large_folios removed' and 'IOEND_BATCH_SIZE=1048576':
> > fio WRITE: bw=6451MiB/s (6764MB/s)
> >
> > Still a performance regression when compared to linux 5.16.20:
> > fio WRITE: bw=7666MiB/s (8039MB/s)
> >
> > but the performance regression is not too big, so it is difficult to bisect.
> > We noticed the same level of performance regression on btrfs too,
> > so maybe the problem is in some code that is used by both btrfs and xfs,
> > such as iomap or mm/folio.
>
> Yup, that's quite possibly something like the multi-gen LRU changes,
> but that's not the regression we need to find. :/
>
> > 6.1.x with 'mapping_set_large_folios removed' only:
> > fio WRITE: bw=2676MiB/s (2806MB/s)
> >
> > 6.1.x with 'IOEND_BATCH_SIZE=1048576' only:
> > fio WRITE: bw=5092MiB/s (5339MB/s)
> > fio WRITE: bw=6076MiB/s (6371MB/s)
> >
> > maybe we need more fixes beyond ebb7fb1557b1 ("xfs, iomap: limit
> > individual ioend chain lengths in writeback").
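[Editor's note: a quick sketch of the relative slowdowns implied by the fio numbers quoted above. The run labels are mine, not from the thread; fio prints MiB/s with the decimal MB/s equivalent in parentheses.]

```python
# Relative write-bandwidth regressions vs. the 5.16.20 baseline,
# using the fio MiB/s figures quoted in the thread.
baseline = 7666          # 5.16.20: bw=7666MiB/s

runs = {
    "both changes applied on 6.1.x": 6451,
    "mapping_set_large_folios removed only": 2676,
    "IOEND_BATCH_SIZE=1048576 only": 5092,
}

for name, bw in runs.items():
    loss = 100 * (1 - bw / baseline)
    print(f"{name}: {bw} MiB/s ({loss:.1f}% below 5.16.20)")

# fio's parenthesised MB/s figure is the MiB/s value times 1.048576:
print(round(6451 * 1.048576))   # -> 6764, matching the quoted (6764MB/s)
```

So with both changes applied the remaining gap is roughly 16%, which is consistent with the "not too big, difficult to bisect" observation above.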
> OK, can you re-run the two 6.1.x kernels above (the slow and the
> fast) and record the output of `iostat -dxm 1` whilst the
> fio test is running? I want to see what the overall differences in
> the IO load on the devices are between the two runs. This will tell
> us how the IO sizes and queue depths change between the two kernels,
> etc.

The `iostat -dxm 1` results are saved in the attached files:
good.txt - good performance
bad.txt  - bad performance

Best Regards
Wang Yugui (wangyugui@xxxxxxxxxxxx)
2023/05/10

> Right now I'm suspecting a contention interaction between write(),
> do_writepages() and folio_end_writeback()...
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
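[Editor's note: a minimal sketch for summarising a saved `iostat -dxm 1` capture like good.txt/bad.txt, averaging the wMB/s column per device. Column positions vary between sysstat versions, so the header line is parsed instead of hard-coding a field number. The inline sample data is illustrative only, not taken from the real attachments.]

```shell
# Create a small illustrative capture in the iostat -dxm 1 style.
# Real captures repeat the "Device ..." header once per interval;
# the awk script below handles that by re-reading the header each time.
cat > sample.txt <<'EOF'
Device            r/s     w/s     rMB/s     wMB/s  aqu-sz  %util
nvme0n1          0.00  100.00      0.00    400.00    2.00  50.00
nvme0n1          0.00  120.00      0.00    500.00    2.50  60.00
EOF

# Locate the wMB/s column from the header, then average it per device.
avg=$(awk '
/^Device/ { for (i = 1; i <= NF; i++) if ($i == "wMB/s") col = i; next }
col && NF >= col { sum[$1] += $col; n[$1]++ }
END { for (d in sum) printf "%s avg wMB/s: %.1f\n", d, sum[d] / n[d] }
' sample.txt)
echo "$avg"
```

Running the same awk over good.txt and bad.txt would give a one-line per-device comparison of sustained write bandwidth between the two kernels; the w/s and aqu-sz columns can be averaged the same way to compare IO sizes and queue depths.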
Attachment:
good.txt
Description: Binary data
Attachment:
bad.txt
Description: Binary data