On Tue, May 09, 2023 at 08:37:52PM +0800, Wang Yugui wrote:
> > On Tue, May 09, 2023 at 07:25:53AM +0800, Wang Yugui wrote:
> > > > On Mon, May 08, 2023 at 10:46:12PM +0800, Wang Yugui wrote:
> > > > > > Hi,
> > > > > >
> > > > > > I noticed a performance regression of xfs 6.1.27/6.1.23,
> > > > > > compared to xfs 5.15.110.
> > > > > >
> > > > > > It is not yet clear whether it is a problem of xfs or of lvm2.
> > > > > >
> > > > > > Any guidance on how to troubleshoot it?
> > > > > >
> > > > > > test case:
> > > > > >   disk: NVMe PCIe3 SSD *4
> > > > > >   LVM: raid0, default stripe size 64K.
> > > > > >   fio -name write-bandwidth -rw=write -bs=1024Ki -size=32Gi -runtime=30
> > > > > >   -iodepth 1 -ioengine sync -zero_buffers=1 -direct=0 -end_fsync=1 -numjobs=4
> > > > > >   -directory=/mnt/test
.....
> > > > Because you are testing buffered IO, you need to run perf across all
> > > > CPUs and tasks, not just the fio process, so that it captures the
> > > > profile of the memory reclaim and writeback that is being performed by
> > > > the kernel.
> > >
> > > 'perf report' for all CPUs:
> > >
> > > Samples: 211K of event 'cycles', Event count (approx.): 56590727219
> > > Overhead  Command          Shared Object      Symbol
> > >   16.29%  fio              [kernel.kallsyms]  [k] rep_movs_alternative
> > >    3.38%  kworker/u98:1+f  [kernel.kallsyms]  [k] native_queued_spin_lock_slowpath
> > >    3.11%  fio              [kernel.kallsyms]  [k] native_queued_spin_lock_slowpath
> > >    3.05%  swapper          [kernel.kallsyms]  [k] intel_idle
> > >    2.63%  fio              [kernel.kallsyms]  [k] get_page_from_freelist
> > >    2.33%  fio              [kernel.kallsyms]  [k] asm_exc_nmi
> > >    2.26%  kworker/u98:1+f  [kernel.kallsyms]  [k] __folio_start_writeback
> > >    1.40%  fio              [kernel.kallsyms]  [k] __filemap_add_folio
> > >    1.37%  fio              [kernel.kallsyms]  [k] lru_add_fn
> > >    1.35%  fio              [kernel.kallsyms]  [k] xas_load
> > >    1.33%  fio              [kernel.kallsyms]  [k] iomap_write_begin
> > >    1.31%  fio              [kernel.kallsyms]  [k] xas_descend
> > >    1.19%  kworker/u98:1+f  [kernel.kallsyms]  [k] folio_clear_dirty_for_io
> > >    1.07%  fio              [kernel.kallsyms]  [k] folio_add_lru
> > >    1.01%  fio              [kernel.kallsyms]  [k] __folio_mark_dirty
> > >    1.00%  kworker/u98:1+f  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
> > >
> > > and 'top' shows that 'kworker/u98:1' has over 80% CPU usage.
> >
> > Can you provide an expanded callgraph profile for both the good and
> > bad kernels showing the CPU used in the fio write() path and the
> > kworker-based writeback path?
>
> I'm sorry, could you give some detailed guidance on how to gather that
> information for this test?

'perf record -g' and 'perf report -g' should enable callgraph profiling
and reporting. See the perf-record man page for '--call-graph' to make
sure you have the right kernel config for this to work efficiently.

You can do quick snapshots in time via 'perf top -U -g'; after a few
seconds, type 'E' and then immediately type 'P', and the fully expanded
callgraph profile will be written to a perf.hist.N file in the current
working directory... (A minimal command sketch is included further
below.)

> > > I tested 6.4.0-rc1. The performance became a little worse.
> >
> > Thanks, that's as I expected.
> >
> > Which means that the interesting kernel versions to check now are: a
> > 6.0.x kernel, and then, if it has the same perf as 5.15.x, the commit
> > before the multi-gen LRU was introduced vs the commit after it was
> > introduced, to see if that is the functionality that introduced the
> > regression....
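A minimal sketch of the whole-system callgraph capture described above,
assuming perf is installed, the kernel has usable callchain unwinding
(see the '--call-graph' notes above), and a 30 second capture window
roughly matching the fio runtime; all of those choices are illustrative
rather than taken from the original report:

  # record callchains (-g) across all CPUs and tasks (-a) while the fio
  # job runs; 'sleep 30' only bounds the capture window
  perf record -a -g -- sleep 30

  # expanded callgraph report from the recorded perf.data
  perf report -g

  # live snapshot: -U hides userspace symbols, -g shows callchains;
  # 'E' expands the callgraph, 'P' dumps it to a perf.hist.N file
  perf top -U -g

The point of the comparison is the CPU spent in the fio write() path
versus the kworker-based writeback path on the good and bad kernels.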
>
> More performance test results:
>
> linux 6.0.18
>   fio WRITE: bw=2565MiB/s (2689MB/s)
> linux 5.17.0
>   fio WRITE: bw=2602MiB/s (2729MB/s)
> linux 5.16.20
>   fio WRITE: bw=7666MiB/s (8039MB/s),
>
> So is it a problem between 5.16.20 and 5.17.0?

Ok, that is further back in time than I expected. In terms of XFS,
there are only two commits between 5.16..5.17 that might impact
performance:

ebb7fb1557b1 ("xfs, iomap: limit individual ioend chain lengths in writeback")

and

6795801366da ("xfs: Support large folios")

To test whether ebb7fb1557b1 is the cause, go to fs/iomap/buffered-io.c
and change:

-#define IOEND_BATCH_SIZE	4096
+#define IOEND_BATCH_SIZE	1048576

This will increase the IO submission chain lengths to at least 4GB,
up from the 16MB bound that was placed on 5.17 and newer kernels.

To test whether 6795801366da is the cause, go to fs/xfs/xfs_icache.c
and comment out both calls to mapping_set_large_folios(). This will
ensure the page cache only instantiates single-page folios, the same
as 5.16 would have.

If neither of them changes behaviour, then I think you're going to
need to do a bisect between 5.16..5.17 to find the commit that
introduced the regression (a rough sketch of the bisect steps follows
below). I know kernel bisects are slow and painful, but it's exactly
what I'd be doing right now if my performance test machine wasn't
broken....

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
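If it does come down to a bisect, here is a rough sketch of the
workflow, assuming a mainline git tree and that each candidate kernel
is built, booted, and re-tested with the fio job from the original
report; the v5.16 and v5.17 tags stand in for the fast and slow kernels
measured above:

  git bisect start
  git bisect bad v5.17        # 5.17.0 showed the regressed bandwidth above
  git bisect good v5.16       # the 5.16 series was still fast
  # build, boot and test the kernel git bisect checks out, then mark it:
  git bisect good             # or 'git bisect bad', depending on the fio result
  # repeat until git reports the first bad commit, then clean up:
  git bisect reset

Each step only needs to distinguish the ~2.5GiB/s result from the
~7.5GiB/s result, so the 30 second fio run above should be enough to
classify each kernel.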