On Thu 22-02-24 12:50:32, Jan Kara wrote:
> On Thu 22-02-24 09:32:52, Oliver Sang wrote:
> > On Wed, Feb 21, 2024 at 12:14:25PM +0100, Jan Kara wrote:
> > > On Tue 20-02-24 16:25:37, kernel test robot wrote:
> > > > kernel test robot noticed a -21.4% regression of vm-scalability.throughput on:
> > > >
> > > > commit: ab4443fe3ca6298663a55c4a70efc6c3ce913ca6 ("readahead: avoid multiple marked readahead pages")
> > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > > >
> > > > testcase: vm-scalability
> > > > test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory
> > > > parameters:
> > > >
> > > > 	runtime: 300s
> > > > 	test: lru-file-readtwice
> > > > 	cpufreq_governor: performance
> > >
> > > JFYI I had a look into this. What the test seems to do is that it creates
> > > image files on tmpfs, loop-mounts XFS there, and does reads over a file on
> > > XFS. But I was not able to find out what lru-file-readtwice exactly does,
> > > nor was I able to reproduce it, because I got stuck on some missing Ruby
> > > dependencies on my test system yesterday.
> >
> > what's your OS?
>
> I have SLES15-SP4 installed in my VM. What was missing was the 'git' rubygem,
> which apparently is not packaged at all, and when I manually installed it I
> was still hitting other problems, so I instead checked the vm-scalability
> source and wrote my own reproducer based on that.
>
> I'm now able to reproduce the regression in my VM, so I'm investigating...

So I was experimenting with this. What the test does is create as many files
as there are CPUs, sized so that their total size is 8x the amount of
available RAM. For each file, two tasks are started which sequentially read
the file from start to end. A trivial repro from my VM with 8 CPUs and 64GB
of RAM looks like:

truncate -s 60000000000 /dev/shm/xfsimg
mkfs.xfs /dev/shm/xfsimg
mount -t xfs -o loop /dev/shm/xfsimg /mnt
for (( i = 0; i < 8; i++ )); do truncate -s 60000000000 /mnt/sparse-file-$i; done
echo "Ready..."
sleep 3
echo "Running..."
for (( i = 0; i < 8; i++ )); do
	dd bs=4k if=/mnt/sparse-file-$i of=/dev/null &
	dd bs=4k if=/mnt/sparse-file-$i of=/dev/null &
done 2>&1 | grep "copied"
wait
umount /mnt

The difference between slow and fast runs seems to be in the amount of pages
reclaimed with direct reclaim: after commit ab4443fe3c we reclaim about 10%
of pages with direct reclaim, whereas before commit ab4443fe3c only about 1%
of pages are reclaimed with direct reclaim (the split can be read from the
pgsteal_* counters in /proc/vmstat; a sketch follows below). In both cases
we reclaim the same number of pages, corresponding to the total size of the
files, so it is not the case that we would be reading any page twice.

I suspect the reclaim difference is because after commit ab4443fe3c we
trigger readahead somewhat earlier, so our effective working set is somewhat
larger. This apparently gives kswapd a harder time and we end up doing
direct reclaim more often.

Since this is a case of heavy overload on the system, I don't think the
throughput here matters that much, and AFAICT the readahead code does
nothing wrong here. So I don't think we need to do anything about this.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
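
A minimal sketch for measuring the direct vs. kswapd reclaim split discussed
above, assuming a recent kernel that exposes the pgsteal_direct and
pgsteal_kswapd counters in /proc/vmstat:

# Snapshot the reclaim counters before the run.
before_direct=$(awk '/^pgsteal_direct /{print $2}' /proc/vmstat)
before_kswapd=$(awk '/^pgsteal_kswapd /{print $2}' /proc/vmstat)
# ... run the reproducer above ...
# Snapshot them again after the run and compute the deltas.
after_direct=$(awk '/^pgsteal_direct /{print $2}' /proc/vmstat)
after_kswapd=$(awk '/^pgsteal_kswapd /{print $2}' /proc/vmstat)
direct=$((after_direct - before_direct))
kswapd=$((after_kswapd - before_kswapd))
echo "direct: $direct pages, kswapd: $kswapd pages"
echo "direct share: $((100 * direct / (direct + kswapd)))%"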