Hello,

On Wed, Mar 23, 2022 at 6:32 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> Which recent changes? Something in NFS or something in the VFS/MM?
> Did you even think about asking a wider audience than the NFS mailing
> list? I only happened to notice this while I was looking for something
> else, otherwise I would never have seen it. The responses from other
> people to your patches were right; you're trying to do this all wrong.
>
> Let's start out with a bug report instead of a solution. What changed
> and when?
>

As Trond stated, c128e575514c ("NFS: Optimise the default readahead
size") changed the way readahead is calculated for NFS mounts. This
caused some read workloads to underperform compared to previous
revisions.

To recap, the current policy is to adopt the system default readahead
of 128 KiB, and mounts with sec=krb5p take a performance hit of 50-75%
when readahead is 128 KiB. I haven't performed an exhaustive search for
other workloads that might also be affected, but I did notice a
meaningful drop in performance on sec=sys mounts; see the notes at the
end.

The previous policy was to calculate the readahead as a multiple of
rsize, so we prescribed increasing the value to the complaining party,
and this fixed the issue. We are now trying to find a solution that we
can incorporate into the system.

thiago.

-----
Tests

===== RAWHIDE (35% performance hit) =====

# uname -r
5.16.0-0.rc0.20211112git5833291ab6de.12.fc36.x86_64

# grep nfs /proc/self/mountinfo
601 60 0:55 / /mnt rw,relatime shared:332 - nfs4 192.168.122.225:/exports rw,vers=4.2,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.122.83,local_lock=none,addr=192.168.122.225

# cat /sys/class/bdi/0\:55/read_ahead_kb
128

# for i in {0..3} ; do dd if=/mnt/testfile.bin of=/dev/null bs=1M 2>&1 | grep copied ; echo 3 > /proc/sys/vm/drop_caches ; done
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 16.5025 s, 260 MB/s
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 16.4474 s, 261 MB/s
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 18.0181 s, 238 MB/s
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 18.2323 s, 236 MB/s

# echo 15360 > /sys/class/bdi/0\:55/read_ahead_kb

# for i in {0..3} ; do dd if=/mnt/testfile.bin of=/dev/null bs=1M 2>&1 | grep copied ; echo 3 > /proc/sys/vm/drop_caches ; done
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 11.2601 s, 381 MB/s
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 11.1885 s, 384 MB/s
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 11.5877 s, 371 MB/s
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 10.9475 s, 392 MB/s

===== UPSTREAM (30% performance hit) =====

# uname -r
5.17.0+

# grep nfs /proc/self/mountinfo
583 60 0:55 / /mnt rw,relatime shared:302 - nfs4 192.168.122.225:/exports rw,vers=4.2,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.122.83,local_lock=none,addr=192.168.122.225

# cat /sys/class/bdi/0\:55/read_ahead_kb
128

# for i in {0..3} ; do dd if=/mnt/testfile.bin of=/dev/null bs=1M 2>&1 | grep copied ; echo 3 > /proc/sys/vm/drop_caches ; done
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 17.056 s, 252 MB/s
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 17.1258 s, 251 MB/s
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 16.5981 s, 259 MB/s
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 16.5487 s, 260 MB/s

# echo 15360 > /sys/class/bdi/0\:55/read_ahead_kb

# for i in {0..3} ; do dd if=/mnt/testfile.bin of=/dev/null bs=1M 2>&1 | grep copied ; echo 3 > /proc/sys/vm/drop_caches ; done
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 12.3855 s, 347 MB/s
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 11.2528 s, 382 MB/s
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 11.9849 s, 358 MB/s
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 11.2953 s, 380 MB/s
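
For reference, the percentages quoted next to RAWHIDE and UPSTREAM can be
recomputed from the dd throughput figures above. This is only a rough
sketch: it assumes the hit is defined as 1 - mean(128 KiB runs) /
mean(15360 KiB runs), and the helper names mean and hit_pct are made up
for illustration.

```shell
#!/bin/sh
# Force the C locale so awk prints "." as the decimal separator.
LC_ALL=C; export LC_ALL

# mean N1 N2 ...: arithmetic mean of the arguments (hypothetical helper).
mean() {
    printf '%s\n' "$@" | awk '{ sum += $1 } END { printf "%.2f\n", sum / NR }'
}

# hit_pct SLOW FAST: relative drop of SLOW versus FAST, in percent
# (hypothetical helper; assumes hit = 1 - SLOW/FAST).
hit_pct() {
    awk -v slow="$1" -v fast="$2" \
        'BEGIN { printf "%.1f\n", (1 - slow / fast) * 100 }'
}

# MB/s values copied from the dd output in the tests.
rawhide_slow=$(mean 260 261 238 236)    # read_ahead_kb = 128
rawhide_fast=$(mean 381 384 371 392)    # read_ahead_kb = 15360
upstream_slow=$(mean 252 251 259 260)   # read_ahead_kb = 128
upstream_fast=$(mean 347 382 358 380)   # read_ahead_kb = 15360

echo "rawhide:  $(hit_pct "$rawhide_slow" "$rawhide_fast")%"    # 34.9%
echo "upstream: $(hit_pct "$upstream_slow" "$upstream_fast")%"  # 30.3%
```

With four runs per setting these are rough averages, not a statistically
careful comparison, but they agree with the rounded 35% and 30% figures.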