While testing other things, I noticed that several iozone tests showed a significant regression in large direct WRITE performance with little to no drop in small WRITE IOPS. One example (NFS/RDMA on FDR InfiniBand):

    Machine = Linux manet.1015granger.net 5.7.0-rc7-00033-g8de6ca0614d4 #1071 SMP
    CPU utilization Resolution = 0.000 seconds.
    CPU utilization Excel chart enabled
    File size set to 1048576 kB
    Record Size 256 kB
    O_DIRECT feature enabled
    Command line used: /home/cel/bin/iozone -M -+u -i0 -i1 -s1g -r256k -t12 -I
    Output is in kBytes/sec
    Time Resolution = 0.000001 seconds.
    Processor cache size set to 1024 kBytes.
    Processor cache line size set to 32 bytes.
    File stride size set to 17 * record size.
    Throughput test with 12 processes
    Each process writes a 1048576 kByte file in 256 kByte records

    Children see throughput for 12 initial writers = 2430898.66 kB/sec
    Parent sees throughput for 12 initial writers  = 2425731.85 kB/sec
    Min throughput per process                     =  202025.03 kB/sec
    Max throughput per process                     =  202899.33 kB/sec
    Avg throughput per process                     =  202574.89 kB/sec
    Min xfer                                       = 1044224.00 kB
    CPU Utilization: Wall time 5.179 CPU time 2.020 CPU utilization 39.00 %

    Children see throughput for 12 rewriters       = 2431774.06 kB/sec
    Parent sees throughput for 12 rewriters        = 2431230.83 kB/sec
    Min throughput per process                     =  202230.42 kB/sec
    Max throughput per process                     =  202926.08 kB/sec
    Avg throughput per process                     =  202647.84 kB/sec
    Min xfer                                       = 1045248.00 kB
    CPU utilization: Wall time 5.169 CPU time 2.015 CPU utilization 38.99 %

These numbers are half what they usually are.

I bisected between v5.6 and v5.7-rc7, and it terminated on 1f28476dcb98 ("NFS: Fix O_DIRECT commit verifier handling"). This commit doesn't revert cleanly -- the kernel won't build after it is reverted, so I can't easily do the obvious test to confirm the bisect result.

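The reported figures can be sanity-checked with a little arithmetic. The following is only a minimal sketch. It assumes that the "Children see throughput" figure is roughly 12 times the per-process average, and that the wall time covers the whole transfer of 12 files of 1048576 kB each. The names `runs` and `summarize` are illustrative and not part of iozone.

```python
# Figures copied from the iozone report (kB/sec and seconds).
PROCESSES = 12       # from -t12
FILE_KB = 1048576    # from -s1g, per process

runs = {
    "initial writers": {"children": 2430898.66, "avg": 202574.89, "wall": 5.179},
    "rewriters": {"children": 2431774.06, "avg": 202647.84, "wall": 5.169},
}


def summarize(run):
    """Return two estimates of aggregate throughput in kB/sec.

    from_avg:  number of processes times the reported per-process average.
    from_wall: total data written divided by the reported wall time
               (an assumption about how wall time relates to the transfer).
    """
    from_avg = PROCESSES * run["avg"]
    from_wall = PROCESSES * FILE_KB / run["wall"]
    return from_avg, from_wall


for name, run in runs.items():
    from_avg, from_wall = summarize(run)
    print(f"{name}: reported {run['children']:.0f}, "
          f"12 x avg {from_avg:.0f}, data/wall {from_wall:.0f} kB/sec")
```

Both estimates land close to the reported aggregate, which suggests the report is internally consistent and the low numbers are not an artifact of how iozone sums them.
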
I intend to look into the exact pathology, but wanted to get this regression reported first, in case someone has a thought about what is slowing things down.

--
Chuck Lever