Hi Dave,

sorry for taking forever to get back to this - travel to LSF and some
other meetings and a deadline last week didn't leave me any time for
XFS work.

On Thu, Apr 14, 2016 at 07:54:42AM +1000, Dave Chinner wrote:
> Christoph, have you done any perf testing of this patchset yet to
> check that it does indeed reduce the CPU overhead of large write
> operations? I'd also be interested to know if there is any change in
> overhead for single page (4k) IOs as well, even though I suspect
> there won't be.

I've done a lot of testing earlier, and this version also looks very
promising.  On the sort of hardware I have access to now, the 4k
numbers don't change much, but with 1M writes we both increase the
write bandwidth a little bit and significantly lower the CPU usage.

The simple test that demonstrates this is below; the runs are from a
4p VM with 4G of RAM, access to a fast NVMe SSD, and a data size small
enough that writeback shouldn't throttle the buffered write path:

    MNT=/mnt
    PERF="perf_3.16"    # soo smart to have tools in the kernel tree..

    #BS=4k
    #COUNT=65536
    BS=1M
    COUNT=256

    $PERF stat dd if=/dev/zero of=$MNT/testfile bs=$BS count=$COUNT

With the baseline for-next tree I get the following bandwidth and CPU
utilization:

    BS=4k:  ~600MB/s     0.856 CPUs utilized   ( +- 0.32% )
    BS=1M:  1.45GB/s     0.820 CPUs utilized   ( +- 0.77% )

With all patches applied:

    BS=4k:  ~610MB/s     0.848 CPUs utilized   ( +- 0.36% )
    BS=1M:  ~1.55GB/s    0.615 CPUs utilized   ( +- 0.80% )

This is also visible in the walltime:

baseline, 4k:

    real    0m0.540s
    user    0m0.000s
    sys     0m0.533s

baseline, 1M:

    real    0m0.310s
    user    0m0.000s
    sys     0m0.313s

multipage, 4k:

    real    0m0.541s
    user    0m0.010s
    sys     0m0.527s

multipage, 1M:

    real    0m0.272s
    user    0m0.000s
    sys     0m0.263s
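For reference, the "( +- x% )" variance numbers come from repeating the
same measurement; here is a minimal sketch of that kind of invocation,
assuming perf stat's -r/--repeat option with ten runs and a page cache
drop beforehand (the repeat count and the cache drop are illustrative,
not the exact setup behind the numbers above):

    # illustrative reproduction sketch - the repeat count and cache
    # drop are assumptions, not the exact invocation used above
    MNT=/mnt
    PERF="perf_3.16"
    BS=1M
    COUNT=256

    # start the measurement from a clean page cache
    sync
    echo 3 > /proc/sys/vm/drop_caches

    # -r 10 runs dd ten times; perf stat reports the mean and the
    # relative stddev, which shows up as the "( +- x% )" column
    $PERF stat -r 10 -- \
        dd if=/dev/zero of=$MNT/testfile bs=$BS count=$COUNT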