On Fri, 26 Oct 2018 at 23:30, Elliott, Robert (Persistent Memory) <elliott@xxxxxxx> wrote:
>
> > I run FIO with the following parameters and used the different engines I listed above:
> > --randrepeat=1
> > --ioengine=posixaio
> > --direct=1
> > --sync=1
> > --name=mytest
> > --filename=mytestfile.fio
> > --overwrite=1
> > --iodepth=64
> > --size=100MB
> > --readwrite=randrw
> > --rwmixread=50
> > --rwmixwrite=50
> > --bs=16k
> >
> ...
> > -ioengine=psync on a 1000 IOPS Perf storage::: read=177 write=185 IOPS
> > -ioengine=pvsync on a 1000 IOPS Perf storage::: read=176 write=184 IOPS
> > -ioengine=sync on a 1000 IOPS Perf storage::: read=145 write=152 IOPS
> > -ioengine=posixaio on a 1000 IOPS Perf storage::: read=528 write=551 IOPS*
> >
> > Based on the FIO results I posted above, I've concluded that the suitable FIO engine for Block Storage
> > is "posixaio"; and for NFS it can be psync or pvsync or sync.
> > If I use the FIO engine "posixaio" on VMware to test block storage, I'm seeing the expected IOPS.
> > However, if I use the same FIO engine ("posixaio") to test NFS, I'm NOT seeing the expected IOPS. I'd
> > have to use either psync, or pvsync, or sync to see the IOPS I'm expecting.
> > These tests are on the same baremetal server running the version of VMWare I posted above.
> >
> > Can someone please shed some light on why FIO results is skewed when "posixaio" engine is used to test
> > an NFS storage? The same goes with ioengines psync/pvsync/sync skews IOPS results on a block
> > storage device?
>
> NFS might introduce all sorts of problems. For the block device cases...
>
> Synchronous engines like sync do not honor the iodepth; you need to use
> jobs= to get multiple concurrent IOs. posixaio is an asynchronous engine,
> so it is honoring your iodepth (it spawns threads for you).
>
> Also, if your storage devices do any sort of buffering and caching, the
> tiny 100 MiB size is likely to result in lots of cache hits, distorting
> the results.
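
To make the iodepth point above concrete, here is a rough sketch of how the same command line could be generated so that each engine really keeps 64 I/Os in flight. It is only an illustration: fio_command and SYNC_ENGINES are names made up for this mail, and the set of engines treated as synchronous is just the three from the results above. The fio option that was abbreviated as jobs= above is spelled numjobs=, and group_reporting sums the per-job numbers into one report. The other options are copied from the original run.

```python
import shlex

# Engines that issue one blocking I/O at a time per job; raising iodepth
# above 1 makes no difference for these. (Only the three engines from the
# results above are listed; this is not a complete list.)
SYNC_ENGINES = {"sync", "psync", "pvsync"}


def fio_command(engine: str, concurrency: int,
                filename: str = "mytestfile.fio") -> list[str]:
    """Build a fio command line that keeps `concurrency` I/Os in flight."""
    if engine in SYNC_ENGINES:
        # Synchronous engine: parallelism has to come from multiple jobs.
        depth, jobs = 1, concurrency
    else:
        # Asynchronous engine (e.g. posixaio): one job with a deeper queue.
        depth, jobs = concurrency, 1
    return [
        "fio",
        "--name=mytest",
        f"--filename={filename}",
        f"--ioengine={engine}",
        f"--iodepth={depth}",
        f"--numjobs={jobs}",
        "--group_reporting",  # report the jobs as one aggregate result
        "--direct=1",
        "--sync=1",
        "--overwrite=1",
        "--randrepeat=1",
        "--readwrite=randrw",
        "--rwmixread=50",
        "--bs=16k",
        "--size=100MB",
    ]


if __name__ == "__main__":
    # Print one command per engine so the two runs are comparable.
    for engine in ("psync", "posixaio"):
        print(shlex.join(fio_command(engine, 64)))
```

If the psync numbers produced this way move up towards the posixaio ones on the block device, the earlier gap was down to queue depth rather than the engine itself.
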
On the NFS front, perhaps even though you are requesting direct=1 the kernel is actually using synchronous buffered I/O behind the scenes and all those syncs are slowing it down? Does using sync=0 show less of a difference between engines?

-- 
Sitsofe | http://sucs.org/~sits/