On Fri, Oct 28, 2022 at 10:01:19AM -0700, Darrick J. Wong wrote:
> On Fri, Oct 28, 2022 at 10:00:33AM +0530, Ritesh Harjani (IBM) wrote:
> > Performance testing of below fio workload reveals ~16x performance
> > improvement on nvme with XFS (4k blocksize) on Power (64K pagesize)
> > FIO reported write bw scores improved from around ~28 MBps to ~452 MBps.
> >
> > <test_randwrite.fio>
> > [global]
> > ioengine=psync
> > rw=randwrite
> > overwrite=1
> > pre_read=1
> > direct=0
> > bs=4k
> > size=1G
> > dir=./
> > numjobs=8
> > fdatasync=1
> > runtime=60
> > iodepth=64
> > group_reporting=1
>
> Admittedly I'm not thrilled at the reintroduction of page and iop dirty
> state that are updated in separate places, but OTOH the write
> amplification here is demonstrably horrifying as you point out so it's
> clearly necessary.

Well, *something* is necessary.  I worked on a different approach that
would have similar effects for this exact workload, which was to submit
the I/O for O_SYNC while we still know which part of the page we dirtied.

Previous discussion:
https://lore.kernel.org/all/YQlgjh2R8OzJkFoB@xxxxxxxxxxxxxxxxxxxx/

Actual patches:
https://lore.kernel.org/all/20220503064008.3682332-1-willy@xxxxxxxxxxxxx/
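
For readers following along, here is a rough back-of-the-envelope sketch of the write amplification discussed in this thread. It is not kernel code. It assumes a simplified model in which dirty state is tracked either per page (the whole 64K page is written back when any 4k block in it is dirty) or per block (only dirty blocks are written back). The function names are made up for illustration; only the sizes and the ~28 and ~452 MBps figures come from the thread.

```python
# Simplified model of writeback cost for one page. Assumption: with
# page-granularity dirty tracking the whole page is written back; with
# block-granularity tracking only the dirty blocks are.

PAGE_SIZE = 64 * 1024   # Power page size quoted in the thread
BLOCK_SIZE = 4 * 1024   # XFS block size and fio bs= quoted in the thread


def bytes_written_back(dirty_blocks, per_block_tracking):
    """Bytes sent to disk for one page, given the set of dirty block indices."""
    if not dirty_blocks:
        return 0
    if per_block_tracking:
        return len(dirty_blocks) * BLOCK_SIZE
    return PAGE_SIZE


def write_amplification(dirty_blocks, per_block_tracking):
    """Ratio of bytes written back to bytes the application dirtied."""
    useful = len(dirty_blocks) * BLOCK_SIZE
    return bytes_written_back(dirty_blocks, per_block_tracking) / useful


# One random 4k overwrite followed by fdatasync, as in the fio job.
one_block = {3}
print(write_amplification(one_block, per_block_tracking=False))  # 16.0
print(write_amplification(one_block, per_block_tracking=True))   # 1.0

# Ratio of the reported bandwidth figures, for comparison.
print(round(452 / 28, 1))  # 16.1
```

In this model the 16x factor is just PAGE_SIZE / BLOCK_SIZE, which lines up with the reported ~16x improvement. Real behaviour also depends on how often several dirty blocks share a page before the sync, which the sketch ignores.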