On 23/01/30 05:54PM, Matthew Wilcox wrote:
> On Mon, Jan 30, 2023 at 09:44:13PM +0530, Ritesh Harjani (IBM) wrote:
> > On 64k pagesize platforms (especially Power and/or aarch64) with a 4k
> > filesystem blocksize, this patch should improve performance by doing
> > only the subpage dirty data write.
> >
> > This should also reduce write amplification, since we can now track
> > subpage dirty status within state bitmaps. Earlier we had to
> > write the entire 64k page even if only a part of it (e.g. 4k) was
> > updated.
> >
> > Performance testing of the below fio workload reveals a ~16x performance
> > improvement on nvme with XFS (4k blocksize) on Power (64k pagesize).
> > FIO-reported write bandwidth improved from around ~28 MBps to ~452 MBps.
> >
> > <test_randwrite.fio>
> > [global]
> > ioengine=psync
> > rw=randwrite
> > overwrite=1
> > pre_read=1
> > direct=0
> > bs=4k
> > size=1G
> > dir=./
> > numjobs=8
> > fdatasync=1
> > runtime=60
> > iodepth=64
> > group_reporting=1
> >
> > [fio-run]
>
> You really need to include this sentence from the cover letter in this
> patch:
>
> 2. Also our internal performance team reported that this patch improves their
> database workload performance by around ~83% (with XFS on Power)
>
> because that's far more meaningful than "Look, I cooked up an artificial
> workload where this makes a difference".

Agreed. I will add the other lines too in the commit message.

The intention behind adding the fio workload is for others to have a test
case to verify against and/or to provide more info when someone later
refers to the commit message.

One interesting observation with this synthetic fio workload was that we
can/should easily observe the theoretical performance gain of around ~16x
(i.e. 64k (pagesize) / 4k (blocksize)).

Thanks again for the review!

-ritesh
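The ~16x theoretical gain mentioned above follows directly from the page-to-block size ratio; a minimal sketch of that write-amplification arithmetic (an illustration only, not kernel code):

```python
# Illustrative arithmetic for the write amplification described above:
# with a 64k page and a 4k filesystem block, dirtying a single 4k block
# previously forced a full 64k page writeback; with subpage dirty
# tracking, only the dirty 4k block needs to be written.

PAGE_SIZE = 64 * 1024   # 64k pagesize (e.g. Power, aarch64 with 64k pages)
BLOCK_SIZE = 4 * 1024   # 4k filesystem blocksize

bytes_before = PAGE_SIZE   # whole page written per 4k update, before
bytes_after = BLOCK_SIZE   # only the dirty block written, after

speedup = bytes_before / bytes_after
print(speedup)  # prints 16.0 -> matches the ~16x theoretical gain
```

This is also why the fio workload uses bs=4k with fdatasync=1: each small synchronous write previously paid the full-page writeback cost.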