Brian Foster <bfoster@xxxxxxxxxx> writes:

> We've had reports of significant performance regression of sub-block
> (unaligned) direct writes due to the added exclusivity restrictions
> in ext4. The purpose of the exclusivity requirement for unaligned
> direct writes is to avoid data corruption caused by unserialized
> partial block zeroing in the iomap dio layer across overlapping
> writes.
>
> XFS has similar requirements for the same underlying reasons, yet
> doesn't suffer the extreme performance regression that ext4 does.
> The reason for this is that XFS utilizes IOMAP_DIO_OVERWRITE_ONLY
> mode, which allows for optimistic submission of concurrent unaligned
> I/O and kicks back writes that require partial block zeroing such
> that they can be submitted in a safe, exclusive context. Since ext4
> already performs most of these checks pre-submission, it can support
> something similar without necessarily relying on the iomap flag and
> associated retry mechanism.
>
> Update the dio write submission path to allow concurrent submission
> of unaligned direct writes that are purely overwrite and so will not
> require block zeroing. To improve readability of the various related
> checks, move the unaligned I/O handling down into
> ext4_dio_write_checks(), where the dio draining and force wait logic
> can immediately follow the locking requirement checks. Finally, the
> IOMAP_DIO_OVERWRITE_ONLY flag is set to enable a warning check as a
> precaution should the ext4 overwrite logic ever become inconsistent
> with the zeroing expectations of iomap dio.
>
> The performance improvement of sub-block direct write I/O is shown
> in the following fio test on a 64xcpu guest vm:
>
> Test: fio --name=test --ioengine=libaio --direct=1 --group_reporting
> --overwrite=1 --thread --size=10G --filename=/mnt/fio
> --readwrite=write --ramp_time=10s --runtime=60s --numjobs=8
> --blocksize=2k --iodepth=256 --allow_file_create=0
>
> v6.2: write: IOPS=4328, BW=8724KiB/s
> v6.2 (patched): write: IOPS=801k, BW=1565MiB/s
>
> Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> ---
>
> Hi all,
>
> This survives a couple fstests regression runs (with 4k and 2k block
> sizes) and cleans up the code a bit from the RFC, taking a suggestion
> from Ritesh to move some of the checks into ext4_dio_write_checks().

Thanks for working on the suggestion and rebasing on top. I like the
way this patch turned out. It's very clear now. Thanks again for the
optimization!!

Looks good to me. Please feel free to add -

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@xxxxxxxxx>