On Tue, Jan 14, 2025 at 04:50:53PM -0800, Joanne Koong wrote:
> Hi all,
>
> I would like to propose a discussion topic about improving large folio
> writeback performance. As more filesystems adopt large folios, it
> becomes increasingly important that writeback is made to be as
> performant as possible. There are two areas I'd like to discuss:
>
>
> == Granularity of dirty pages writeback ==
> Currently, the granularity of writeback is at the folio level. If one
> byte in a folio is dirty, the entire folio will be written back. This
> becomes unscalable for larger folios and significantly degrades
> performance, especially for workloads that employ random writes.

This sounds familiar, probably because we fixed this exact issue in
the iomap infrastructure a while ago.

commit 4ce02c67972211be488408c275c8fbf19faf29b3
Author: Ritesh Harjani (IBM) <ritesh.list@xxxxxxxxx>
Date:   Mon Jul 10 14:12:43 2023 -0700

    iomap: Add per-block dirty state tracking to improve performance

    When filesystem blocksize is less than folio size (either with
    mapping_large_folio_support() or with blocksize < pagesize) and when
    the folio is uptodate in pagecache, then even a byte write can cause
    an entire folio to be written to disk during writeback. This happens
    because we currently don't have a mechanism to track per-block dirty
    state within struct iomap_folio_state. We currently only track
    uptodate state.

    This patch implements support for tracking per-block dirty state in
    iomap_folio_state->state bitmap. This should help improve the
    filesystem write performance and help reduce write amplification.

    Performance testing of below fio workload reveals ~16x performance
    improvement using nvme with XFS (4k blocksize) on Power (64K
    pagesize). FIO reported write bw scores improved from around
    ~28 MBps to ~452 MBps.

    1. <test_randwrite.fio>
        [global]
            ioengine=psync
            rw=randwrite
            overwrite=1
            pre_read=1
            direct=0
            bs=4k
            size=1G
            dir=./
            numjobs=8
            fdatasync=1
            runtime=60
            iodepth=64
            group_reporting=1

        [fio-run]

    2. Also our internal performance team reported that this patch
       improves their database workload performance by around ~83%
       (with XFS on Power)

    Reported-by: Aravinda Herle <araherle@xxxxxxxxxx>
    Reported-by: Brian Foster <bfoster@xxxxxxxxxx>
    Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@xxxxxxxxx>
    Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx>

> One idea is to track dirty pages at a smaller granularity using a
> 64-bit bitmap stored inside the folio struct where each bit tracks a
> smaller chunk of pages (eg for 2 MB folios, each bit would track 32k
> pages), and only write back dirty chunks rather than the entire folio.

Have a look at how sub-folio state is tracked via the
folio->iomap_folio_state->state{} bitmaps.

Essentially it is up to the subsystem to track sub-folio state if
they require it; there is some generic filesystem infrastructure
support already in place (like iomap), but if that doesn't fit a
filesystem then it will need to provide its own dirty/uptodate
tracking....

-Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx