On Tue, Jan 14, 2025 at 5:21 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> On Tue, Jan 14, 2025 at 04:50:53PM -0800, Joanne Koong wrote:
> > Hi all,
> >
> > I would like to propose a discussion topic about improving large folio
> > writeback performance. As more filesystems adopt large folios, it
> > becomes increasingly important that writeback is made to be as
> > performant as possible. There are two areas I'd like to discuss:
> >
> >
> > == Granularity of dirty pages writeback ==
> > Currently, the granularity of writeback is at the folio level. If one
> > byte in a folio is dirty, the entire folio will be written back. This
> > does not scale for larger folios and significantly degrades
> > performance, especially for workloads that employ random writes.
>
> This sounds familiar, probably because we fixed this exact issue in
> the iomap infrastructure some while ago.
>
> commit 4ce02c67972211be488408c275c8fbf19faf29b3
> Author: Ritesh Harjani (IBM) <ritesh.list@xxxxxxxxx>
> Date:   Mon Jul 10 14:12:43 2023 -0700
>
>     iomap: Add per-block dirty state tracking to improve performance
>
>     When filesystem blocksize is less than folio size (either with
>     mapping_large_folio_support() or with blocksize < pagesize) and when the
>     folio is uptodate in pagecache, then even a byte write can cause
>     an entire folio to be written to disk during writeback. This happens
>     because we currently don't have a mechanism to track per-block dirty
>     state within struct iomap_folio_state. We currently only track uptodate
>     state.
>
>     This patch implements support for tracking per-block dirty state in
>     iomap_folio_state->state bitmap. This should help improve the filesystem
>     write performance and help reduce write amplification.
>
>     Performance testing of below fio workload reveals ~16x performance
>     improvement using nvme with XFS (4k blocksize) on Power (64K pagesize)
>     FIO reported write bw scores improved from around ~28 MBps to ~452 MBps.
>
>     1. <test_randwrite.fio>
>     [global]
>     ioengine=psync
>     rw=randwrite
>     overwrite=1
>     pre_read=1
>     direct=0
>     bs=4k
>     size=1G
>     dir=./
>     numjobs=8
>     fdatasync=1
>     runtime=60
>     iodepth=64
>     group_reporting=1
>
>     [fio-run]
>
>     2. Also our internal performance team reported that this patch improves
>     their database workload performance by around ~83% (with XFS on Power)
>
>     Reported-by: Aravinda Herle <araherle@xxxxxxxxxx>
>     Reported-by: Brian Foster <bfoster@xxxxxxxxxx>
>     Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@xxxxxxxxx>
>     Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx>
>
> > One idea is to track dirty pages at a smaller granularity using a
> > 64-bit bitmap stored inside the folio struct, where each bit tracks a
> > smaller chunk of pages (eg for 2 MB folios, each bit would track a
> > 32 KB chunk), and only write back dirty chunks rather than the entire
> > folio.
>
> Have a look at how sub-folio state is tracked via the
> folio->iomap_folio_state->state{} bitmaps.
>
> Essentially it is up to the subsystem to track sub-folio state if
> they require it; there is some generic filesystem infrastructure
> support already in place (like iomap), but if that doesn't fit a
> filesystem then it will need to provide its own dirty/uptodate
> tracking....

Great, thanks for the info. I'll take a look at how the iomap layer does
this.

>
> -Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
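
For anyone following the thread who wants a concrete picture of the bitmap
bookkeeping being discussed, below is a minimal userspace sketch of
per-block dirty tracking for a single 2 MB folio. It is only an
illustration, not the iomap implementation: the struct and function names
(toy_folio_state, toy_mark_dirty, toy_writeback), the fixed 64-bit bitmap,
the 32 KB block size, and the printf "writeback" are all made up for the
example.

/*
 * Toy model of per-block dirty tracking for one 2 MB folio.
 * NOT kernel code; names and sizes are invented for illustration.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define FOLIO_SIZE  (2 * 1024 * 1024)      /* hypothetical 2 MB folio */
#define BLOCK_SIZE  (FOLIO_SIZE / 64)      /* 64 bits -> 32 KB per bit */

struct toy_folio_state {
	uint64_t dirty;                    /* one bit per 32 KB block */
	unsigned char data[FOLIO_SIZE];    /* stand-in for the cached pages */
};

/* Mark every block touched by the byte range [off, off + len) as dirty. */
static void toy_mark_dirty(struct toy_folio_state *fs, size_t off, size_t len)
{
	size_t first = off / BLOCK_SIZE;
	size_t last  = (off + len - 1) / BLOCK_SIZE;

	for (size_t b = first; b <= last; b++)
		fs->dirty |= (uint64_t)1 << b;
}

/* "Write back" only the dirty blocks, then clear their bits. */
static void toy_writeback(struct toy_folio_state *fs)
{
	for (size_t b = 0; b < 64; b++) {
		if (!(fs->dirty & ((uint64_t)1 << b)))
			continue;
		printf("writing block %zu (bytes %zu-%zu)\n",
		       b, b * BLOCK_SIZE, (b + 1) * BLOCK_SIZE - 1);
		/* real code would submit I/O for &fs->data[b * BLOCK_SIZE] */
	}
	fs->dirty = 0;
}

int main(void)
{
	static struct toy_folio_state fs;

	memset(&fs, 0, sizeof(fs));
	toy_mark_dirty(&fs, 5000, 100);       /* small random write: block 0 */
	toy_mark_dirty(&fs, 1000000, 40000);  /* write spanning blocks 30-31 */
	toy_writeback(&fs);                   /* only 3 of 64 blocks written */
	return 0;
}

In the kernel, the equivalent bookkeeping lives in the
iomap_folio_state->state bitmap referenced in the commit above, which is
sized to the number of blocks per folio rather than a fixed 64 bits.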