Hi all,

I would like to propose a discussion topic about improving large folio writeback performance. As more filesystems adopt large folios, it becomes increasingly important that writeback is as performant as possible. There are two areas I'd like to discuss:

== Granularity of dirty pages writeback ==

Currently, the granularity of writeback is at the folio level. If one byte in a folio is dirty, the entire folio is written back. This does not scale for larger folios and significantly degrades performance, especially for workloads that employ random writes.

One idea is to track dirtiness at a smaller granularity using a 64-bit bitmap stored inside the folio struct, where each bit tracks a smaller chunk of the folio (eg for 2 MB folios, each bit would track a 32 KB chunk), and only write back the dirty chunks rather than the entire folio. A rough sketch of this idea is appended at the end of this mail.

== Balancing dirty pages ==

It was observed that the dirty page balancing logic in balance_dirty_pages() fails to scale for large folios [1]. For example, fuse saw around a 125% regression in write throughput when using large folios vs small folios with 1 MB block sizes, which was attributed to scheduled io waits in the dirty page balancing logic.

In generic_perform_write(), dirty pages are balanced after every write to the page cache by the filesystem. With large folios, each write dirties a larger number of pages, which can grossly exceed the ratelimit, whereas with small folios each write dirties one page, so pages are balanced more incrementally and adhere more closely to the ratelimit. (A small sketch of this effect is also appended below.) In order to accommodate large folios, the dirty page balancing logic likely needs to be reworked.

Thanks,
Joanne

[1] https://lore.kernel.org/linux-fsdevel/Z1N505RCcH1dXlLZ@xxxxxxxxxxxxxxxxxxxx/T/#m9e3dd273aa202f9f4e12eb9c96602b5fec2d383d
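
Appendix: rough sketch of per-folio dirty chunk tracking. This is only a userspace toy to illustrate the bitmap idea under the assumptions above (2 MB folio, 64 bits, so 32 KB per bit); all names here (struct folio_sketch, dirty_chunks, folio_mark_range_dirty, folio_writeback_dirty_chunks) are hypothetical and not an existing kernel API.

#include <stdint.h>
#include <stdio.h>

#define FOLIO_SIZE	(2UL << 20)		/* 2 MB folio */
#define CHUNK_SIZE	(FOLIO_SIZE / 64)	/* 32 KB tracked per bitmap bit */

struct folio_sketch {
	uint64_t dirty_chunks;	/* bit N set => chunk N of the folio is dirty */
};

/* Mark the chunks covering byte range [off, off + len) dirty. */
static void folio_mark_range_dirty(struct folio_sketch *f, size_t off, size_t len)
{
	size_t first = off / CHUNK_SIZE;
	size_t last = (off + len - 1) / CHUNK_SIZE;

	for (size_t i = first; i <= last; i++)
		f->dirty_chunks |= 1ULL << i;
}

/* Write back only the dirty chunks instead of the whole folio. */
static void folio_writeback_dirty_chunks(struct folio_sketch *f)
{
	for (size_t i = 0; i < 64; i++) {
		if (f->dirty_chunks & (1ULL << i))
			printf("writeback chunk %zu: bytes [%zu, %zu)\n",
			       i, i * CHUNK_SIZE, (i + 1) * CHUNK_SIZE);
	}
	f->dirty_chunks = 0;
}

int main(void)
{
	struct folio_sketch f = { 0 };

	/* A 1-byte random write dirties a single 32 KB chunk, not 2 MB. */
	folio_mark_range_dirty(&f, 1234567, 1);
	folio_writeback_dirty_chunks(&f);
	return 0;
}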
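
Appendix: toy model of the ratelimit overshoot. This is not kernel code, just a simplified stand-in for the per-task check done after each copy in generic_perform_write(); the 32-page ratelimit is a made-up value for illustration. The point is only the arithmetic: one-page copies trip the check with no overshoot, while a single 2 MB (512-page) folio dirtying blows far past it in one go.

#include <stdio.h>

#define RATELIMIT_PAGES	32	/* illustrative value, not the real ratelimit */

static unsigned long nr_dirtied;

/* Simplified stand-in for the balance-dirty-pages ratelimit check. */
static void balance_check(unsigned long pages_dirtied)
{
	nr_dirtied += pages_dirtied;
	if (nr_dirtied >= RATELIMIT_PAGES) {
		printf("balance: overshot ratelimit by %lu pages -> longer throttle\n",
		       nr_dirtied - RATELIMIT_PAGES);
		nr_dirtied = 0;
	}
}

int main(void)
{
	/* Small folios: a 2 MB buffered write is 512 one-page copies; the
	 * check fires every 32 pages with essentially no overshoot. */
	for (int i = 0; i < 512; i++)
		balance_check(1);

	/* Large folios: the same 2 MB write dirties 512 pages in one copy,
	 * overshooting the 32-page ratelimit by 480 pages in a single check. */
	balance_check(512);
	return 0;
}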