Well, that's currently selected by __inode_attach_wb() based on whether there is a memcg attached to the folio/task being dirtied or not. If there isn't a cgroup-based writeback context to attach, then it uses bdi->wb as the wb context.
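For reference, the selection logic in fs/fs-writeback.c looks roughly like this (simplified here; exact signatures and helpers vary across kernel versions):

void __inode_attach_wb(struct inode *inode, struct folio *folio)
{
        struct backing_dev_info *bdi = inode_to_bdi(inode);
        struct bdi_writeback *wb = NULL;

        if (inode_cgwb_enabled(inode)) {
                struct cgroup_subsys_state *memcg_css;

                /* memcg of the folio being dirtied, or of the current task */
                if (folio) {
                        memcg_css = mem_cgroup_css_from_folio(folio);
                        wb = wb_get_create(bdi, memcg_css, GFP_ATOMIC);
                } else {
                        memcg_css = task_get_css(current, memory_cgrp_id);
                        wb = wb_get_create(bdi, memcg_css, GFP_ATOMIC);
                        css_put(memcg_css);
                }
        }

        /* no cgroup writeback -> fall back to the bdi-embedded context */
        if (!wb)
                wb = &bdi->wb;

        /* racing dirtiers: only one wins the inode->i_wb assignment */
        if (unlikely(cmpxchg(&inode->i_wb, NULL, wb)))
                wb_put(wb);
}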
We have created a proof of concept for per-AG context-based writeback, as described in [1]. Each AG is mapped to a writeback context (wb_ctx), and __mark_inode_dirty() uses a filesystem handler to select the writeback context corresponding to the inode. We attempted to handle memcg- and bdi-based writeback in a similar manner.

This approach aims to preserve the original writeback semantics while adding parallelism, so more data can be pushed to the device early and write pressure eased sooner.

[1] https://lore.kernel.org/all/20250212103634.448437-1-kundan.kumar@xxxxxxxxxxx/
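As a rough illustration of the selection step only (the s_op->get_inode_wb_ctx hook and the helper name below are made up for this mail, not the interface actually used in [1]):

/*
 * Illustrative sketch: let the filesystem map a dirty inode to one of
 * several writeback contexts (e.g. XFS returning the context for the
 * inode's AG), falling back to the single bdi-embedded context.
 */
static struct bdi_writeback *inode_select_wb_ctx(struct inode *inode)
{
        struct super_block *sb = inode->i_sb;

        if (sb->s_op->get_inode_wb_ctx)         /* hypothetical hook */
                return sb->s_op->get_inode_wb_ctx(inode);

        return &inode_to_bdi(inode)->wb;        /* current behaviour */
}

__mark_inode_dirty() would then queue the inode on the dirty list of the context returned here, rather than always on the single per-bdi context.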
Then selecting inodes for writeback becomes a list_lru_walk() variant depending on what needs to be written back (e.g. physical node, memcg, both, everything that is dirty everywhere, etc).
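To make that concrete, a per-node walk over a list_lru of dirty inodes (the b_dirty_inodes_lru field sketched below) could look roughly like the following; the list_lru_walk_cb signature has changed across kernel versions (older ones also pass the list spinlock), so treat this as pseudocode:

static enum lru_status dirty_inode_isolate(struct list_head *item,
                                           struct list_lru_one *list,
                                           void *cb_arg)
{
        struct inode *inode = container_of(item, struct inode, i_io_list);
        struct list_head *dispatch = cb_arg;

        /* real code would take inode->i_lock and re-check state here */
        if (inode->i_state & (I_FREEING | I_WILL_FREE))
                return LRU_SKIP;

        /* pull the inode off the shared lru and queue it for this context */
        list_lru_isolate_move(list, item, dispatch);
        return LRU_REMOVED;
}

static void queue_dirty_inodes_for_node(struct bdi_writeback *wb, int nid)
{
        LIST_HEAD(dispatch);
        unsigned long nr_to_walk = 1024;        /* arbitrary batch size */

        list_lru_walk_node(&wb->b_dirty_inodes_lru, nid,
                           dirty_inode_isolate, &dispatch, &nr_to_walk);
        /* ... hand 'dispatch' to whatever flusher serves this node ... */
}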
We considered using list_lru to track dirty inodes within a writeback context, e.g.:

struct bdi_writeback {
        struct list_lru b_dirty_inodes_lru;     /* instead of a single b_dirty list */
        struct list_lru b_io_dirty_inodes_lru;
        ...
};

By doing this, we would obtain a sharded list of inodes per NUMA node. However, we would also need per-NUMA writeback contexts; otherwise, even if the inodes are NUMA-sharded, a single writeback context would still process them sequentially, limiting parallelism.

But there is a concern: NUMA-based writeback contexts are not aligned with filesystem geometry, which could negatively impact delayed allocation and writeback efficiency, as you pointed out in your previous reply [2]. Would it be better to let the filesystem dictate the number of writeback threads, rather than enforcing a per-NUMA model? Do you see it differently?

[2] https://lore.kernel.org/all/Z5qw_1BOqiFum5Dn@xxxxxxxxxxxxxxxxxxx/
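For the sake of discussion, per-NUMA writeback contexts could be hung off the bdi along these lines (field name and sizing are invented for this mail; a real implementation would likely allocate by nr_node_ids):

struct backing_dev_info {
        ...
        struct bdi_writeback    wb;             /* existing default context */
        /* hypothetical: one flusher context per NUMA node */
        struct bdi_writeback    *node_wb[MAX_NUMNODES];
        ...
};

The alternative raised above would instead size the set of contexts by filesystem geometry (e.g. one per AG), with the filesystem picking the context at dirty time as in [1].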