On Wed, Jan 29, 2025 at 03:56:27PM +0530, Kundan Kumar wrote:
> and b_more_io lists have also been modified to be per-CPU. When an inode needs
> to be added to the b_dirty list, we select the next CPU (in a round-robin
> fashion) and schedule the per-CPU writeback work on the selected CPU.

I don't think per-cpu is the right shard here. You want to write
related data together. A first approximation might be inodes.

FYI, a really good "benchmark" is if you can use this parallel writeback
code to replace the btrfs workqueue threads spawned to handle checksumming
and compression.
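
To make the sharding point concrete, here is a minimal userspace sketch of
the two selection policies. It is not kernel code and is not taken from the
patch set: the names shard_round_robin() and shard_by_inode(), the shard
count, and the hash constant are all assumptions made for illustration.
Round-robin picks a shard from call order alone, while keying on the inode
number sends a given inode to the same shard every time it is dirtied.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shard count, chosen only for this illustration. */
#define NR_WB_SHARDS 4u

/* Shared cursor for the round-robin policy. */
static unsigned int rr_next;

/*
 * Round-robin selection, roughly as the quoted text describes: each call
 * returns the next shard, independent of which inode is being dirtied.
 */
static unsigned int shard_round_robin(unsigned int nr_shards)
{
	assert(nr_shards > 0);
	return rr_next++ % nr_shards;
}

/*
 * Inode-keyed selection: hash the inode number so that a given inode
 * always maps to the same shard. The multiplier is the common 64-bit
 * golden-ratio constant; any reasonable hash would do here.
 */
static unsigned int shard_by_inode(uint64_t ino, unsigned int nr_shards)
{
	uint64_t h = ino * 0x9E3779B97F4A7C15ull;

	assert(nr_shards > 0);
	/* Use the high bits, which are better mixed than the low ones. */
	return (unsigned int)(h >> 32) % nr_shards;
}
```

With this sketch, shard_by_inode(ino, NR_WB_SHARDS) is stable for a given
ino, whereas two back-to-back calls to shard_round_robin(NR_WB_SHARDS)
return different shards. Inodes are only a first approximation of "related
data"; a coarser key would slot into the same place.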