On Tue, May 16, 2023 at 12:17:04PM -0400, Kent Overstreet wrote:
> On Tue, May 16, 2023 at 05:45:19PM +0200, Christian Brauner wrote:
> > On Wed, May 10, 2023 at 02:45:57PM +1000, Dave Chinner wrote:
> > There's a bit of a backlog before I get around to looking at this
> > but it'd be great if we'd have a few reviewers for this change.
>
> It is well tested - it's been in the bcachefs tree for ages with
> zero issues. I'm pulling it out of the bcachefs-prerequisites
> series though since Dave's still got it in his tree; he's got a
> newer version with better commit messages.
>
> It's a significant performance boost on metadata-heavy workloads
> for any non-XFS filesystem, so we should definitely get it in.

I've got an up-to-date vfs-scale tree here (6.4-rc1), but I have not
been able to test it effectively right now because my local
performance test server is broken. I'll do what I can on the old
small machine that I have to validate it when I get time, but that
might be a few weeks away....

git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs.git vfs-scale

As it is, the inode hash-bl changes have zero impact on XFS because
it has its own highly scalable, lockless, sharded inode cache. So
unless I'm explicitly testing ext4 or btrfs scalability (which is
rare), the change doesn't get a lot of scalability exercise. It is
being used by the root filesystems on all those test VMs, but that's
about it...

That said, my vfs-scale tree also has Waiman Long's old dlist code
(per-CPU linked lists), which converts the sb inode list and removes
the global lock there. This does make a huge impact for XFS - the
current code limits inode cache cycling to about 600,000 inodes/sec
on >=16p machines. With dlists, however:

| 5.17.0 on an XFS filesystem with 50 million inodes in it on a 32p
| machine with a 1.6MIOPS/6.5GB/s block device.
|
| Fully concurrent full filesystem bulkstat:
|
|              wall time    sys time     IOPS      BW      rate
| unpatched:   1m56.035s   56m12.234s     8k    200MB/s   0.4M/s
| patched:     0m15.710s    3m45.164s    70k    1.9GB/s   3.4M/s
|
| Unpatched flat kernel profile:
|
|  81.97%  [kernel]  [k] __pv_queued_spin_lock_slowpath
|   1.84%  [kernel]  [k] do_raw_spin_lock
|   1.33%  [kernel]  [k] __raw_callee_save___pv_queued_spin_unlock
|   0.50%  [kernel]  [k] memset_erms
|   0.42%  [kernel]  [k] do_raw_spin_unlock
|   0.42%  [kernel]  [k] xfs_perag_get
|   0.40%  [kernel]  [k] xfs_buf_find
|   0.39%  [kernel]  [k] __raw_spin_lock_init
|
| Patched flat kernel profile:
|
|  10.90%  [kernel]  [k] do_raw_spin_lock
|   7.21%  [kernel]  [k] __raw_callee_save___pv_queued_spin_unlock
|   3.16%  [kernel]  [k] xfs_buf_find
|   3.06%  [kernel]  [k] rcu_segcblist_enqueue
|   2.73%  [kernel]  [k] memset_erms
|   2.31%  [kernel]  [k] __pv_queued_spin_lock_slowpath
|   2.15%  [kernel]  [k] __raw_spin_lock_init
|   2.15%  [kernel]  [k] do_raw_spin_unlock
|   2.12%  [kernel]  [k] xfs_perag_get
|   1.93%  [kernel]  [k] xfs_btree_lookup
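
For anyone following along who hasn't read the patches: conceptually,
the hash-bl conversion makes each hash chain an hlist_bl_head, where
the low bit of the chain head pointer doubles as a per-chain bit
spinlock, so there is no single global lock for every lookup and
insert to bounce around. What follows is just an illustrative sketch
built on the generic hlist_bl API - the my_* names are invented for
the example, it is not the actual inode hash patch:

/* Illustrative sketch only - my_* names are invented, not the hash-bl patch. */
#include <linux/hash.h>
#include <linux/list_bl.h>

#define MY_HASH_BITS	10
static struct hlist_bl_head my_hash[1 << MY_HASH_BITS];

struct my_obj {
	unsigned long		key;
	struct hlist_bl_node	hash_node;
};

static void my_hash_insert(struct my_obj *obj)
{
	struct hlist_bl_head *b = &my_hash[hash_long(obj->key, MY_HASH_BITS)];

	/* bit_spin_lock on bit 0 of b->first - serialises this chain only */
	hlist_bl_lock(b);
	hlist_bl_add_head(&obj->hash_node, b);
	hlist_bl_unlock(b);
}

static struct my_obj *my_hash_find(unsigned long key)
{
	struct hlist_bl_head *b = &my_hash[hash_long(key, MY_HASH_BITS)];
	struct hlist_bl_node *n;
	struct my_obj *obj;

	hlist_bl_lock(b);
	hlist_bl_for_each_entry(obj, n, b, hash_node) {
		if (obj->key == key) {
			/* real code would take a reference before unlocking */
			hlist_bl_unlock(b);
			return obj;
		}
	}
	hlist_bl_unlock(b);
	return NULL;
}

Two CPUs hashing to different chains never touch the same lock
cacheline, which is where the win for ext4 and btrfs comes from.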
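
The dlist idea is similar in spirit, but for a global list rather
than a hash: give every CPU its own list head and lock, add entries
to the local CPU's sublist, and make the (rare) full-list walks visit
each sublist in turn. Again, this is a rough sketch with made-up
names to show the shape of it, not Waiman's actual dlist API:

/* Illustrative sketch - my_pcpu_* names are invented, not the dlist API. */
#include <linux/list.h>
#include <linux/percpu.h>
#include <linux/spinlock.h>

struct my_pcpu_sublist {
	spinlock_t		lock;
	struct list_head	head;
};

struct my_pcpu_list {
	struct my_pcpu_sublist __percpu *lists;
};

static int my_pcpu_list_init(struct my_pcpu_list *h)
{
	int cpu;

	h->lists = alloc_percpu(struct my_pcpu_sublist);
	if (!h->lists)
		return -ENOMEM;
	for_each_possible_cpu(cpu) {
		struct my_pcpu_sublist *l = per_cpu_ptr(h->lists, cpu);

		spin_lock_init(&l->lock);
		INIT_LIST_HEAD(&l->head);
	}
	return 0;
}

/*
 * Adds only contend with other users of the same CPU's sublist, not
 * with every other CPU in the machine. The real code also has to
 * record which sublist an entry went on so removal can find the
 * right lock again.
 */
static void my_pcpu_list_add(struct my_pcpu_list *h, struct list_head *entry)
{
	struct my_pcpu_sublist *l = get_cpu_ptr(h->lists);

	spin_lock(&l->lock);
	list_add(entry, &l->head);
	spin_unlock(&l->lock);
	put_cpu_ptr(h->lists);
}

That's why __pv_queued_spin_lock_slowpath drops from ~82% to ~2% in
the profiles above - inode instantiation and eviction only ever take
a per-CPU lock instead of fighting over one global one.

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx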