On Wed, Dec 06, 2023 at 05:05:37PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@xxxxxxxxxx>
>
> Scalability of the global inode_hash_lock really sucks for
> filesystems that use the vfs inode cache (i.e. everything but XFS).

Ages ago, we talked about converting to rhashtable instead (I attempted
it, but ended up swearing at inode lifetime rules). I still believe that
would be preferable, since that code is fully lockless (and resizeable,
of course). But it turned out to be a much bigger project...

But IIRC the bulk of the work was going to be "clean up inode
refcounting/lifetime rules into something sane/modern" - maybe we could
leave some breadcrumbs/comments in fs/inode.c for what that would take,
if/when someone else is sufficiently motivated?

> threads   vanilla   patched   vanilla   patched
>  2          7.923     7.358     8.003     7.276
>  4          8.152     7.530     9.097     8.506
>  8         13.090     7.871    11.752    10.015
> 16         24.602     9.540    24.614    13.989
> 32         49.536    19.314    49.179    25.982

nice

> The big wins here are at >= 8 threads, with both filesystems now
> being limited by internal filesystem algorithms, not the VFS inode
> cache scalability.
>
> Ext4 contention moves to the buffer cache on directory block
> lookups:
>
> -   66.45%     0.44%  [kernel]  [k] __ext4_read_dirblock
>    - 66.01% __ext4_read_dirblock
>       - 66.01% ext4_bread
>          - ext4_getblk
>             - 64.77% bdev_getblk
>                - 64.69% __find_get_block
>                   - 63.01% _raw_spin_lock
>                      - 62.96% do_raw_spin_lock
>                           59.21% __pv_queued_spin_lock_slowpath
>
> bcachefs contention moves to internal btree traversal locks.
>
> - 95.37% __lookup_slow
>    - 93.95% bch2_lookup
>       - 82.57% bch2_vfs_inode_get
>          - 65.44% bch2_inode_find_by_inum_trans
>             - 65.41% bch2_inode_peek_nowarn
>                - 64.60% bch2_btree_iter_peek_slot
>                   - 64.55% bch2_btree_path_traverse_one
>                      - bch2_btree_path_traverse_cached
>                         - 63.02% bch2_btree_path_traverse_cached_slowpath
>                            - 56.60% mutex_lock

dlist-lock ought to be perfect for solving this one

Reviewed-by: Kent Overstreet <kent.overstreet@xxxxxxxxx>