[RFC PATCH 0/3] vfs: convert inode cache to hlist-bl

Dave Chinner <david@xxxxxxxxxxxxx> · Tue, 6 Apr 2021 22:33:40 +1000

Hi folks,

Recently I've been doing some scalability characterisation of
various filesystems, and one of the limiting factors that has
prevented me from exploring filesystem characteristics is the
inode hash table. namely, the global inode_hash_lock that protects
it.

This has long been a problem, but I personally haven't cared about
it because, well, XFS doesn't use it and so it's not a limiting
factor for most of my work. However, in trying to characterise the
scalability boundaries of bcachefs, I kept hitting against VFS
limitations first. bcachefs hits the inode hash table pretty hard
and it becaomse a contention point a lot sooner than it does for
ext4. Btrfs also uses the inode hash, but it's namespace doesn't
have the capability to stress the indoe hash lock due to it hitting
internal contention first.

Long story short, I did what should have been done a decade or more
ago - I converted the inode hash table to use hlist-bl to split up
the global lock. This is modelled on the dentry cache, with one
minor tweak. That is, the inode hash value cannot be calculated from
the inode, so we have to keep a record of either the hash value or a
pointer to the hlist-bl list head that the inode is hashed into so
taht we can lock the corect list on removal.

Other than that, this is mostly just a mechanical conversion from
one list and lock type to another. None of the algorithms have
changed and none of the RCU behaviours have changed. But it removes
the inode_hash_lock from the picture and so performance for bcachefs
goes way up and CPU usage for ext4 halves at 16 and 32 threads. At
higher thread counts, we start to hit filesystem and other VFS locks
as the limiting factors. Profiles and performance numbers are in
patch 3 for those that are curious.

I've been running this in benchmarks and perf testing across
bcachefs, btrfs and ext4 for a couple of weeks, and it passes
fstests on ext4 and btrfs without regressions. So now it needs more
eyes and testing and hopefully merging....

Cheers,

Dave.