On Thu 20-03-25 01:46:43, Mateusz Guzik wrote:
> As both locks are highly contended during significant inode churn,
> holding the inode hash lock while waiting for the sb list lock
> exacerbates the problem.
> 
> Why moving it out is safe: the inode at hand still has I_NEW set, and
> anyone who finds it through legitimate means waits for the bit to
> clear, by which time inode_sb_list_add() is guaranteed to have
> finished.
> 
> This significantly drops hash lock contention for me when stat()ing 20
> separate trees in parallel, each with 1000 directories * 1000 files.
> 
> However, no speedup was observed, as contention increased on the other
> locks, notably the dentry LRU lock.
> 
> Even so, removing the lock ordering requirement will help make this
> faster later.
> 
> Signed-off-by: Mateusz Guzik <mjguzik@xxxxxxxxx>

I agree that I_NEW protects the inode from being looked up through the
hash while we drop inode_hash_lock, so this is safe. I'm usually a bit
reluctant to take performance "improvements" that don't bring a
measurable benefit, but this patch looks like a no-brainer (I'm not
worried about the added complexity :)), and at least it reduces
contention on inode_hash_lock, so I agree the potential is there. So
feel free to add:

Reviewed-by: Jan Kara <jack@xxxxxxx>

								Honza

> ---
> 
> There were ideas about using bitlocks, which ran into trouble because
> spinlocks are taken *after* the bitlock, and RT kernels don't like
> that.
> 
> I'm thinking that, thanks to RCU, this can be hacked around by
> reversing the hash-specific lock -> inode lock ordering: you find the
> inode you are looking for, lock it, and only then take the bitlock and
> validate that the inode remained in place.
> 
> The above patch removes an impediment -- the sb list lock is the only
> other lock possibly taken while the hash lock is held.
> 
> Even if the above idea does not pan out, the patch has some value in
> decoupling these locks.
> 
> I am, however, not going to argue strongly for it, given the lack of
> results.
> 
>  fs/inode.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index e188bb1eb07a..8efd38c27321 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -1307,8 +1307,8 @@ struct inode *inode_insert5(struct inode *inode, unsigned long hashval,
>  	}
>  
>  	if (set && unlikely(set(inode, data))) {
> -		inode = NULL;
> -		goto unlock;
> +		spin_unlock(&inode_hash_lock);
> +		return NULL;
>  	}
>  
>  	/*
> @@ -1320,14 +1320,14 @@ struct inode *inode_insert5(struct inode *inode, unsigned long hashval,
>  	hlist_add_head_rcu(&inode->i_hash, head);
>  	spin_unlock(&inode->i_lock);
>  
> +	spin_unlock(&inode_hash_lock);
> +
>  	/*
>  	 * Add inode to the sb list if it's not already. It has I_NEW at this
>  	 * point, so it should be safe to test i_sb_list locklessly.
>  	 */
>  	if (list_empty(&inode->i_sb_list))
>  		inode_sb_list_add(inode);
> -unlock:
> -	spin_unlock(&inode_hash_lock);
>  
>  	return inode;
>  }
> @@ -1456,8 +1456,8 @@ struct inode *iget_locked(struct super_block *sb, unsigned long ino)
>  		inode->i_state = I_NEW;
>  		hlist_add_head_rcu(&inode->i_hash, head);
>  		spin_unlock(&inode->i_lock);
> -		inode_sb_list_add(inode);
>  		spin_unlock(&inode_hash_lock);
> +		inode_sb_list_add(inode);
>  
>  		/* Return the locked inode with I_NEW set, the
>  		 * caller is responsible for filling in the contents
> -- 
> 2.43.0
> 
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
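
To make the I_NEW handshake above concrete, here is a minimal userspace
model in plain C with pthreads. Every name in it is invented for
illustration; none of this is the kernel implementation. The publisher
stands in for the inode_insert5() path: it finishes setup (the stand-in
for inode_sb_list_add()) and only then clears the NEW bit and wakes
waiters, so a concurrent lookup that finds the object can never observe
it without its sb list linkage.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct obj {
	pthread_mutex_t lock;
	pthread_cond_t  cond;
	bool            is_new;     /* models I_NEW */
	bool            on_sb_list; /* models i_sb_list linkage */
};

/* Published "in the hash" with is_new already set. */
static struct obj o = {
	.lock   = PTHREAD_MUTEX_INITIALIZER,
	.cond   = PTHREAD_COND_INITIALIZER,
	.is_new = true,
};

/* Publisher: the hash lock has already been dropped at this point. */
static void *publisher(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&o.lock);
	o.on_sb_list = true;  /* stand-in for inode_sb_list_add() */
	o.is_new = false;     /* stand-in for unlock_new_inode()  */
	pthread_cond_broadcast(&o.cond);
	pthread_mutex_unlock(&o.lock);
	return NULL;
}

/* Lookup: anyone who finds the object waits for NEW to clear. */
static void *lookup(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&o.lock);
	while (o.is_new)      /* models waiting on the I_NEW bit */
		pthread_cond_wait(&o.cond, &o.lock);
	/* By now the sb list add is guaranteed to have finished. */
	printf("on_sb_list = %d\n", o.on_sb_list);  /* always 1 */
	pthread_mutex_unlock(&o.lock);
	return NULL;
}

int main(void)
{
	pthread_t a, b;
	pthread_create(&b, NULL, lookup, NULL);
	pthread_create(&a, NULL, publisher, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}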
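The lock-order reversal floated in the patch notes can be modelled the
same way. In the sketch below (again all names are invented), a mutex
stands in for the per-bucket bitlock and an atomic pointer stands in
for the RCU-protected hash chain: find the object locklessly, lock the
object first, then take the bucket lock only to revalidate that the
object is still hashed. It shows only the shape of the idea; real RCU
additionally keeps the object's memory valid during the lockless walk,
which this model sidesteps by using a static object.

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct obj {
	pthread_mutex_t lock;
	bool hashed;  /* still in the bucket? */
	int  key;
};

/* Stand-in for the per-bucket bitlock. */
static pthread_mutex_t bucket_lock = PTHREAD_MUTEX_INITIALIZER;
static struct obj the_obj = { PTHREAD_MUTEX_INITIALIZER, true, 42 };
/* Stand-in for the RCU-published hash chain (one entry here). */
static _Atomic(struct obj *) bucket_head = &the_obj;

static struct obj *find_and_lock(int key)
{
	for (;;) {
		/* "RCU" phase: optimistic, lockless find. */
		struct obj *obj = atomic_load(&bucket_head);
		if (!obj || obj->key != key)
			return NULL;

		pthread_mutex_lock(&obj->lock);    /* object lock FIRST  */
		pthread_mutex_lock(&bucket_lock);  /* bucket lock SECOND */
		bool valid = obj->hashed;          /* revalidate         */
		pthread_mutex_unlock(&bucket_lock);

		if (valid)
			return obj;  /* returned with obj->lock held */
		/* Raced with an unhash; drop the lock and retry.
		 * A remover would clear ->hashed under bucket_lock and
		 * free the object only after all lockless walkers are
		 * done (a grace period, in kernel terms). */
		pthread_mutex_unlock(&obj->lock);
	}
}

int main(void)
{
	struct obj *obj = find_and_lock(42);
	if (obj) {
		printf("found key %d\n", obj->key);
		pthread_mutex_unlock(&obj->lock);
	}
	return 0;
}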