On 10-Jul-24 6:34 PM, Mateusz Guzik wrote:
However the contention now has shifted to inode_hash_lock. Around 55
softlockups in ilookup() were observed:
# tracer: preemptirqsoff
#
# preemptirqsoff latency trace v1.1.5 on 6.10.0-rc3-trnmglru
# --------------------------------------------------------------------
# latency: 10620430 us, #4/4, CPU#260 | (M:desktop VP:0, KP:0, SP:0 HP:0
#P:512)
# -----------------
# | task: fio-3244715 (uid:0 nice:0 policy:0 rt_prio:0)
# -----------------
# => started at: ilookup
# => ended at: ilookup
#
#
# _------=> CPU#
# / _-----=> irqs-off/BH-disabled
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| / _-=> migrate-disable
# ||||| / delay
# cmd pid |||||| time | caller
# \ / |||||| \ | /
fio-3244715 260...1. 0us$: _raw_spin_lock <-ilookup
fio-3244715 260.N.1. 10620429us : _raw_spin_unlock <-ilookup
fio-3244715 260.N.1. 10620430us : tracer_preempt_on <-ilookup
fio-3244715 260.N.1. 10620440us : <stack trace>
=> _raw_spin_unlock
=> ilookup
=> blkdev_get_no_open
=> blkdev_open
=> do_dentry_open
=> vfs_open
=> path_openat
=> do_filp_open
=> do_sys_openat2
=> __x64_sys_openat
=> x64_sys_call
=> do_syscall_64
=> entry_SYSCALL_64_after_hwframe
It appears that scalability issues with inode_hash_lock has been brought
up multiple times in the past and there were patches to address the same.
https://lore.kernel.org/all/20231206060629.2827226-9-david@xxxxxxxxxxxxx/
https://lore.kernel.org/lkml/20240611173824.535995-2-mjguzik@xxxxxxxxx/
CC'ing FS folks/list for awareness/comments.
Note my patch does not enable RCU usage in ilookup, but this can be
trivially added.
I can't even compile-test at the moment, but the diff below should do
it. Also note the patches are present here
https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=vfs.inode.rcu
, not yet integrated anywhere.
That said, if fio you are operating on the same target inode every
time then this is merely going to shift contention to the inode
spinlock usage in find_inode_fast.
diff --git a/fs/inode.c b/fs/inode.c
index ad7844ca92f9..70b0e6383341 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1524,10 +1524,14 @@ struct inode *ilookup(struct super_block *sb,
unsigned long ino)
{
struct hlist_head *head = inode_hashtable + hash(sb, ino);
struct inode *inode;
+
again:
- spin_lock(&inode_hash_lock);
- inode = find_inode_fast(sb, head, ino, true);
- spin_unlock(&inode_hash_lock);
+ inode = find_inode_fast(sb, head, ino, false);
+ if (IS_ERR_OR_NULL_PTR(inode)) {
+ spin_lock(&inode_hash_lock);
+ inode = find_inode_fast(sb, head, ino, true);
+ spin_unlock(&inode_hash_lock);
+ }
if (inode) {
if (IS_ERR(inode))
I think I expressed myself poorly, so here is take two:
1. inode hash soft lookup should get resolved if you apply
https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?h=vfs.inode.rcu&id=7180f8d91fcbf252de572d9ffacc945effed0060
and the above pasted fix (not compile tested tho, but it should be
obvious what the intended fix looks like)
2. find_inode_hash spinlocks the target inode. if your bench only
operates on one, then contention is going to shift there and you may
still be getting soft lockups. not taking the spinlock in this
codepath is hackable, but I don't want to do it without a good
justification.
Thanks Mateusz for the fix. With this patch applied, the above mentioned
contention in ilookup() has not been observed for a test run during the
weekend.
Regards,
Bharata.