Re: [cel:topic-shmem-stable-dir-cookies] [shmem] 5fd403eb6c: WARNING:inconsistent_lock_state

> On Apr 12, 2023, at 3:34 PM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> 
> On Wed, 2023-04-12 at 19:09 +0000, Chuck Lever III wrote:
>> 
>>> On Apr 12, 2023, at 3:05 PM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>>> 
>>> On Wed, 2023-04-12 at 18:03 +0000, Chuck Lever III wrote:
>>>> 
>>>> 
>>>>> On Apr 10, 2023, at 8:36 PM, kernel test robot <yujie.liu@xxxxxxxxx> wrote:
>>>>> 
>>>>> Hello,
>>>>> 
>>>>> kernel test robot noticed "WARNING:inconsistent_lock_state" on:
>>>>> 
>>>>> commit: 5fd403eb6c181c63a3aacd55d92b80256a0670cf ("shmem: stable directory cookies")
>>>>> git://git.kernel.org/cgit/linux/kernel/git/cel/linux topic-shmem-stable-dir-cookies
>>>>> 
>>>>> in testcase: boot
>>>>> 
>>>>> compiler: gcc-11
>>>>> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>>>>> 
>>>>> (please refer to attached dmesg/kmsg for entire log/backtrace)
>>>>> 
>>>>> 
>>>>> If you fix the issue, kindly add following tag
>>>>>> Reported-by: kernel test robot <yujie.liu@xxxxxxxxx>
>>>>>> Link: https://lore.kernel.org/oe-lkp/202304101606.79aea62f-yujie.liu@xxxxxxxxx
>>>>> 
>>>>> 
>>>>> [   21.279213][    C0] WARNING: inconsistent lock state
>>>>> [   21.279668][    C0] 6.3.0-rc5-00001-g5fd403eb6c18 #1 Not tainted
>>>>> [   21.280199][    C0] --------------------------------
>>>>> [   21.280657][    C0] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
>>>>> [   21.281238][    C0] swapper/0/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
>>>>> [ 21.281773][ C0] ffff8881102e9b50 (&xa->xa_lock#3){+.?.}-{2:2}, at: xa_destroy (lib/xarray.c:2214)
>>>>> [   21.283140][    C0] {SOFTIRQ-ON-W} state was registered at:
>>>>> [ 21.283640][ C0] __lock_acquire (kernel/locking/lockdep.c:5010)
>>>>> [ 21.284089][ C0] lock_acquire (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5671 kernel/locking/lockdep.c:5634)
>>>>> [ 21.284513][ C0] _raw_spin_lock (include/linux/spinlock_api_smp.h:134 kernel/locking/spinlock.c:154)
>>>>> [ 21.284937][ C0] shmem_doff_add (include/linux/xarray.h:965 mm/shmem.c:2943)
>>>>> [ 21.285375][ C0] shmem_mknod (mm/shmem.c:3014)
>>>>> [ 21.285791][ C0] vfs_mknod (fs/namei.c:3916)
>>>>> [ 21.286195][ C0] devtmpfs_work_loop (drivers/base/devtmpfs.c:228 drivers/base/devtmpfs.c:393 drivers/base/devtmpfs.c:408)
>>>>> [ 21.286653][ C0] devtmpfsd (devtmpfs.c:?)
>>>>> [ 21.287046][ C0] kthread (kernel/kthread.c:376)
>>>>> [ 21.287441][ C0] ret_from_fork (arch/x86/entry/entry_64.S:314)
>>>>> [   21.287864][    C0] irq event stamp: 167451
>>>>> [ 21.288264][ C0] hardirqs last enabled at (167450): kasan_quarantine_put (arch/x86/include/asm/irqflags.h:42 (discriminator 1) arch/x86/include/asm/irqflags.h:77 (discriminator 1) arch/x86/include/asm/irqflags.h:135 (discriminator 1) mm/kasan/quarantine.c:242 (discriminator 1))
>>>>> [ 21.289095][ C0] hardirqs last disabled at (167451): _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:108 kernel/locking/spinlock.c:162)
>>>>> [ 21.289969][ C0] softirqs last enabled at (167330): __do_softirq (kernel/softirq.c:415 kernel/softirq.c:600)
>>>>> [ 21.290755][ C0] softirqs last disabled at (167355): irq_exit_rcu (kernel/softirq.c:445 kernel/softirq.c:650 kernel/softirq.c:640 kernel/softirq.c:662)
>>>>> [   21.291540][    C0]
>>>>> [   21.291540][    C0] other info that might help us debug this:
>>>>> [   21.292230][    C0]  Possible unsafe locking scenario:
>>>>> [   21.292230][    C0]
>>>>> [   21.292905][    C0]        CPU0
>>>>> [   21.293235][    C0]        ----
>>>>> [   21.293575][    C0]   lock(&xa->xa_lock#3);
>>>>> [   21.293987][    C0]   <Interrupt>
>>>>> [   21.294327][    C0]     lock(&xa->xa_lock#3);
>>>>> [   21.294753][    C0]
>>>>> [   21.294753][    C0]  *** DEADLOCK ***
>>>>> [   21.294753][    C0]
>>>>> [   21.295483][    C0] 1 lock held by swapper/0/0:
>>>>> [ 21.295914][ C0] #0: ffffffff8597a260 (rcu_callback){....}-{0:0}, at: rcu_do_batch (kernel/rcu/tree.c:2104)
>>>> 
>>>> It appears that RCU is trying to evict a tmpfs directory inode
>>>> prematurely. lockdep catches this because someone else is trying to
>>>> add an entry to it while RCU is trying to free it. Classic
>>>> use-after-free.
>>>> 
>>>> Jeff, the only new iput() in this patch is the one you suggested in
>>>> shmem_symlink(). Are you sure it is needed (and also correct)?
>>>> 
>>> 
>>> The code in your topic-shmem-stable-dir-cookies branch looks correct to
>>> me. After shmem_get_inode returns, the caller holds an inode reference
>>> that must be explicitly put on error, unless the inode is attached to
>>> the dentry (via d_instantiate).
>>> 
>>> I'm not sure how to interpret this. The log is a bit of a mess. It
>>> looks like it ended up in some sort of recursive call into the same
>>> xarray due to an interrupt?
>> 
>> I think it's easier to see if you look at the dmesg.xz that was
>> attached to the original report.
>> 
>> The thing calling xa_destroy is being invoked from i_callback,
>> which is the RCU-deferred "inode destroy" method. It's running
>> in softIRQ context.
>> 
> 
> Right, but why is it trying to add an entry to an xarray that is being
> destroyed?

I think it's because the directory inode's reference count is too low,
and it's being destroyed too early (i.e., while there are still
legitimate users).
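
Whether or not that theory is right, the lockdep splat itself boils
down to the same xa_lock being taken in two incompatible contexts.
A minimal sketch of the pattern (the "demo_" names are illustrative
only, not the actual code in the topic branch):

	#include <linux/kernel.h>
	#include <linux/xarray.h>
	#include <linux/rcupdate.h>
	#include <linux/slab.h>

	struct demo_dir {
		struct xarray	doff_map;	/* stand-in for the per-directory cookie map */
		struct rcu_head	rcu;
	};

	/* Process context (the shmem_doff_add path under shmem_mknod):
	 * xa_store() takes xa_lock with softirqs still enabled, which is
	 * the {SOFTIRQ-ON-W} usage lockdep recorded. */
	static int demo_doff_add(struct demo_dir *dir, unsigned long cookie,
				 void *entry)
	{
		return xa_err(xa_store(&dir->doff_map, cookie, entry,
				       GFP_KERNEL));
	}

	/* RCU callback, runs in softirq context: xa_destroy() takes the
	 * same xa_lock, now {IN-SOFTIRQ-W}. If the softirq fires while a
	 * process-context holder has the lock, that CPU deadlocks. */
	static void demo_free_dir_rcu(struct rcu_head *head)
	{
		struct demo_dir *dir = container_of(head, struct demo_dir, rcu);

		xa_destroy(&dir->doff_map);
		kfree(dir);
	}

Either the destroy has to move out of softirq context, or every
process-context user of that xarray would have to switch to
xa_lock_bh(); the former seems far less invasive.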


> Or maybe it isn't, and lockdep is just confused and is
> classifying the various per-inode xarrays together? I have a hard time
> interpreting these reports sometimes. :-/
> 
>> 
>>> One thing that looks suspicious to me is that this patch has the call to
>>> shmem_doff_map_destroy in free_inode (which is the RCU callback). I
>>> think you probably want to do that in destroy_inode instead since that
>>> involves taking locks and such.
>> 
>> I'll have a look!
>> 
> 
> Cool, I think that's probably safest here. In principle, the xarray
> should be empty when we get to this point so there ought not be much to
> do anyway.

I expected shmem_free_in_core_inode() to be invoked later than
shmem_destroy_inode(), after everything had been completely torn
down.
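
If I'm reading your suggestion right, the change would look roughly
like this (a sketch only, untested; the hook bodies are hand-waved
and shmem_doff_map_destroy()'s signature is assumed):

	/* ->destroy_inode runs synchronously in process context, so the
	 * cookie-map teardown (and its xa_lock) is safe here. */
	static void shmem_destroy_inode(struct inode *inode)
	{
		if (S_ISDIR(inode->i_mode))
			shmem_doff_map_destroy(inode);
	}

	/* ->free_inode is the RCU-deferred callback (softirq context);
	 * leave it with nothing that takes locks. */
	static void shmem_free_in_core_inode(struct inode *inode)
	{
		kmem_cache_free(shmem_inode_cachep, SHMEM_I(inode));
	}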

I'm shooting in the dark with the fix, though. I can't make heads
or tails of the job file that was attached, so I don't have a
reproducer.


--
Chuck Lever