Re: [linux-next:master] [maple_tree] 2041864a22: BUG:sleeping_function_called_from_invalid_context_at_include/linux/sched/mm.h

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



* Jaeseon Sim <jason.sim@xxxxxxxxxxx> [230926 08:15]:
...
> >>>>>> commit: 2041864a22d4f4e900d0a3def4985432a21d8e6d ("maple_tree: use mas_node_count_gfp() in mas_expected_entries()")
> >>>>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> >>>>>>
> >>>>>> [test failed on linux-next/master 940fcc189c51032dd0282cbee4497542c982ac59]
> >>>>>>
> >>>>>> in testcase: boot
> >>>>>>
> >>>>>> compiler: gcc-9
> >>>>>> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> >>>>>>
> >>>>>> (please refer to attached dmesg/kmsg for entire log/backtrace)
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> >>>>>> the same patch/commit), kindly add following tags
> >>>>>> | Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
> >>>>>> | Closes: https://lore.kernel.org/oe-lkp/202309242123.7ebe65b5-oliver.sang@xxxxxxxxx
> >>>>>>
> >>>>>>
> >>>>>> [  113.582828][    T1] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:306
> >>>>>> [  113.583602][    T1] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
> >>>>>> [  113.584246][    T1] preempt_count: 1, expected: 0
> >>>>>> [  113.584613][    T1] RCU nest depth: 0, expected: 0
> >>>>>> [  113.584983][    T1] 1 lock held by swapper/0/1:
> >>>>>> [ 113.585344][ T1] #0: ffffc9000001fc10 (&mt->ma_lock){+.+.}-{2:2}, at: check_forking+0x1e0/0x5c0
> >>>>> Dear Liam,
> >>>>>
> >>>>> mas_expected_entries() in check_forking() tried to sleep while holding spinlock, and panic occurred.
> >>>>> I think mas_expected_entries() in lib/test_maple_tree.c need to be modified to align with commit 2041864a22d4f.
> >>>>> Do you have any idea for it? or Could you give some guide?
> >>> 
> >>> There are two ways we could fix this: one is to pass through the GFP
> >>> flag and use different flags in the test module, the other is to move
> >>> the testing out of the module and into the userspace tests.
> >>Actually, there is a third method that can be used to solve this
> >>problem, which is to use an externally sleepable lock, such as
> >>rw_semaphore.

Oh yes, that's true.  There's probably even more ways than even these
three.  I didn't mean to limit the choices to just two.

We should probably also add a test for this somehow to the test suite,
so probably the best way would be to use the test module and implement
the change like Peng suggested - at least in the long run.  That way the
bots will report the potential locking issue if we ever change it again.

> >>> 
> >>> Adding the GFP flag to the interface might be needed in the future but
> >>> there's no need for that now.  I was concerned about too large of a
> >>> change to the existing code, and this would increase the runtime code
> >>> changes - although not a lot.
> >>> 
> >>> I think the best thing would be to move the forking test out of the
> >>> module into the userspace testing (tools/testing/radix-tree/maple.c)
> >>> 
> >>>> This is just a test module. The work[1] I'm doing modifies this place
> >>>> and it will fix this bug.
> >>> 
> >>> Thanks Peng.  This is a temporary fix for upstream, but is needed for
> >>> the LTS kernels as well.  I've mentioned your patches to others, so
> >>> don't think they aren't noticed - they are eagerly awaited.
> >>> 
> >>> Since your patch adds the necessary GFP flag, we could move the
> >>> check_forking test back in your update, (patch 7/9 [1]) which avoids the
> >>> GFP_KERNEL flag (thanks!), if it is moved.  I think it's worth while to
> >>> do since you already have a lot of userspace tests as well that uses
> >>> GFP_KERNEL (4/9 [2]) and it's good to keep as much in the kernel module
> >>> as possible.
> >>> 
> >>> By the way Peng, I have gotten complaints (I cannot find a reference
> >>> quickly) from older CPUs taking a long time on the test module.  You are
> >>> making things faster, but I just wanted you to be aware of that in case
> >>> you add tests in the future that cause complaints :)  I still think it
> >>> is worth keeping as much as possible in that module - it's a more valid
> >>> test scenario and it still runs from the userspace testing.
> >>I understand this now, and I will take this into consideration when
> >>adding tests in the future.
> >>> 
> >>> ...
> >>> 
> >>>> Thanks.
> >>>>
> >>>> [1] https://lore.kernel.org/lkml/20230925035617.84767-1-zhangpeng.00@xxxxxxxxxxxxx/
> >>> 
> >>> ...
> >>> 
> >>> Thanks,
> >>> Liam
> >>> 
> >>> [1] https://lore.kernel.org/lkml/20230925035617.84767-8-zhangpeng.00@xxxxxxxxxxxxx/
> >>> [2] https://lore.kernel.org/lkml/20230925035617.84767-5-zhangpeng.00@xxxxxxxxxxxxx/
> >>> 
> >>> 
> 
> I think it would be better to wait for Peng's revision..
> 

Your patch is a bug fix.  What Peng has written is a new feature.  The
only reason that the new feature fixes the issue is that I raised it in
the patch review process because you reported the issue in the existing
code.  So, thank you for reporting it!

We are not backporting a new feature to an older kernel, so the fix you
submitted is incomplete, but very much necessary. For all we know, there
will be other issues with the feature that will delay getting to Linus'
branch - and possibly miss your internal release window as well.

We are currently in an RC release of the kernel.  This means we can
submit the bug fixes, but we won't be submitting new features. So your
fix will land and be backported all the way to 6.1 before Peng's feature
is added.

Even if the new feature was put into the mm branch today, it would take
a long time for it to be pushed upstream.  Even when the new feature is
landed upstream, your fix is still necessary from 6.1 - 6.6.

Between Peng and I, we have provided 3 ways to work around the test
failure.  I like Peng's suggestion the best, but since he will be
replacing the tests then we could make this change in his patchset.  For
now, I would be happy with a passing test framework to minimize the
backporting effort.

Do you think you can update the patch to incorporate one of suggested
fixes for the testing?  If not, let me know and I will add it to the
list of tasks, because it *has* to happen.

Thank you,
Liam





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux