On 6/27/24 9:45 PM, Borah, Chaitanya Kumar wrote:
+intel-gfx
Gentle Reminder.
Hello,
This patch will be dropped from mm-unstable and will no longer be in linux-next after that. I am working on a fix to include in the next version of this series.
Thanks,
Sid
From: Borah, Chaitanya Kumar
Sent: Wednesday, June 26, 2024 8:52 PM
To: sidhartha.kumar@xxxxxxxxxx
Cc: Liam.Howlett@xxxxxxxxxx; akpm@xxxxxxxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; maple-tree@xxxxxxxxxxxxxxxxxxx; Nikula, Jani <jani.nikula@xxxxxxxxx>; Saarinen, Jani <jani.saarinen@xxxxxxxxx>; Kurmi, Suresh Kumar <Suresh.Kumar.Kurmi@xxxxxxxxx>
Subject: Regression on linux-next (next-20240625)
Hello Sidhartha,
Hope you are doing well. I am Chaitanya from the Linux graphics team at Intel.
This mail is regarding a regression we are seeing in our CI runs [1] on the linux-next repository.
Since version next-20240625 [2], we are seeing the following regression:
`````````````````````````````````````````````````````````````````````````````````
<3>[ 2.336948] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:337
<3>[ 2.336974] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 95, name: kdevtmpfs
<3>[ 2.336989] preempt_count: 1, expected: 0
<3>[ 2.336998] RCU nest depth: 0, expected: 0
<4>[ 2.337006] 3 locks held by kdevtmpfs/95:
<4>[ 2.337015] #0: ffff888100d2c3f0 (sb_writers){.+.+}-{0:0}, at: filename_create+0x5d/0x160
<4>[ 2.337041] #1: ffff888100800840 (&type->i_mutex_dir_key/1){+.+.}-{3:3}, at: filename_create+0x9d/0x160
<4>[ 2.337065] #2: ffff888100800658 (&simple_offset_lock_class){+.+.}-{2:2}, at: mtree_alloc_cyclic+0x71/0xf0
<3>[ 2.337089] Preemption disabled at:
<3>[ 2.337091] [<0000000000000000>] 0x0
<4>[ 2.337105] CPU: 13 UID: 0 PID: 95 Comm: kdevtmpfs Not tainted 6.10.0-rc5-next-20240625-next-20240625-g0fc4bfab2cd4+ #1
<4>[ 2.337126] Hardware name: ASUS System Product Name/PRIME Z790-P WIFI, BIOS 0812 02/24/2023
<4>[ 2.337141] Call Trace:
<4>[ 2.337147] <TASK>
<4>[ 2.337152] dump_stack_lvl+0xb0/0xd0
<4>[ 2.337163] __might_resched+0x194/0x2b0
<4>[ 2.337175] kmem_cache_alloc_noprof+0x20c/0x280
<4>[ 2.337186] ? mas_alloc_nodes+0x173/0x230
<4>[ 2.337197] mas_alloc_nodes+0x173/0x230
<4>[ 2.337207] mas_alloc_cyclic+0x27b/0x550
<4>[ 2.337220] mtree_alloc_cyclic+0x92/0xf0
`````````````````````````````````````````````````````````````````````````````````
The detailed log can be found in [3].
After bisecting the tree, the following patch [4] seems to be the first "bad" commit:
`````````````````````````````````````````````````````````````````````````````````````````````````````````
maple_tree: remove mas_destroy() from mas_nomem()
Separate call to mas_destroy() from mas_nomem() so we can check for no
memory errors without destroying the current maple state in
mas_store_gfp(). We then add calls to mas_destroy() to callers of
mas_nomem().
Link: https://lkml.kernel.org/r/20240618204750.79512-6-sidhartha.kumar@xxxxxxxxxx
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@xxxxxxxxxx>
`````````````````````````````````````````````````````````````````````````````````````````````````````````
We could not revert the patch because of merge conflicts, but resetting to the parent of the commit seems to fix the issue.
Could you please check why the patch causes this regression and provide a fix if necessary?
Thank you.
Regards
Chaitanya
[1] https://intel-gfx-ci.01.org/tree/linux-next/combined-alt.html?
[2] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20240625
[3] https://intel-gfx-ci.01.org/tree/linux-next/next-20240625/bat-rpls-4/boot0.txt
[4] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=187827d2dc3749d66546696b78584ee4c54687b0