+intel-gfx

Gentle Reminder.

From: Borah, Chaitanya Kumar
Sent: Wednesday, June 26, 2024 8:52 PM
To: sidhartha.kumar@xxxxxxxxxx
Cc: Liam.Howlett@xxxxxxxxxx; akpm@xxxxxxxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; maple-tree@xxxxxxxxxxxxxxxxxxx; Nikula, Jani <jani.nikula@xxxxxxxxx>; Saarinen, Jani <jani.saarinen@xxxxxxxxx>; Kurmi, Suresh Kumar <Suresh.Kumar.Kurmi@xxxxxxxxx>
Subject: Regression on linux-next (next-20240625)

Hello Sidhartha,

Hope you are doing well. I am Chaitanya from the Linux graphics team at Intel.

This mail is regarding a regression we are seeing in our CI runs[1] on the linux-next repository.

Since version next-20240625 [2], we are seeing the following regression:

`````````````````````````````````````````````````````````````````````````````````
<3>[ 2.336948] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:337
<3>[ 2.336974] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 95, name: kdevtmpfs
<3>[ 2.336989] preempt_count: 1, expected: 0
<3>[ 2.336998] RCU nest depth: 0, expected: 0
<4>[ 2.337006] 3 locks held by kdevtmpfs/95:
<4>[ 2.337015] #0: ffff888100d2c3f0 (sb_writers){.+.+}-{0:0}, at: filename_create+0x5d/0x160
<4>[ 2.337041] #1: ffff888100800840 (&type->i_mutex_dir_key/1){+.+.}-{3:3}, at: filename_create+0x9d/0x160
<4>[ 2.337065] #2: ffff888100800658 (&simple_offset_lock_class){+.+.}-{2:2}, at: mtree_alloc_cyclic+0x71/0xf0
<3>[ 2.337089] Preemption disabled at:
<3>[ 2.337091] [<0000000000000000>] 0x0
<4>[ 2.337105] CPU: 13 UID: 0 PID: 95 Comm: kdevtmpfs Not tainted 6.10.0-rc5-next-20240625-next-20240625-g0fc4bfab2cd4+ #1
<4>[ 2.337126] Hardware name: ASUS System Product Name/PRIME Z790-P WIFI, BIOS 0812 02/24/2023
<4>[ 2.337141] Call Trace:
<4>[ 2.337147] <TASK>
<4>[ 2.337152] dump_stack_lvl+0xb0/0xd0
<4>[ 2.337163] __might_resched+0x194/0x2b0
<4>[ 2.337175] kmem_cache_alloc_noprof+0x20c/0x280
<4>[ 2.337186] ? mas_alloc_nodes+0x173/0x230
<4>[ 2.337197] mas_alloc_nodes+0x173/0x230
<4>[ 2.337207] mas_alloc_cyclic+0x27b/0x550
<4>[ 2.337220] mtree_alloc_cyclic+0x92/0xf0
`````````````````````````````````````````````````````````````````````````````````

The detailed log can be found in [3].

After bisecting the tree, the following patch [4] seems to be the first "bad" commit:

`````````````````````````````````````````````````````````````````````````````````````````````````````````
maple_tree: remove mas_destroy() from mas_nomem()

Separate call to mas_destroy() from mas_nomem() so we can check for no
memory errors without destroying the current maple state in
mas_store_gfp(). We then add calls to mas_destroy() to callers of
mas_nomem().

Link: https://lkml.kernel.org/r/20240618204750.79512-6-sidhartha.kumar@xxxxxxxxxx
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@xxxxxxxxxx>
`````````````````````````````````````````````````````````````````````````````````````````````````````````

We could not revert the patch because of merge conflicts, but resetting to the parent of the commit seems to fix the issue.

Could you please check why the patch causes this regression and provide a fix if necessary?

Thank you.

Regards

Chaitanya

[1] https://intel-gfx-ci.01.org/tree/linux-next/combined-alt.html?
[2] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20240625
[3] https://intel-gfx-ci.01.org/tree/linux-next/next-20240625/bat-rpls-4/boot0.txt
[4] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=187827d2dc3749d66546696b78584ee4c54687b0