RE: Regression on linux-next (next-20240625)

+intel-gfx

Gentle Reminder.

From: Borah, Chaitanya Kumar 
Sent: Wednesday, June 26, 2024 8:52 PM
To: sidhartha.kumar@xxxxxxxxxx
Cc: Liam.Howlett@xxxxxxxxxx; akpm@xxxxxxxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; maple-tree@xxxxxxxxxxxxxxxxxxx; Nikula, Jani <jani.nikula@xxxxxxxxx>; Saarinen, Jani <jani.saarinen@xxxxxxxxx>; Kurmi, Suresh Kumar <Suresh.Kumar.Kurmi@xxxxxxxxx>
Subject: Regression on linux-next (next-20240625)

Hello Sidhartha,

Hope you are doing well. I am Chaitanya from the Linux graphics team at Intel.

This mail is about a regression we are seeing in our CI runs [1] on the linux-next repository.

Since next-20240625 [2], we are seeing the following regression:

`````````````````````````````````````````````````````````````````````````````````
<3>[    2.336948] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:337
<3>[    2.336974] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 95, name: kdevtmpfs
<3>[    2.336989] preempt_count: 1, expected: 0
<3>[    2.336998] RCU nest depth: 0, expected: 0
<4>[    2.337006] 3 locks held by kdevtmpfs/95:
<4>[    2.337015]  #0: ffff888100d2c3f0 (sb_writers){.+.+}-{0:0}, at: filename_create+0x5d/0x160
<4>[    2.337041]  #1: ffff888100800840 (&type->i_mutex_dir_key/1){+.+.}-{3:3}, at: filename_create+0x9d/0x160
<4>[    2.337065]  #2: ffff888100800658 (&simple_offset_lock_class){+.+.}-{2:2}, at: mtree_alloc_cyclic+0x71/0xf0
<3>[    2.337089] Preemption disabled at:
<3>[    2.337091] [<0000000000000000>] 0x0
<4>[    2.337105] CPU: 13 UID: 0 PID: 95 Comm: kdevtmpfs Not tainted 6.10.0-rc5-next-20240625-next-20240625-g0fc4bfab2cd4+ #1
<4>[    2.337126] Hardware name: ASUS System Product Name/PRIME Z790-P WIFI, BIOS 0812 02/24/2023
<4>[    2.337141] Call Trace:
<4>[    2.337147]  <TASK>
<4>[    2.337152]  dump_stack_lvl+0xb0/0xd0
<4>[    2.337163]  __might_resched+0x194/0x2b0
<4>[    2.337175]  kmem_cache_alloc_noprof+0x20c/0x280
<4>[    2.337186]  ? mas_alloc_nodes+0x173/0x230
<4>[    2.337197]  mas_alloc_nodes+0x173/0x230
<4>[    2.337207]  mas_alloc_cyclic+0x27b/0x550
<4>[    2.337220]  mtree_alloc_cyclic+0x92/0xf0
`````````````````````````````````````````````````````````````````````````````````
The detailed log can be found in [3].
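
For readers less familiar with this class of warning: the log shows an allocation (kmem_cache_alloc_noprof) reaching __might_resched while preempt_count is 1 and the lock taken in mtree_alloc_cyclic is still held. The following is only a minimal userspace model of that shape, not kernel code and not a diagnosis of the patch. Every name in it (model_lock, model_alloc, and so on) is hypothetical and exists purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
static int model_preempt_count; /* stands in for preempt_count in the log */
static int model_warnings;      /* number of "invalid context" reports */

/* Taking the modelled lock disables "preemption" by raising the counter. */
static void model_lock(void)
{
    model_preempt_count++;
}

static void model_unlock(void)
{
    assert(model_preempt_count > 0);
    model_preempt_count--;
}

/* Models an allocator that checks for atomic context before it may sleep. */
static void model_alloc(bool may_sleep)
{
    if (may_sleep && model_preempt_count != 0) {
        model_warnings++;
        fprintf(stderr, "model: sleeping allocation with preempt_count=%d\n",
                model_preempt_count);
    }
}

/* Same shape as the trace: a blocking allocation while the lock is held. */
static int alloc_under_lock(void)
{
    model_warnings = 0;
    model_lock();
    model_alloc(true);
    model_unlock();
    return model_warnings;
}

/* Alternative shape: drop the lock around the blocking allocation. */
static int alloc_with_lock_dropped(void)
{
    model_warnings = 0;
    model_lock();
    model_unlock();    /* leave the atomic section first */
    model_alloc(true); /* now allowed to sleep */
    model_lock();      /* retake the lock and retry the operation */
    model_unlock();
    return model_warnings;
}
```

In this model, alloc_under_lock() records one warning and alloc_with_lock_dropped() records none. Whether the bisected patch changes the real code path in a comparable way is exactly the open question in this report.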

After bisecting the tree, the following patch [4] seems to be the first "bad"
commit:

`````````````````````````````````````````````````````````````````````````````````````````````````````````
    maple_tree: remove mas_destroy() from mas_nomem()

    Separate call to mas_destroy() from mas_nomem() so we can check for no
    memory errors without destroying the current maple state in
    mas_store_gfp().  We then add calls to mas_destroy() to callers of
    mas_nomem().

    Link: https://lkml.kernel.org/r/20240618204750.79512-6-sidhartha.kumar@xxxxxxxxxx
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@xxxxxxxxxx>

`````````````````````````````````````````````````````````````````````````````````````````````````````````

We could not revert the patch because of merge conflicts, but resetting the tree to the parent of the commit seems to fix the issue.

Could you please check why the patch causes this regression and provide a fix if necessary?

Thank you.

Regards

Chaitanya

[1] https://intel-gfx-ci.01.org/tree/linux-next/combined-alt.html?
[2] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20240625
[3] https://intel-gfx-ci.01.org/tree/linux-next/next-20240625/bat-rpls-4/boot0.txt 
[4] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=187827d2dc3749d66546696b78584ee4c54687b0



