Regression on linux-next (next-20231130)

Hello Johannes,

Hope you are doing well. I am Chaitanya from the Linux graphics team at Intel.

This mail is regarding a regression we are seeing in our CI runs [1] on the linux-next repository.

Since next-20231130 [2], we have been seeing the following regression:

 `````````````````````````````````````````````````````````````````````````````````
<4> [198.663557] ======================================================
<4> [198.663559] WARNING: possible circular locking dependency detected
<4> [198.663562] 6.7.0-rc4-next-20231204-next-20231204-g629a3b49f3f9+ #1 Not tainted
<4> [198.663566] ------------------------------------------------------
<4> [198.663568] core_hotunplug/5433 is trying to acquire lock:
<4> [198.663571] ffff8881481b5068 (debugfs:i915_lpsp_capability#7){++++}-{0:0}, at: remove_one+0x56/0x160
<4> [198.663580] 
but task is already holding lock:
<4> [198.663583] ffff88810ef2e9d0 (&sb->s_type->i_mutex_key#2){++++}-{3:3}, at: simple_recursive_removal+0x1a1/0x2e0
<4> [198.663591] 
which lock already depends on the new lock.
<4> [198.663594] 
the existing dependency chain (in reverse order) is:
 `````````````````````````````````````````````````````````````````````````````````
The detailed log can be found in [3].

Locally, we have seen a slightly different version of the issue:

[  663.199573] core_hotunplug/1735 is trying to acquire lock:
[  663.199574] ffff888133406e68 (debugfs:i915_pipe){++++}-{0:0}, at: remove_one+0x56/0x160
 
After bisecting the tree, the following patch [4] appears to be the first "bad"
commit:

`````````````````````````````````````````````````````````````````````````````````````````````````````````
commit f4acfcd4deb158b96595250cc332901b282d15b0
Author: Johannes Berg <johannes.berg@xxxxxxxxx>
Date:   Fri Nov 24 17:25:25 2023 +0100

    debugfs: annotate debugfs handlers vs. removal with lockdep

    When you take a lock in a debugfs handler but also try
    to remove the debugfs file under that lock, things can
    deadlock since the removal has to wait for all users
    to finish.

    Add lockdep annotations in debugfs_file_get()/_put()
    to catch such issues.

    Acked-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
    Signed-off-by: Johannes Berg <johannes.berg@xxxxxxxxx>

fs/debugfs/file.c     | 10 ++++++++++
fs/debugfs/inode.c    | 12 ++++++++++++
fs/debugfs/internal.h |  6 ++++++
3 files changed, 28 insertions(+)
`````````````````````````````````````````````````````````````````````````````````````````````````````````
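
To illustrate the class of problem the commit message describes, here is a minimal userspace sketch of a lock-order check. It is only a model: it is not how lockdep is implemented, it is not i915 or debugfs code, and the names `DRIVER_LOCK`, `DEBUGFS_FILE`, `record_dependency()` and `reset_dependencies()` are hypothetical. It records "B was acquired while A was held" edges and refuses an edge that would close a cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_LOCKS 8

/* Hypothetical lock classes, chosen only for this illustration. */
enum { DRIVER_LOCK, DEBUGFS_FILE };

/* dep[a][b] is true when lock b was acquired while lock a was held. */
static bool dep[MAX_LOCKS][MAX_LOCKS];

/* Depth-first search: is `to` reachable from `from` along recorded edges? */
static bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < MAX_LOCKS; next++)
		if (dep[from][next] && !seen[next] && reachable(next, to, seen))
			return true;
	return false;
}

/*
 * Record "acquired while held". Returns false, and records nothing, if
 * `held` is already reachable from `acquired`, i.e. the new edge would
 * create a circular dependency.
 */
static bool record_dependency(int held, int acquired)
{
	bool seen[MAX_LOCKS] = { false };

	assert(held >= 0 && held < MAX_LOCKS);
	assert(acquired >= 0 && acquired < MAX_LOCKS);
	if (reachable(acquired, held, seen))
		return false;
	dep[held][acquired] = true;
	return true;
}

/* Forget all recorded dependencies. */
static void reset_dependencies(void)
{
	memset(dep, 0, sizeof(dep));
}
```

In this model, a handler that takes a driver lock while the debugfs file is in use corresponds to `record_dependency(DEBUGFS_FILE, DRIVER_LOCK)`, which succeeds. Removing the file while holding that same lock corresponds to `record_dependency(DRIVER_LOCK, DEBUGFS_FILE)`, which is rejected because it would close the cycle.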

We also verified that the issue is no longer seen when the patch is reverted.

Could you please check why the patch causes this regression and provide a fix
if necessary?

Thank you.

Regards

Chaitanya

[1] https://intel-gfx-ci.01.org/tree/linux-next/combined-alt.html?
[2] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20231130
[3] https://intel-gfx-ci.01.org/tree/linux-next/next-20231204/bat-dg2-9/igt@core_hotunplug@xxxxxxxxxxxxxxxxxx
[4] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20231130&id=f4acfcd4deb158b96595250cc332901b282d15b0



