Hi Chaitanya,

On Mon, Nov 11, 2024 at 6:41 AM Borah, Chaitanya Kumar
<chaitanya.kumar.borah@xxxxxxxxx> wrote:
>
> Hello Rafael,
>
> Hope you are doing well. I am Chaitanya from the linux graphics team in Intel.
>
> This mail is regarding a regression we are seeing in our CI runs[1] on linux-next repository.
>
> Since the version next-20241106 [2], we are seeing the following regression
>
> `````````````````````````````````````````````````````````````````````````````````
> <4>[ 7.246473] WARNING: possible circular locking dependency detected
> <4>[ 7.246476] 6.12.0-rc6-next-20241106-next-20241106-g5b913f5d7d7f+ #1 Not tainted
> <4>[ 7.246479] ------------------------------------------------------
> <4>[ 7.246481] swapper/0/1 is trying to acquire lock:
> <4>[ 7.246483] ffffffff8264aef0 (cpu_hotplug_lock){++++}-{0:0}, at: static_key_enable+0xd/0x20
> <4>[ 7.246493]
>                but task is already holding lock:
> <4>[ 7.246495] ffffffff82832068 (hybrid_capacity_lock){+.+.}-{4:4}, at: intel_pstate_register_driver+0xd3/0x1c0
> `````````````````````````````````````````````````````````````````````````````````
> Details log can be found in [3].

Thanks for the report!

> After bisecting the tree, the following patch [4] seems to be the first "bad"
> commit
>
> `````````````````````````````````````````````````````````````````````````````````````````````````````````
> commit 92447aa5f6e7fbad9427a3fd1bb9e0679c403206
> Author: Rafael J. Wysocki mailto:rafael.j.wysocki@xxxxxxxxx
> Date: Mon Nov 4 19:53:53 2024 +0100
>
>     cpufreq: intel_pstate: Update asym capacity for CPUs that were offline initially
> `````````````````````````````````````````````````````````````````````````````````````````````````````````
>
> We also verified that if we revert the patch the issue is not seen.
>
> Could you please check why the patch causes this regression and provide a fix if necessary?
>
> [1] https://intel-gfx-ci.01.org/tree/linux-next/combined-alt.html?
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20241106
> [3] https://intel-gfx-ci.01.org/tree/linux-next/next-20241106/bat-arls-1/boot0.txt
> [4] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20241106&id=92447aa5f6e7fbad9427a3fd1bb9e0679c403206

The problem is that cpus_read_lock() should not be called under
hybrid_capacity_lock, because the latter is acquired in CPU online/offline
paths. The above commit exposes this but, if I'm not mistaken, the issue is
there regardless of it.

The good news is that it should be addressed by a patch that has been
posted already:

https://lore.kernel.org/linux-pm/12554508.O9o76ZdvQC@xxxxxxxxxxxxx/

so please let me know if it makes the splat go away.

Even though its changelog says that it has no functional impact, this is not
really the case.

Thanks!
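
For illustration only, the ordering problem described above can be sketched in user-space C. This is not kernel code and not how lockdep is implemented: the lock IDs and the record_order() helper are hypothetical names made up for this sketch, and it assumes that the CPU online/offline paths take hybrid_capacity_lock while the hotplug lock is already held.

```c
#include <stdbool.h>

/* Hypothetical IDs standing in for the two kernel locks in the splat. */
enum lock_id { CPU_HOTPLUG_LOCK, HYBRID_CAPACITY_LOCK, NR_LOCKS };

/* order_seen[a][b] becomes true once b was acquired while a was held. */
static bool order_seen[NR_LOCKS][NR_LOCKS];

/*
 * Record that "inner" is being acquired while "outer" is held.
 *
 * Returns false if the opposite order was recorded earlier, which means
 * two code paths take the same pair of locks in opposite orders and could
 * deadlock against each other (the classic ABBA pattern).
 */
static bool record_order(enum lock_id outer, enum lock_id inner)
{
	order_seen[outer][inner] = true;
	return !order_seen[inner][outer];
}
```

Calling record_order(CPU_HOTPLUG_LOCK, HYBRID_CAPACITY_LOCK) for the online/offline path and then record_order(HYBRID_CAPACITY_LOCK, CPU_HOTPLUG_LOCK) for the registration path shown in the splat makes the second call return false, which is the same kind of inversion that the warning reports.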