On Tue, Nov 12, 2024 at 6:14 PM Borah, Chaitanya Kumar
<chaitanya.kumar.borah@xxxxxxxxx> wrote:
>
> > -----Original Message-----
> > From: Rafael J. Wysocki <rafael@xxxxxxxxxx>
> > Sent: Monday, November 11, 2024 6:58 PM
> > To: Borah, Chaitanya Kumar <chaitanya.kumar.borah@xxxxxxxxx>
> > Cc: Wysocki, Rafael J <rafael.j.wysocki@xxxxxxxxx>;
> > intel-gfx@xxxxxxxxxxxxxxxxxxxxx; Kurmi, Suresh Kumar
> > <suresh.kumar.kurmi@xxxxxxxxx>; Saarinen, Jani <jani.saarinen@xxxxxxxxx>;
> > Nikula, Jani <jani.nikula@xxxxxxxxx>; linux-pm@xxxxxxxxxxxxxxx;
> > srinivas.pandruvada@xxxxxxxxxxxxxxx; ricardo.neri-calderon@xxxxxxxxxxxxxxx
> > Subject: Re: Regression on linux-next (next-20241106)
> >
> > Hi Chaitanya,
> >
> > On Mon, Nov 11, 2024 at 6:41 AM Borah, Chaitanya Kumar
> > <chaitanya.kumar.borah@xxxxxxxxx> wrote:
> > >
> > > Hello Rafael,
> > >
> > > Hope you are doing well. I am Chaitanya from the Linux graphics team at
> > > Intel.
> > >
> > > This mail is regarding a regression we are seeing in our CI runs [1] on the
> > > linux-next repository.
> > >
> > > Since the version next-20241106 [2], we are seeing the following
> > > regression:
> > >
> > > ``````````````````````````````````````````````````````````````````````
> > > <4>[ 7.246473] WARNING: possible circular locking dependency detected
> > > <4>[ 7.246476] 6.12.0-rc6-next-20241106-next-20241106-g5b913f5d7d7f+ #1 Not tainted
> > > <4>[ 7.246479] ------------------------------------------------------
> > > <4>[ 7.246481] swapper/0/1 is trying to acquire lock:
> > > <4>[ 7.246483] ffffffff8264aef0 (cpu_hotplug_lock){++++}-{0:0}, at: static_key_enable+0xd/0x20
> > > <4>[ 7.246493]
> > > but task is already holding lock:
> > > <4>[ 7.246495] ffffffff82832068 (hybrid_capacity_lock){+.+.}-{4:4}, at: intel_pstate_register_driver+0xd3/0x1c0
> > > ``````````````````````````````````````````````````````````````````````
> > > The detailed log can be found in [3].
> >
> > Thanks for the report!
> >
> > > After bisecting the tree, the following patch [4] seems to be the first "bad"
> > > commit:
> > >
> > > ``````````````````````````````````````````````````````````````````````
> > > commit 92447aa5f6e7fbad9427a3fd1bb9e0679c403206
> > > Author: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> > > Date: Mon Nov 4 19:53:53 2024 +0100
> > >
> > >     cpufreq: intel_pstate: Update asym capacity for CPUs that were
> > >     offline initially
> > > ``````````````````````````````````````````````````````````````````````
> > >
> > > We also verified that if we revert the patch the issue is not seen.
> > >
> > > Could you please check why the patch causes this regression and provide a
> > > fix if necessary?
> > >
> > > [1] https://intel-gfx-ci.01.org/tree/linux-next/combined-alt.html?
> > > [2] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20241106
> > > [3] https://intel-gfx-ci.01.org/tree/linux-next/next-20241106/bat-arls-1/boot0.txt
> > > [4] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20241106&id=92447aa5f6e7fbad9427a3fd1bb9e0679c403206
> >
> > The problem is that cpus_read_lock() should not be called under
> > hybrid_capacity_lock because the latter is acquired in CPU online/offline
> > paths. This is exposed by the above commit, but if I'm not mistaken, the
> > issue is there regardless of it.
> >
> > The good news is that it should be addressed by a patch that has been posted
> > already:
> >
> > https://lore.kernel.org/linux-pm/12554508.O9o76ZdvQC@xxxxxxxxxxxxx/
> >
> > so please let me know if it makes the splat go away.
> >
> > Even though its changelog says that it has no functional impact, this is not
> > really the case.
> >
> > Thanks!
>
> Thank you Rafael for the patch, we can confirm that it helps.

Thanks for checking and letting me know, much appreciated!
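
For anyone following the thread, the lock-ordering inversion discussed above can be modelled in a few lines. This is only a toy userspace sketch in Python, not kernel code and not how lockdep is actually implemented. The names LockOrderTracker and simulate() are hypothetical, and the sketch assumes that the CPU online/offline path takes hybrid_capacity_lock while the hotplug lock is already held, which is what makes the registration path (hybrid_capacity_lock first, then cpu_hotplug_lock via static_key_enable()) close a cycle.

```python
from collections import defaultdict


class LockOrderTracker:
    """Toy lock-order checker (hypothetical, for illustration only).

    Records "B was acquired while A was held" as an edge A -> B and
    refuses an acquisition that would close a cycle in that graph.
    """

    def __init__(self):
        self.edges = defaultdict(set)  # edges[a] = locks taken while a was held
        self.held = []                 # locks held by the simulated context

    def _reachable(self, src, dst):
        # Depth-first search: does an ordering path src -> ... -> dst exist?
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges[node])
        return False

    def acquire(self, lock):
        for h in self.held:
            # Taking `lock` under `h` adds h -> lock; if lock -> ... -> h
            # was already recorded, the two orders can deadlock.
            if self._reachable(lock, h):
                raise RuntimeError(
                    f"possible circular locking dependency: {h} -> {lock}"
                )
            self.edges[h].add(lock)
        self.held.append(lock)

    def release(self, lock):
        self.held.remove(lock)


def simulate():
    t = LockOrderTracker()

    # Assumed CPU online/offline path: hotplug lock held, then
    # hybrid_capacity_lock is taken.
    t.acquire("cpu_hotplug_lock")
    t.acquire("hybrid_capacity_lock")
    t.release("hybrid_capacity_lock")
    t.release("cpu_hotplug_lock")

    # Driver registration path from the splat: hybrid_capacity_lock held,
    # then static_key_enable() wants cpu_hotplug_lock.
    t.acquire("hybrid_capacity_lock")
    try:
        t.acquire("cpu_hotplug_lock")
    except RuntimeError as err:
        return str(err)
    return None
```

Calling simulate() returns the error string for the second path, because the reverse order was already recorded by the first one. Taking the two locks in the same order on both paths, or not nesting them at all, leaves the graph acyclic and nothing is reported.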