On Wed, Mar 13, 2024 at 05:22:33PM +0100, Marek Szyprowski wrote:
> On 13.03.2024 15:35, Sudeep Holla wrote:
> > On Tue, Mar 12, 2024 at 05:55:49PM +0000, Catalin Marinas wrote:
> >> On Tue, Mar 12, 2024 at 10:06:06AM -0700, Christoph Lameter (Ampere) wrote:
> >>> On Mon, 11 Mar 2024, Christoph Lameter (Ampere) wrote:
> >>>
> >>>> This could be an issue in the ARM64 arch code itself where there maybe
> >>>> an assumption elsewhere that a cpumask can always store up to NR_CPU
> >>>> cpus and not only nr_cpu_ids as OFFSTACK does.
> >>>>
> >>>> How can I exercise the opp driver in order to recreate the problem?
> >>>>
> >>>> I assume the opp driver is ARM specific? x86 defaults to OFFSTACK so if
> >>>> there is an issue with OFFSTACK in opp then it should fail with kernel
> >>>> default configuration on that platform.
> >>> I checked the ARM64 arch sources use of NR_CPUS and its all fine.
> >>>
> >>> Also verified in my testing logs that CONFIG_PM_OPP was set in all tests.
> >>>
> >>> No warnings in the kernel log during those tests.
> >>>
> >>> How to reproduce this?
> >> I guess you need a platform with a dts that has an "operating-points-v2"
> >> property. I don't have any around.
> >>
> >> Sudeep was trying to trigger this code path earlier, not sure where he
> >> got to.
> > I did try to trigger this on FVP by adding OPPs + some hacks to add dummy
> > clock provider to successfully probe this driver. I couldn't hit the issue
> > reported 🙁. It could be that with the hardware clock/regulator drivers, it
> > take a different path in OPP core.
>
> I can fully reproduce this issue on Khadas VIM3 and Odroid-N2 boards.
> Both Meson A311D SoC based.

So, if I'm reading the OPP code and the DTS* files for Khadas VIM3
correctly, these use operating-points-v2, which is parsed by the opp
layer. If the opp layer is unable to parse any operating points, it
should print "no supported OPPs" and remove the table (thereby
preventing the code in question being reached.)

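As a rough illustration of that expected flow, here is a userspace
sketch. It is not the actual OPP core code: the toy_* types and the
function name are hypothetical and only model "add what can be parsed,
otherwise report and drop the table".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for one entry of an operating-points-v2 table. */
struct toy_opp {
	unsigned long freq_hz;
	bool supported;		/* false models an entry the parser must skip */
};

/* Hypothetical stand-in for the per-device OPP table. */
struct toy_opp_table {
	size_t count;		/* number of OPPs successfully added */
	bool present;		/* false once the table has been removed */
};

/*
 * Model of the expected flow: count every supported entry; if none could
 * be added, report it and remove the table so later users never see it.
 * Returns 0 on success, -1 if the table was removed.
 */
static int toy_parse_opps(struct toy_opp_table *table,
			  const struct toy_opp *opps, size_t n)
{
	table->count = 0;
	table->present = true;

	for (size_t i = 0; i < n; i++) {
		if (opps[i].supported)
			table->count++;
	}

	if (table->count == 0) {
		fprintf(stderr, "no supported OPPs\n");
		table->present = false;
		return -1;
	}

	return 0;
}
```

In this model a consumer that checks "present" before touching the
table can never reach the failing path with an empty table, which is
why hitting it on real hardware looks more like an ordering problem.
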
So, I wonder whether what you're seeing is a latent bug which is being
tickled by the CPU masks being off-stack, which changes the kernel
timing. I would suggest the printk debug approach may help here to see
when parsing of the OPPs begins, when they're created etc., and how
that timing relates to when they are used. Given the suspicion, it's
possible that the mere addition of printk() may "fix" the problem,
which again would be another semi-useful data point.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!