On Fri, 01 Nov 2024 13:00:37 +0000,
Johan Hovold <johan@xxxxxxxxxx> wrote:
>
> [ +CC: Marc, who I think I saw reporting something similar even if I can
>   seem to find where right now ]

It was on IRC.

>
> On Wed, Oct 30, 2024 at 06:38:38PM +0530, Sibi Sankar wrote:
> > This series enables CPUFreq support on the X1E SoC using the SCMI perf
> > protocol. This was originally part of the RFC: firmware: arm_scmi:
> > Qualcomm Vendor Protocol [1]. I've split it up so that this part can
> > land earlier. Warnings Introduced by the series are fixed by [2]
>
> > Sibi Sankar (2):
> >   arm64: dts: qcom: x1e80100: Add cpucp mailbox and sram nodes
> >   arm64: dts: qcom: x1e80100: Enable cpufreq
>
> I've been running with v6 of these for a while now, without noticing any
> issues, and just updated to v7 to be able to provide a Tested-by tag.
>
> I wanted to run a compilation and see how the frequencies varied, but
> before I got around to that I just grepped the cpufreq sysfs attributes
> for CPU0 four times. And this triggered a reset of the machine (x1e80100
> CRD).
>
> The last values output were:
>
>     affected_cpus:0 1 2 3
>     cpuinfo_cur_freq:<unknown>
>     cpuinfo_max_freq:3417600
>     cpuinfo_min_freq:710400
>     cpuinfo_transition_latency:30000
>     related_cpus:0 1 2 3
>     scaling_available_frequencies:710400 806400 998400 1190400 1440000 1670400 1920000 2188800 2515200 2707200 2976000 320
>     scaling_available_governors:ondemand userspace performance schedutil
>     scaling_cur_freq:806400
>     scaling_driver:scmi
>     scaling_governor:schedutil
>     scaling_max_freq:3417600
>     scaling_min_freq:710400
>     scaling_setspeed:<unsupported>
>
> Notice the <unknown> current frequency (the previous greps said 710400
> and 2515200).
>
> The last thing I see on the serial console, presumably just before
> the reset, is:
>
>     [  196.268025] arm-scmi arm-scmi.0.auto: timed out in resp(caller: do_xfer+0x164/0x564)
>
> I just rebooted and grepped again and it triggered on the first attempt
> (cur_freq also said '<unknown>'). Same error in the log, printed when
> grepping.

I'm seeing similar things indeed. Randomly grepping in cpufreq/policy*
results in hard resets, although I don't get much on the serial
console when that happens.

Interestingly, I also see some errors in dmesg at boot time:

maz@semi-fraudulent:~$ dmesg| grep -i scmi
[    0.966175] scmi_core: SCMI protocol bus registered
[    7.929710] arm-scmi arm-scmi.2.auto: Using scmi_mailbox_transport
[    7.939059] arm-scmi arm-scmi.2.auto: SCMI max-rx-timeout: 30ms
[    7.945567] arm-scmi arm-scmi.2.auto: SCMI RAW Mode initialized for instance 0
[    7.958348] arm-scmi arm-scmi.2.auto: SCMI RAW Mode COEX enabled !
[    7.978303] arm-scmi arm-scmi.2.auto: SCMI Notifications - Core Enabled.
[    7.985351] arm-scmi arm-scmi.2.auto: SCMI Protocol v2.0 'Qualcomm:' Firmware version 0x20000
[    8.033774] arm-scmi arm-scmi.2.auto: Failed to add opps_by_lvl at 3801600 for NCC - ret:-16
[    8.033902] arm-scmi arm-scmi.2.auto: Failed to add opps_by_lvl at 3801600 for NCC - ret:-16
[    8.036528] arm-scmi arm-scmi.2.auto: Failed to add opps_by_lvl at 3801600 for NCC - ret:-16
[    8.036744] arm-scmi arm-scmi.2.auto: Failed to add opps_by_lvl at 3801600 for NCC - ret:-16
[    8.171232] scmi-perf-domain scmi_dev.4: Initialized 3 performance domains

All these "Failed" are a bit worrying.

Happy to put any theory to the test.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
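
As an aside, for anyone wanting to compare logs across machines, here is a
minimal sketch that tallies the "Failed to add opps_by_lvl" lines from the
dmesg output above and maps the return code to its errno name. The helper
name summarize_opp_failures and the regular expression are hypothetical,
written only against the four lines quoted in this mail; they are not part of
any kernel tooling. On Linux, errno 16 is EBUSY, so ret:-16 reads as -EBUSY.
What that implies about the firmware's level table is not established here.

```python
import errno
import re
from collections import Counter

# Hypothetical pattern, derived only from the dmesg lines quoted in this mail.
OPP_FAIL = re.compile(
    r"Failed to add opps_by_lvl at (?P<level>\d+) "
    r"for (?P<domain>\S+) - ret:(?P<ret>-?\d+)"
)


def summarize_opp_failures(log_text):
    """Count failures per (domain name, level, errno name)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = OPP_FAIL.search(line)
        if match is None:
            continue
        code = abs(int(match.group("ret")))
        # Fall back to the raw number if the platform has no name for it.
        name = errno.errorcode.get(code, "errno %d" % code)
        counts[(match.group("domain"), int(match.group("level")), name)] += 1
    return counts


# The four failure lines from the boot log above, copied verbatim.
SAMPLE = """\
[    8.033774] arm-scmi arm-scmi.2.auto: Failed to add opps_by_lvl at 3801600 for NCC - ret:-16
[    8.033902] arm-scmi arm-scmi.2.auto: Failed to add opps_by_lvl at 3801600 for NCC - ret:-16
[    8.036528] arm-scmi arm-scmi.2.auto: Failed to add opps_by_lvl at 3801600 for NCC - ret:-16
[    8.036744] arm-scmi arm-scmi.2.auto: Failed to add opps_by_lvl at 3801600 for NCC - ret:-16
[    8.171232] scmi-perf-domain scmi_dev.4: Initialized 3 performance domains
"""

if __name__ == "__main__":
    for (domain, level, name), count in summarize_opp_failures(SAMPLE).items():
        print(domain, level, name, count)
```

Feeding it the full `dmesg` output instead of SAMPLE should work the same way,
since lines that do not match the pattern are skipped.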