On Mon, 06 Jan 2025 12:22:48 +0000,
Sibi Sankar <quic_sibis@xxxxxxxxxxx> wrote:
>
> On 12/5/24 21:16, Johan Hovold wrote:
> > On Thu, Dec 05, 2024 at 04:53:05PM +0530, Sibi Sankar wrote:
> >> On 11/5/24 23:42, Marc Zyngier wrote:
> >>> On Tue, 05 Nov 2024 16:57:07 +0000,
> >>> Johan Hovold <johan@xxxxxxxxxx> wrote:
> >>>> On Fri, Nov 01, 2024 at 02:43:57PM +0000, Marc Zyngier wrote:
> >
> >>>>> I wonder whether the same sort of reset happen on more "commercial"
> >>>>> systems (such as some of the laptops). You expect that people look at
> >>>>> the cpufreq stuff closely, and don't see things exploding like we are.
> >>>>
> >>>> I finally got around to getting my Lenovo ThinkPad T14s to boot (it
> >>>> refuses to start the kernel when using GRUB, and it's not due to the
> >>>> known 64 GB memory issue as it only has 32 GB)
> >>>
> >>> <cry>
> >>> I know the feeling. My devkit can't use GRUB either, so I added a
> >>> hook to the GRUB config to generate EFI scripts that directly execute
> >>> the kernel with initrd, dtb, and command line.
> >>>
> >>> This is probably the worse firmware I've seen in a very long while.
> >>
> >> The PERF_LEVEL_GET implementation in the SCP firmware side
> >> is the reason for the crash :|, currently there is a bug
> >> in the kernel that picks up index that we set with LEVEL_SET
> >> with fast channel and that masks the crash. I was told the
> >> crash happens when idle states are enabled and a regular
> >> LEVEL_GET message is triggered from the kernel. This was
> >> fixed a while back but it will take a while to flow back
> >> to all the devices. It should already be out CRD's.
> >>
> >> Johan,
> >> Now that you are aware of the the limitations can we make
> >> a call on how to deal with this and land cpufreq?
> >
> > As Marc said, it seems you need to come up with a way to detect and work
> > around the broken firmware.
>
> The perf protocol version won't have any changes so detecting
> it isn't possible :(

This is just... baffling. Can this be checked against one of the strings
contained in the DMI tables?

>
> >
> > We want to get the fast channel issue fixed, but when we merge that fix
> > it will trigger these crashes if we also merge cpufreq support for x1e.
> >
> > Can you expand the on the PERF_LEVEL_GET issue? Is it possible to
> > implement some workaround for the buggy firmware? Like returning a dummy
> > value? How exactly are things working today? Can't that be used a basis
> > for a quirk?
>
> The main problem is the X1E firmware supports fast channel level get
> but when queried it says it doesn't support it :|. The PERF_LEVEL_GET
> regular messaging which gets used as a fallback has a bug which causes
> the device to crash. So we either enable cpufreq only on platforms
> that has the fix in place

Again: how do we detect this?

> or live with the warning that certain messages
> don't support fast channel which I don't think will fly. I've also been
> told the crash wouldn't show up if we have all sleep states
> disabled.

So we have the choice between crashing quickly, or sucking power like mad?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.