Re: [PATCH V7 0/2] qcom: x1e80100: Enable CPUFreq

Marc Zyngier <maz@xxxxxxxxxx> · Mon, 06 Jan 2025 14:57:29 +0000

On Mon, 06 Jan 2025 12:22:48 +0000,
Sibi Sankar <quic_sibis@xxxxxxxxxxx> wrote:
> 
> On 12/5/24 21:16, Johan Hovold wrote:
> > On Thu, Dec 05, 2024 at 04:53:05PM +0530, Sibi Sankar wrote:
> >> On 11/5/24 23:42, Marc Zyngier wrote:
> >>> On Tue, 05 Nov 2024 16:57:07 +0000,
> >>> Johan Hovold <johan@xxxxxxxxxx> wrote:
> >>>> On Fri, Nov 01, 2024 at 02:43:57PM +0000, Marc Zyngier wrote:
> > 
> >>>>> I wonder whether the same sort of reset happens on more "commercial"
> >>>>> systems (such as some of the laptops). You'd expect people to look at
> >>>>> the cpufreq stuff closely, and not to see things exploding like we are.
> >>>> 
> >>>> I finally got around to getting my Lenovo ThinkPad T14s to boot (it
> >>>> refuses to start the kernel when using GRUB, and it's not due to the
> >>>> known 64 GB memory issue as it only has 32 GB)
> >>> 
> >>> <cry>
> >>> I know the feeling. My devkit can't use GRUB either, so I added a
> >>> hook to the GRUB config to generate EFI scripts that directly execute
> >>> the kernel with initrd, dtb, and command line.
> >>> 
> >>> This is probably the worst firmware I've seen in a very long while.
> >> 
> >> The PERF_LEVEL_GET implementation on the SCP firmware side
> >> is the reason for the crash :|. Currently, a bug in the
> >> kernel makes it pick up the index we set with LEVEL_SET
> >> over the fast channel, and that masks the crash. I was told
> >> the crash happens when idle states are enabled and a regular
> >> LEVEL_GET message is sent from the kernel. This was fixed a
> >> while back, but it will take a while to flow back to all the
> >> devices. It should already be out on CRDs.
> >> 
> >> Johan,
> >> Now that you are aware of the limitations, can we make
> >> a call on how to deal with this and land cpufreq?
> > 
> > As Marc said, it seems you need to come up with a way to detect and work
> > around the broken firmware.
> 
> The perf protocol version won't have any changes, so detecting
> it isn't possible :(

This is just... baffling. Can this be checked against one of the
strings contained in the DMI tables?
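A minimal sketch of such a DMI-based check, written as plain user-space C rather than against the kernel's DMI matching code. The product names and BIOS version prefixes in the table are hypothetical placeholders: which DMI strings (if any) actually distinguish fixed from unfixed X1E firmware is exactly the open question here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical quirk table: each entry pairs a DMI product name with a
 * BIOS version prefix assumed to carry the broken PERF_LEVEL_GET handler.
 * These values are placeholders, not real firmware identifiers.
 */
struct fw_quirk {
	const char *product_name;
	const char *bios_version_prefix;
};

static const struct fw_quirk broken_level_get[] = {
	{ "EXAMPLE-DEVKIT", "EXAMPLE-FW-1." },
	{ "EXAMPLE-LAPTOP", "EXAMPLE-FW-2." },
};

/*
 * Return true if the product name matches an entry exactly and the BIOS
 * version starts with that entry's prefix.
 */
static bool has_broken_level_get(const char *product, const char *bios)
{
	size_t n = sizeof(broken_level_get) / sizeof(broken_level_get[0]);

	assert(product != NULL && bios != NULL);

	for (size_t i = 0; i < n; i++) {
		const struct fw_quirk *q = &broken_level_get[i];

		if (strcmp(product, q->product_name) == 0 &&
		    strncmp(bios, q->bios_version_prefix,
			    strlen(q->bios_version_prefix)) == 0)
			return true;
	}
	return false;
}
```

On a running system the two strings could be read from `/sys/class/dmi/id/product_name` and `/sys/class/dmi/id/bios_version`; an in-kernel variant would more naturally use `dmi_check_system()` with a `struct dmi_system_id` table.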

> 
> > 
> > We want to get the fast channel issue fixed, but when we merge that fix
> > it will trigger these crashes if we also merge cpufreq support for x1e.
> > 
> > Can you expand on the PERF_LEVEL_GET issue? Is it possible to
> > implement some workaround for the buggy firmware? Like returning a dummy
> > value? How exactly are things working today? Can't that be used as a basis
> > for a quirk?
> 
> The main problem is that the X1E firmware supports fast channel level
> get, but when queried it says it doesn't support it :|. The regular
> PERF_LEVEL_GET message, which gets used as a fallback, has a bug that
> causes the device to crash. So we either enable cpufreq only on
> platforms that have the fix in place

Again: how do we detect this?

> or live with the warning that certain messages
> don't support fast channel, which I don't think will fly. I've also
> been told the crash wouldn't show up if all sleep states were
> disabled.

So we have the choice between crashing quickly, or sucking power like
mad?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.