On Fri, Nov 22, 2019 at 12:34 PM Karol Herbst <kherbst@xxxxxxxxxx> wrote:
>
> On Fri, Nov 22, 2019 at 12:30 PM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
> >

[cut]

> >
>
> the issue is not AML related at all as I am able to reproduce this
> issue without having to invoke any of that at all, I just need to poke
> into the PCI register directly to cut the power.

Since the register is not documented, you don't actually know what
exactly happens when it is written to.

You basically are saying something like "if I write a specific value
to an undocumented register, that makes things fail".  And yes,
writing things to undocumented registers is likely to cause failure to
happen, in general.

The point is that the kernel will never write into this register by
itself.

> The register is not documented, but effectively what the AML code is
> writing to as well.

So that AML code is problematic.  It expects the write to do something
useful, but that's not the case.  Without the AML, the register would
not have been written to at all.

> Of course it might also be that the code I was testing it was doing
> things in a non conformant way and I just hit a different issue as
> well, but in the end I don't think that the AML code is the root cause
> of all of that.

If AML is not involved at all, things work.  You've just said so in
another message in this thread, quoting verbatim:

"yes. In my previous testing I was poking into the PCI registers of the
bridge controller and the GPU directly and that never caused any
issues as long as I limited it to putting the devices into D3hot."

You cannot claim a hardware bug just because a write to an
undocumented register from AML causes things to break.

First, that may be a bug in the AML (which is not unheard of).

Second, and that is more likely, the expectations of the AML code may
not be met at the time it is run.

Assuming the latter, the root cause is really that the kernel executes
the AML in a hardware configuration in which the expectations of that
AML are not met.  We are now trying to understand what those
expectations may be and so how to cause them to be met.

Your observation that the issue can be avoided if the GPU is not put
into D3hot by a PMCSR write is a step in that direction and it is a
good finding.  The information from Mika based on the ASL analysis is
helpful too.

Let's not jump to premature conclusions too quickly, though.