Here is Peter Wu's reply, which was not sent to the mailing list because
I had to resend my e-mail to him due to a failure...

-------- Forwarded Message --------
Subject: Re: Fwd: Re: Kernel Freeze with American Megatrends BIOS
Date: Wed, 31 Aug 2016 18:08:53 +0200
From: Peter Wu <peter@xxxxxxxxxxxxx>
To: Roland Singer <roland.singer@xxxxxxxxxxxxx>

On Wed, Aug 31, 2016 at 05:56:18PM +0200, Roland Singer wrote:
> > If you look at my notes.txt, you will see that _OFF always executes the
> > same code. PGON differs. When the problem occurs, "Q0L0" somehow always
> > reads back as non-zero and LNKS < 7.
> >
>
> Oh you're Lekensteyn ^^

Yes, that's me :) I wrote bbswitch and did the Optimus and PR3 ACPI
support in nouveau, so I am fairly certain about what happens behind the
scenes.

> I don't have LNKS and no while loop after calling LKEN ?!

Yes, that is what I said in
https://www.spinics.net/lists/linux-pci/msg53694.html:
"Other affected devices have similar code, differences are small: No
check for LNKS (avoids the infinite loop, but device is still off)"

> >>
> >> I noticed following:
> >>
> >> 1. Blacklist nouveau
> >> 2. Boot to GDM login manager (Wayland)
> >> 3. Switch to TTY with CTRL+ALT+FN2
> >> 4. Load bbswitch
> >> 5. Switch off GPU
> >> 6. run lspci -> no freeze
> >> 7. Switch to GDM
> >> 8. Login to a Wayland session (X11 won't work)
> >> 9. run lspci in a GUI terminal -> system freezes
> >
> > Is nouveau somehow loaded anyway? All those extra components (X11,
> > Wayland, etc.) are unnecessary to reproduce the core problem. It occurs
> > whenever the device is being resumed (either via DSM/_PS0 or via power
> > resource PG00._ON).
> >
>
> Sorry that was nonsense. The steps to reproduce the problem are still valid.
> I didn't wait enough to power it down...
>
> But whats interesting:
>
> 1. Blacklist nouveau
> 2. Load bbswitch
> 3. Power off GPU with bbswitch
> 4. Power on GPU with bbswitch
> 5. Run lspci
> 6. Power off GPU with bbswitch
> 7. Run lspci -> freeze
>
> So setting the GPU power state with bbswitch works as expected.
> Powering it on is also fine. I did this a couple of times.
> But powering it off and letting lspci powering it on, ends in a race.

In some cases I also found that it does always happen at the first try,
but with nouveau it always seems to happen.

> It might be, that lspci does not only power the GPU on, but triggers
> another pci action which causes the race condition.
> Does this have something to do with your quote about the retrain bit?

That is an interesting hypothesis. Even if you invoke
`lspci -s01:00.0`, for example, it will always probe all devices. So
maybe the interaction with its parent device (PCI root port 00:02.0)
causes issues.

However, I also tested without lspci before, and the problem still
exists. You can trigger a runtime resume via (as root):

    echo on > /sys/bus/pci/devices/0000:01:00.0/power/control

Set it to "auto" to make it sleep again.
-- 
Kind regards,
Peter Wu
https://lekensteyn.nl
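
The runtime resume test described in the reply can be wrapped in a small helper so it is easy to repeat without lspci in the loop. This is only a sketch, assuming the usual sysfs layout under /sys/bus/pci/devices; the function name set_runtime_pm and the SYSFS_PCI_ROOT override are hypothetical and not part of bbswitch or any existing tool.

```shell
#!/bin/sh
# Sketch: toggle runtime PM for one PCI device through sysfs.
# SYSFS_PCI_ROOT is a hypothetical override (useful for dry runs);
# by default it points at the standard sysfs PCI device directory.
SYSFS_PCI_ROOT="${SYSFS_PCI_ROOT:-/sys/bus/pci/devices}"

# set_runtime_pm DEVICE MODE
#   DEVICE: full PCI address, e.g. 0000:01:00.0
#   MODE:   "on"   forces a runtime resume and keeps the device awake
#           "auto" lets the kernel runtime-suspend the device again
set_runtime_pm() {
    dev="$1"
    mode="$2"
    case "$mode" in
        on|auto) ;;
        *)
            echo "mode must be 'on' or 'auto'" >&2
            return 2
            ;;
    esac
    ctl="$SYSFS_PCI_ROOT/$dev/power/control"
    # Refuse to create the file: it must already exist and be writable.
    if [ ! -w "$ctl" ]; then
        echo "cannot write $ctl (not root, or no such device?)" >&2
        return 1
    fi
    printf '%s\n' "$mode" > "$ctl"
}
```

As root, `set_runtime_pm 0000:01:00.0 on` should trigger the resume path, and `set_runtime_pm 0000:01:00.0 auto` lets the device go back to sleep.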