On Tue, 28 Feb 2023 09:33:53 +0200
Tasos Sahanidis <tasos@xxxxxxxxxxxx> wrote:

> Thank you very much for your quick response, Abhishek.
>
> > 1. Set disable_idle_d3 module parameter set and check if this issue happens.
> The issue does not happen with disable_idle_d3, which means I can at
> least now use newer kernels. All the following commands were ran
> *without* disable_idle_d3, so that the issue would occur.
>
> > 2. Without starting the VM, check the status of following sysfs entries.
> I assume by /sys/bus/pci/devices/<B:D:F>/power/power_state you meant
> /sys/bus/pci/devices/<B:D:F>/power_state, as the former doesn't exist.
>
> # cat /sys/bus/pci/devices/0000\:06\:00.0/power/runtime_status
> suspended
> # cat /sys/bus/pci/devices/0000\:06\:00.0/power_state
> D3hot
>
> > 3. After issue happens, run the above command again.
> This is with the VM running and the errors in dmesg:
>
> # cat /sys/bus/pci/devices/0000\:06\:00.0/power/runtime_status
> active
> # cat /sys/bus/pci/devices/0000\:06\:00.0/power_state
> D0

Can you do the same for the root port to the GPU, ex. use lspci -t to
find the parent root port.  Since the device doesn't seem to be
achieving D3cold (expected on a desktop system), the other significant
change of the identified commit is that the root port will also enter a
low power state.  Prior to that commit the device would enter D3hot, but
we never touched the root port.  Perhaps confirm the root port now
enters D3hot and compare lspci for the root port when using
disable_idle_d3 to that found when trying to use the device without
disable_idle_d3.  Thanks,

Alex
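
The root port check requested above could be scripted roughly as follows. This is only a sketch, not something taken from the thread: the function names `parent_bridge` and `show_power` and the `SYSFS_PCI` override variable are hypothetical, and it assumes the bridge directly upstream of the GPU is the root port (on a system with a PCIe switch in between, the immediate parent would be a switch port, so `lspci -t` remains the authoritative way to confirm the topology). It relies on each entry under /sys/bus/pci/devices being a symlink into /sys/devices, where a device's directory is nested inside its upstream bridge's directory.

```shell
#!/bin/sh
# Sketch: show the runtime PM status and power state of a PCI device
# and of the bridge directly upstream of it.
# SYSFS_PCI is a hypothetical override, useful for testing on a fake tree.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci/devices}"

# Print the address of the bridge directly upstream of device $1.
# The resolved sysfs path of a device sits inside its parent's directory.
parent_bridge() {
    real=$(readlink -f "$SYSFS_PCI/$1") || return 1
    basename "$(dirname "$real")"
}

# Print the same two sysfs attributes used earlier in the thread for device $1.
show_power() {
    printf '%s runtime_status=%s power_state=%s\n' "$1" \
        "$(cat "$SYSFS_PCI/$1/power/runtime_status")" \
        "$(cat "$SYSFS_PCI/$1/power_state")"
}

# Default to the GPU address from the thread; do nothing if it is absent.
gpu="${1:-0000:06:00.0}"
if [ -e "$SYSFS_PCI/$gpu" ]; then
    show_power "$gpu"
    show_power "$(parent_bridge "$gpu")"
fi
```

Running this once before starting the VM and once after the errors appear would give the root port's state alongside the GPU's. For the lspci comparison, the output of `lspci -vv -s <root port B:D:F>` could be saved once with disable_idle_d3 set and once without, and the two files compared with diff.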