On Wed, Jun 15, 2016 at 12:45 PM, Alex Williamson <alex.williamson@xxxxxxxxxx> wrote:
> On Wed, 15 Jun 2016 02:23:37 -0400
> Alex Deucher <alexdeucher@xxxxxxxxx> wrote:
>
>> On Mon, Jun 13, 2016 at 4:10 PM, Alex Williamson
>> <alex.williamson@xxxxxxxxxx> wrote:
>> > On Mon, 13 Jun 2016 15:45:20 -0400
>> > Alex Deucher <alexdeucher@xxxxxxxxx> wrote:
>> >
>> >> When executing in a PCI passthrough based virtualization environment, the
>> >> hypervisor will usually attempt to send a PCIe bus reset signal to the
>> >> ASIC when the VM reboots. In this scenario, the card is not correctly
>> >> initialized, but we still consider it to be posted. Therefore, in a
>> >> passthrough based environment we should always post the card to guarantee
>> >> it is in a good state for driver initialization.
>> >>
>> >> Ported from amdgpu commit:
>> >> amdgpu: fix asic initialization for virtualized environments
>> >>
>> >> Cc: Andres Rodriguez <andres.rodriguez@xxxxxxx>
>> >> Cc: Alex Williamson <alex.williamson@xxxxxxxxxx>
>> >> Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>
>> >> Cc: stable@xxxxxxxxxxxxxxx
>> >> ---
>> >> drivers/gpu/drm/radeon/radeon_device.c | 21 +++++++++++++++++++++
>> >> 1 file changed, 21 insertions(+)
>> >
>> > Thanks, I expect it's an improvement, though it's always a bit
>> > disappointing when a driver starts modifying its behavior based on
>> > what might be a transient feature of the platform, in this case a
>> > hypervisor platform. For instance, why does our bus reset and video
>> > ROM execution result in a different state than a physical BIOS doing
>> > the same? Can't this condition occur regardless of a hypervisor,
>>
>> Just doing a pci reset is not enough on newer cards. The hw handling
>> pci resets changed in CI and more of the logic moved to the driver.
>
> Gag, please relay my disapproval to your hardware folks.
>
>> That does a limited reset, but not the registers that the driver
>> checks to determine whether or not the asic has been posted so the
>> driver skips posting and leaves the hw in a bad reset state.
>>
>> > perhaps a rare hot-add of a GPU, a bare metal kexec reboot, or perhaps
>> > simply a system BIOS optimized to post a limited set of devices.
>>
>> We can tell if a card has never been posted and properly post it.
>> Where it's tricky is when a card has been posted and has subsequently
>> been pci reset on CI and newer hw. I'm not sure of a good way to
>> detect this particular scenario. Generally this is mainly done for
>> qemu/kvm.
>
> How do you tell if a card has never been posted? Is it something we
> could easily toggle after a bus reset?

We check CONFIG_MEMSIZE, which is a scratch register set by the asic_init
command table to tell the driver how much VRAM is on the board.

>
>> > Detection based on some state of the device rather than an expectation
>> > based on what the device is running on seems preferable. I suspect
>> > Andres' patch for amdgpu only affects newer devices, which pretty much
>> > all suffer reset issues, at least under QEMU/VFIO, but I wonder how this
>> > patch affects existing working devices, like 6, 7, and some 8-series.
>>
>> Posting the asic at init time should be safe on all asics.
>>
>> > Anyway, if this is the solution to the poor behavior we've seen with
>> > assigned AMD cards, maybe someone could request the same for the closed
>> > drivers, including Windows. Thanks,
>>
>> The closed drivers already do this.
>
> Hmm, that's not terribly encouraging then since the majority of users
> are running Windows guests for the purpose of creating a gaming VM and
> still experiencing reset issues with the closed drivers there. Thanks,

I'll have to check with the Windows team to see how much validation they
do with the Windows driver as a qemu/kvm guest. It could be that they
don't properly detect that as a virtual case.
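A minimal, simplified sketch of the posting decision discussed in this thread. needs_post() is a hypothetical helper, not the radeon driver's actual code: the CONFIG_MEMSIZE value is passed in as a plain integer rather than read over MMIO, and the guest flag is assumed to come from some hypervisor detection (for example, the x86 CPUID hypervisor-present bit). The real driver may check additional state before trusting CONFIG_MEMSIZE.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the actual radeon code: decide whether the
 * driver must run the asic_init command table ("post" the card).
 *
 * config_memsize:   value of the CONFIG_MEMSIZE scratch register
 *                   (a parameter here instead of an MMIO read).
 * running_as_guest: assumed result of some hypervisor detection.
 */
bool needs_post(uint32_t config_memsize, bool running_as_guest)
{
    /*
     * Passthrough case from the patch: after a VM reboot, the bus reset
     * may leave CONFIG_MEMSIZE set from the previous post while the rest
     * of the ASIC is in a bad reset state, so the register cannot be
     * trusted. Always post.
     */
    if (running_as_guest)
        return true;

    /*
     * Bare metal: asic_init writes the VRAM size into CONFIG_MEMSIZE,
     * so a value of zero means the card has never been posted.
     */
    return config_memsize == 0;
}
```

For example, needs_post(0, false) and needs_post(x, true) for any x both return true, while a nonzero CONFIG_MEMSIZE on bare metal skips the post. The design accepts a possibly redundant post inside a guest in exchange for not having to detect the "posted, then bus-reset" state, which the thread notes has no good detection method on CI and newer hardware.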
Alex