On Mon, Jun 13, 2016 at 4:10 PM, Alex Williamson <alex.williamson@xxxxxxxxxx> wrote: > On Mon, 13 Jun 2016 15:45:20 -0400 > Alex Deucher <alexdeucher@xxxxxxxxx> wrote: > >> When executing in a PCI passthrough based virtuzliation environment, the >> hypervisor will usually attempt to send a PCIe bus reset signal to the >> ASIC when the VM reboots. In this scenario, the card is not correctly >> initialized, but we still consider it to be posted. Therefore, in a >> passthrough based environemnt we should always post the card to guarantee >> it is in a good state for driver initialization. >> >> Ported from amdgpu commit: >> amdgpu: fix asic initialization for virtualized environments >> >> Cc: Andres Rodriguez <andres.rodriguez@xxxxxxx> >> Cc: Alex Williamson <alex.williamson@xxxxxxxxxx> >> Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx> >> Cc: stable@xxxxxxxxxxxxxxx >> --- >> drivers/gpu/drm/radeon/radeon_device.c | 21 +++++++++++++++++++++ >> 1 file changed, 21 insertions(+) > > Thanks, I expect it's an improvement, though it's always a bit > disappointing when a driver starts modifying its behavior based on > what might be a transient feature of the platform, in this case a > hypervisor platform. For instance, why does our bus reset and video > ROM execution result in a different state than a physical BIOS doing > the same? Can't this condition occur regardless of a hypervisor, Just doing a pci reset is not enough on newer cards. The hw handling pci resets changed in CI and more of the logic moved to the driver. That does a limited reset, but not the registers that the driver checks to determine whether or not the asic has been posted so the driver skips posting and leaves the hw in a bad reset state. > perhaps a rare hot-add of a GPU, a bare metal kexec reboot, or perhaps > simply a system BIOS optimized to post a limited set of devices. We can tell if a card has never been posted and properly post it. Where it's tricky is when a card has been posted and has subsequently been pci reset on CI and newer hw. I'm not sure of a good way to detect this particular scenario. Generally this is mainly done for qemu/kvm. > Detection based on some state of the device rather than an expectation > based on what the device is running on seems preferable. I suspect > Andres' patch for amdgpu only affects newer devices, which pretty much > all suffer reset issues, at least under QEMU/VFIO, but I wonder how this > patch affects existing working devices, like 6, 7, and some 8-series. Posting the asic at init time should be safe on all asics. > Anyway, if this is the solution to the poor behavior we've seen with > assigned AMD cards, maybe someone could request the same for the closed > drivers, including Windows. Thanks, The closed drivers already do this. Alex > > Alex > >> diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c >> index e61c763..21c44b2 100644 >> --- a/drivers/gpu/drm/radeon/radeon_device.c >> +++ b/drivers/gpu/drm/radeon/radeon_device.c >> @@ -630,6 +630,23 @@ void radeon_gtt_location(struct radeon_device *rdev, struct radeon_mc *mc) >> /* >> * GPU helpers function. >> */ >> + >> +/** >> + * radeon_device_is_virtual - check if we are running is a virtual environment >> + * >> + * Check if the asic has been passed through to a VM (all asics). >> + * Used at driver startup. >> + * Returns true if virtual or false if not. >> + */ >> +static bool radeon_device_is_virtual(void) >> +{ >> +#ifdef CONFIG_X86 >> + return boot_cpu_has(X86_FEATURE_HYPERVISOR); >> +#else >> + return false; >> +#endif >> +} >> + >> /** >> * radeon_card_posted - check if the hw has already been initialized >> * >> @@ -643,6 +660,10 @@ bool radeon_card_posted(struct radeon_device *rdev) >> { >> uint32_t reg; >> >> + /* for pass through, always force asic_init */ >> + if (radeon_device_is_virtual()) >> + return false; >> + >> /* required for EFI mode on macbook2,1 which uses an r5xx asic */ >> if (efi_enabled(EFI_BOOT) && >> (rdev->pdev->subsystem_vendor == PCI_VENDOR_ID_APPLE) && > -- To unsubscribe from this list: send the line "unsubscribe stable" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html