Re: [PATCH] drm/radeon: fix asic initialization for virtualized environments

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jun 13, 2016 at 4:10 PM, Alex Williamson
<alex.williamson@xxxxxxxxxx> wrote:
> On Mon, 13 Jun 2016 15:45:20 -0400
> Alex Deucher <alexdeucher@xxxxxxxxx> wrote:
>
>> When executing in a PCI passthrough based virtuzliation environment, the
>> hypervisor will usually attempt to send a PCIe bus reset signal to the
>> ASIC when the VM reboots. In this scenario, the card is not correctly
>> initialized, but we still consider it to be posted. Therefore, in a
>> passthrough based environemnt we should always post the card to guarantee
>> it is in a good state for driver initialization.
>>
>> Ported from amdgpu commit:
>> amdgpu: fix asic initialization for virtualized environments
>>
>> Cc: Andres Rodriguez <andres.rodriguez@xxxxxxx>
>> Cc: Alex Williamson <alex.williamson@xxxxxxxxxx>
>> Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>
>> Cc: stable@xxxxxxxxxxxxxxx
>> ---
>>  drivers/gpu/drm/radeon/radeon_device.c | 21 +++++++++++++++++++++
>>  1 file changed, 21 insertions(+)
>
> Thanks, I expect it's an improvement, though it's always a bit
> disappointing when a driver starts modifying its behavior based on
> what might be a transient feature of the platform, in this case a
> hypervisor platform.  For instance, why does our bus reset and video
> ROM execution result in a different state than a physical BIOS doing
> the same?  Can't this condition occur regardless of a hypervisor,

Just doing a pci reset is not enough on newer cards.  The hw handling
pci resets changed in CI and more of the logic moved to the driver.
That does a limited reset, but not the registers that the driver
checks to determine whether or not the asic has been posted so the
driver skips posting and leaves the hw in a bad reset state.

> perhaps a rare hot-add of a GPU, a bare metal kexec reboot, or perhaps
> simply a system BIOS optimized to post a limited set of devices.

We can tell if a card has never been posted and properly post it.
Where it's tricky is when a card has been posted and has subsequently
been pci reset on CI and newer hw.  I'm not sure of a good way to
detect this particular scenario.  Generally this is mainly done for
qemu/kvm.

> Detection based on some state of the device rather than an expectation
> based on what the device is running on seems preferable.  I suspect
> Andres' patch for amdgpu only affects newer devices, which pretty much
> all suffer reset issues, at least under QEMU/VFIO, but I wonder how this
> patch affects existing working devices, like 6, 7, and some 8-series.

Posting the asic at init time should be safe on all asics.

> Anyway, if this is the solution to the poor behavior we've seen with
> assigned AMD cards, maybe someone could request the same for the closed
> drivers, including Windows.  Thanks,

The closed drivers already do this.

Alex

>
> Alex
>
>> diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
>> index e61c763..21c44b2 100644
>> --- a/drivers/gpu/drm/radeon/radeon_device.c
>> +++ b/drivers/gpu/drm/radeon/radeon_device.c
>> @@ -630,6 +630,23 @@ void radeon_gtt_location(struct radeon_device *rdev, struct radeon_mc *mc)
>>  /*
>>   * GPU helpers function.
>>   */
>> +
>> +/**
>> + * radeon_device_is_virtual - check if we are running is a virtual environment
>> + *
>> + * Check if the asic has been passed through to a VM (all asics).
>> + * Used at driver startup.
>> + * Returns true if virtual or false if not.
>> + */
>> +static bool radeon_device_is_virtual(void)
>> +{
>> +#ifdef CONFIG_X86
>> +     return boot_cpu_has(X86_FEATURE_HYPERVISOR);
>> +#else
>> +     return false;
>> +#endif
>> +}
>> +
>>  /**
>>   * radeon_card_posted - check if the hw has already been initialized
>>   *
>> @@ -643,6 +660,10 @@ bool radeon_card_posted(struct radeon_device *rdev)
>>  {
>>       uint32_t reg;
>>
>> +     /* for pass through, always force asic_init */
>> +     if (radeon_device_is_virtual())
>> +             return false;
>> +
>>       /* required for EFI mode on macbook2,1 which uses an r5xx asic */
>>       if (efi_enabled(EFI_BOOT) &&
>>           (rdev->pdev->subsystem_vendor == PCI_VENDOR_ID_APPLE) &&
>
--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]