> -----Original Message----- > From: Alex Deucher [mailto:alexdeucher@xxxxxxxxx] > Sent: June-15-16 1:00 PM > To: Alex Williamson > Cc: Maling list - DRI developers; Deucher, Alexander; Rodriguez, Andres; for > 3.8 > Subject: Re: [PATCH] drm/radeon: fix asic initialization for virtualized > environments > > On Wed, Jun 15, 2016 at 12:45 PM, Alex Williamson > <alex.williamson@xxxxxxxxxx> wrote: > > On Wed, 15 Jun 2016 02:23:37 -0400 > > Alex Deucher <alexdeucher@xxxxxxxxx> wrote: > > > >> On Mon, Jun 13, 2016 at 4:10 PM, Alex Williamson > >> <alex.williamson@xxxxxxxxxx> wrote: > >> > On Mon, 13 Jun 2016 15:45:20 -0400 > >> > Alex Deucher <alexdeucher@xxxxxxxxx> wrote: > >> > > >> >> When executing in a PCI passthrough based virtuzliation > >> >> environment, the hypervisor will usually attempt to send a PCIe > >> >> bus reset signal to the ASIC when the VM reboots. In this > >> >> scenario, the card is not correctly initialized, but we still > >> >> consider it to be posted. Therefore, in a passthrough based > >> >> environemnt we should always post the card to guarantee it is in a > good state for driver initialization. > >> >> > >> >> Ported from amdgpu commit: > >> >> amdgpu: fix asic initialization for virtualized environments > >> >> > >> >> Cc: Andres Rodriguez <andres.rodriguez@xxxxxxx> > >> >> Cc: Alex Williamson <alex.williamson@xxxxxxxxxx> > >> >> Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx> > >> >> Cc: stable@xxxxxxxxxxxxxxx > >> >> --- > >> >> drivers/gpu/drm/radeon/radeon_device.c | 21 > +++++++++++++++++++++ > >> >> 1 file changed, 21 insertions(+) > >> > > >> > Thanks, I expect it's an improvement, though it's always a bit > >> > disappointing when a driver starts modifying its behavior based on > >> > what might be a transient feature of the platform, in this case a > >> > hypervisor platform. For instance, why does our bus reset and > >> > video ROM execution result in a different state than a physical > >> > BIOS doing the same? Can't this condition occur regardless of a > >> > hypervisor, > >> > >> Just doing a pci reset is not enough on newer cards. The hw handling > >> pci resets changed in CI and more of the logic moved to the driver. > > > > Gag, please relay my disapproval to your hardware folks. > > > >> That does a limited reset, but not the registers that the driver > >> checks to determine whether or not the asic has been posted so the > >> driver skips posting and leaves the hw in a bad reset state. > >> > >> > perhaps a rare hot-add of a GPU, a bare metal kexec reboot, or > >> > perhaps simply a system BIOS optimized to post a limited set of devices. > >> > >> We can tell if a card has never been posted and properly post it. > >> Where it's tricky is when a card has been posted and has subsequently > >> been pci reset on CI and newer hw. I'm not sure of a good way to > >> detect this particular scenario. Generally this is mainly done for > >> qemu/kvm. > > > > How do you tell if a card has never been posted? Is it something we > > could easily toggle after a bus reset? > > We check CONFIG_MEMSIZE which is a scratch register set by the asic_init > command table to tell the driver how much vram is on the board. > Yeah, detecting a specific virtualization environment is something that I don't really like. Specially since there isn't a nice generic way to do this for all environments (i.e. in this patch I used x86 specific functionality). If there is a generic approach that works on multiple host CPU architectures do let me know. Another approach I considered was adding an option, amdgpu.always_post that would shift the responsibility onto the user. But I don't think it's really fair to have that expectation, and at some point you start falling into config hell. As Alex mentioned, we don't really have a good mechanism for detecting when we need to post due to how the HW handles PCIe bus reset. Hopefully we can get this fixed for future ASICs. > > > >> > Detection based on some state of the device rather than an > >> > expectation based on what the device is running on seems > >> > preferable. I suspect Andres' patch for amdgpu only affects newer > >> > devices, which pretty much all suffer reset issues, at least under > >> > QEMU/VFIO, but I wonder how this patch affects existing working > devices, like 6, 7, and some 8-series. > >> > >> Posting the asic at init time should be safe on all asics. > >> > >> > Anyway, if this is the solution to the poor behavior we've seen > >> > with assigned AMD cards, maybe someone could request the same for > >> > the closed drivers, including Windows. Thanks, > >> > >> The closed drivers already do this. > > > > Hmm, that's not terribly encouraging then since the majority of users > > are running Windows guests for the purpose of creating a gaming VM and > > still experiencing reset issues with the closed drivers there. > > Thanks, > > I'll have to check with the windows team to see how much validation they do > with the windows driver as a qemu/kvm guest. It could be that they don't > properly detect that as a virtual case. > > Alex _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/dri-devel