Re: Couple of issues with amdgpu on my WX4100


 



On 06.01.21 at 21:21, Maxim Levitsky wrote:
On Mon, 2021-01-04 at 09:45 -0700, Alex Williamson wrote:
On Mon, 4 Jan 2021 12:34:34 +0100
Christian König <christian.koenig@xxxxxxx> wrote:

Hi Maxim,

I can't help with the display related stuff. Probably the best approach
to get this fixed would be to open a bug report for it on FDO.

But I'm the one who implemented the resizeable BAR support and your
analysis of the problem sounds about correct to me.

The reason why this works on Linux is most likely because we restore the
BAR size on resume (and maybe during initial boot as well).

See this patch for reference:

commit d3252ace0bc652a1a244455556b6a549f969bf99
Author: Christian König <ckoenig.leichtzumerken@xxxxxxxxx>
Date:   Fri Jun 29 19:54:55 2018 -0500

      PCI: Restore resized BAR state on resume

      Resize BARs after resume to the expected size again.

      BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=199959
      Fixes: d6895ad39f3b ("drm/amdgpu: resize VRAM BAR for CPU access v6")
      Fixes: 276b738deb5b ("PCI: Add resizable BAR infrastructure")
      Signed-off-by: Christian König <christian.koenig@xxxxxxx>
      Signed-off-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
      CC: stable@xxxxxxxxxxxxxxx      # v4.15+

Hi!
Thanks for the feedback!

I went over the QEMU code, and QEMU (as opposed to the kernel, where I
tried to hide PCI_EXT_CAP_ID_REBAR) does indeed hide this PCI
capability from the guest.

However, exactly as Alex mentioned, the kernel does restore the rebar
state, and even with that code patched out I found that the rebar state
persists across the reset that the vendor_reset module performs (BACO,
I think).

Therefore the Linux guest sees the full 4G BAR and happily uses it,
while the Windows guest's driver apparently has a bug when the BAR
is that large. I patched amdgpu to resize the BAR to various other
sizes, and the Windows driver apparently works with a BAR of up to 2GB.

So, other than a bug in the Windows driver and the fact that VFIO
doesn't support resizable BARs, there is nothing wrong here.

Since my system does support above 4G decoding and I do have a nice
VFIO-friendly device that supports a resizable BAR, I volunteer to add
support for this to VFIO as time and resources permit.

It would also be nice if amdgpu (or the whole system) could optionally
avoid resizing BARs when a kernel command line / module parameter is
given, or, even better, if amdgpu resized the BAR back to its original
size when it is unloaded, which IMHO is the best solution for this
problem.

I think I can prepare a patch to make amdgpu restore the BAR size on
unload if you think that this is the right solution.
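
For readers following the sizes discussed above (4G versus 2GB), here is a minimal sketch of how the Resizable BAR size field maps to bytes: a field value n means 2^(n + 20) bytes, so 0 is 1 MB. The helper names are hypothetical and chosen for illustration only; this is not code from amdgpu or vendor_reset.

```c
#include <stdint.h>

/* Hypothetical helper: decode a Resizable BAR size field value.
 * Encoding n stands for 2^(n + 20) bytes, i.e. n = 0 is 1 MB. */
static uint64_t rebar_size_to_bytes(unsigned int size_field)
{
	return (uint64_t)1 << (size_field + 20);
}

/* Hypothetical helper: smallest encoding whose size covers `bytes`.
 * The loop is capped at 43 so the shift above never reaches 64 bits. */
static unsigned int rebar_bytes_to_size(uint64_t bytes)
{
	unsigned int n = 0;

	while (n < 43 && rebar_size_to_bytes(n) < bytes)
		n++;
	return n;
}
```

With this encoding, the 4G BAR corresponds to a field value of 12 and the 2GB BAR to a field value of 11.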

Coming back to this topic now; sorry, I've been a bit busy over the last few days.

Basically I don't think that amdgpu should do anything when it quits.

What you should rather do is resize the BAR to the BIOS default size when you trigger the device reset.

It should be trivial to add this to the reset module as well. Most
likely even completely vendor independent since I'm not sure what a bus
reset will do to this configuration and restoring it all the time should
be the most defensive approach.
Hmm, this should already be used by the bus/slot reset path:

pci_bus_restore_locked()/pci_slot_restore_locked()
  pci_dev_restore()
   pci_restore_state()
    pci_restore_rebar_state()
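
As a rough illustration of the last step in that chain, the sketch below shows how a Resizable BAR control register value could be rewritten so that its size field matches the size of the already-assigned resource. It is a userspace approximation, not the kernel function itself: the constants are local copies that assume the control register layout (BAR index in bits 0-2, BAR size field starting at bit 8, 5 bits wide here), and the function names are hypothetical.

```c
#include <stdint.h>

/* Assumed control register layout (local definitions for this sketch). */
#define SKETCH_REBAR_CTRL_BAR_SIZE  0x00001F00u  /* BAR size field */
#define SKETCH_REBAR_CTRL_BAR_SHIFT 8            /* shift of the size field */

/* Integer log2 for a non-zero value. */
static unsigned int sketch_ilog2(uint64_t v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* Return `ctrl` with its size field set to match `resource_bytes`.
 * Sizes below 1 MB cannot be encoded, so `ctrl` is returned unchanged. */
static uint32_t sketch_rebar_ctrl_for_resource(uint32_t ctrl,
					       uint64_t resource_bytes)
{
	uint32_t size;

	if (resource_bytes < (1u << 20))
		return ctrl;

	size = sketch_ilog2(resource_bytes) - 20;
	ctrl &= ~SKETCH_REBAR_CTRL_BAR_SIZE;
	ctrl |= (size << SKETCH_REBAR_CTRL_BAR_SHIFT) & SKETCH_REBAR_CTRL_BAR_SIZE;
	return ctrl;
}
```

The point of the sketch is that the restore path derives the size from the resource the kernel already tracks, so a device that came out of reset with a smaller BAR is programmed back to the expanded size.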

VFIO support for resizeable BARs has been on my todo list, but I don't
have access to any systems that have both a capable device and >4G
decoding enabled in the BIOS.  If we have a consistent view of the BAR
size after the BARs are expanded, I'm not sure why it doesn't just
work.  FWIW, QEMU currently hides the REBAR capability from the guest
because the kernel driver doesn't support emulation through config
space (i.e. it's read-only, which the spec doesn't support).

AIUI, resource allocation can fail when enabling REBAR support, which
is a problem if the failure occurs on the host but not the guest since
we have no means via the hardware protocol to expose such a condition.
Therefore the model I was considering for vfio-pci would be to simply
pre-enable REBAR at the max size.  It might be sufficiently safe to
test BAR expansion on initialization and then allow user control, but
I'm concerned that resource availability could change while already in
use by the user.  Thanks,
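
To make the "pre-enable REBAR at the max size" idea concrete, here is a small sketch of picking the largest size a device advertises. It assumes the capability register reports supported sizes as a bitmap starting at bit 4 (bit 4 for 1 MB, each higher bit doubling the size); the mask and the function name are assumptions for illustration, not vfio-pci code.

```c
#include <stdint.h>

/* Assumed: supported-size bitmap occupies bits 4 and up of the
 * Resizable BAR capability register (bit 4 = 1 MB, bit 5 = 2 MB, ...). */
#define SKETCH_REBAR_CAP_SIZES 0xFFFFFFF0u

/* Return the largest supported size encoding (0 = 1 MB, 12 = 4 GB, ...),
 * or -1 if the device advertises no sizes at all. */
static int sketch_rebar_max_size(uint32_t cap)
{
	uint32_t sizes = (cap & SKETCH_REBAR_CAP_SIZES) >> 4;
	int n = -1;

	while (sizes) {
		n++;
		sizes >>= 1;
	}
	return n;
}
```

A caller would still have to check that the host can actually allocate address space for that size, which is the failure case described above.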
As mentioned in other replies in this thread, and as was my first
thought about this, this will indeed break on devices which don't
accurately report the maximum BAR size that they actually need.
Even the spec itself says that determining the optimal BAR size is
vendor specific.

We could also allow the guest to resize the BAR and, if that fails,
expose the error via a virtual AER message on the root port
where the device is attached?

Sounds like it might work in theory, but I'm not an expert on KVM.

Regards,
Christian.


I personally don't know if this is possible/worth it.


Best regards,
	Maxim Levitsky

Alex

