Re: Annoying AMDGPU boot-time warning due to simplefb / amdgpu resource clash

On 28/06/2022 10:43, Thomas Zimmermann wrote:
Hi

On 27.06.22 at 19:25, Linus Torvalds wrote:
On Mon, Jun 27, 2022 at 1:02 AM Javier Martinez Canillas
<javierm@xxxxxxxxxx> wrote:

The flag was dropped because it was causing drivers that requested their
memory resource with pci_request_region() to fail with -EBUSY (e.g. the
vmwgfx driver):

https://www.spinics.net/lists/dri-devel/msg329672.html

See, *that* link would have been useful in the commit.

Rather than the useless link it has.

Anyway, removing the busy bit just made things worse.

If simplefb is actually still using that frame buffer, it's a problem.
If it isn't, then maybe that resource should have been released?

It's supposed to be released once amdgpu asks for conflicting framebuffers to be removed by calling drm_aperture_remove_conflicting_pci_framebuffers().

That most definitely doesn't happen. This is on a running system:

   [torvalds@ryzen linux]$ cat /proc/iomem | grep BOOTFB
         00000000-00000000 : BOOTFB

so I suspect that the BUSY bit was never the problem (even for
vmwgfx). The problem was that simplefb doesn't remove its resource.
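
For anyone who wants to repeat the check above on their own machine, here is a minimal sketch that scans /proc/iomem text for a named entry. The function name find_resources is made up for this example, and it assumes the usual "start-end : name" line layout with two spaces of indentation per nesting level. Note that unprivileged users see all addresses as zeros, as in the output above, so run it as root if the real range matters.

```python
import re

# Matches lines such as "    e0000000-e02fffff : BOOTFB".
# Leading whitespace encodes the nesting depth in the resource tree.
_IOMEM_LINE = re.compile(r"^(\s*)([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.*)$")


def find_resources(iomem_text, name="BOOTFB"):
    """Return (depth, start, end) for each /proc/iomem entry called `name`.

    `find_resources` is a hypothetical helper for this example, not a
    kernel or library interface.
    """
    hits = []
    for line in iomem_text.splitlines():
        m = _IOMEM_LINE.match(line)
        if m and m.group(4).strip() == name:
            depth = len(m.group(1)) // 2  # assumed: two spaces per level
            hits.append((depth, int(m.group(2), 16), int(m.group(3), 16)))
    return hits
```

Feeding it the contents of /proc/iomem after the native GPU driver has loaded should return an empty list if the firmware framebuffer range was released; a non-empty list means BOOTFB is still in the tree.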

Guys, the *reason* for resource management is to catch people that
trample over each other's resources.

You literally basically disabled the code that checked for it by
removing the BUSY flag, and just continued to have conflicting
resources.

That isn't a "fix", that is literally "we are ignoring and breaking
the whole reason that the resource tree exists, but we'll still use it
for no good reason".

The EFI/VESA framebuffer is represented by a platform device. The BUSY flag we removed is in the 'sysfb' code that creates this device. The BOOTFB resource you see in your /proc/iomem is the framebuffer memory. The code is in sysfb_create_simplefb() [1].

Later during boot, a device driver ('simplefb' or 'simpledrm') binds to the device and reserves the framebuffer memory for rendering into it; see simpledrm for an example [2]. At that point, a BUSY flag is set for that reservation.
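
To make the two levels of reservation concrete, here is a toy model of a resource tree in which a request fails with -EBUSY when it overlaps a busy entry, but may nest inside a non-busy one. This is only an illustration of the behaviour discussed in this thread, not the kernel's implementation; the Resource class and this request_region function are invented for the example.

```python
import errno


class Resource:
    """Toy stand-in for an entry in the I/O resource tree (hypothetical)."""

    def __init__(self, name, start, end, busy=False):
        self.name = name
        self.start = start
        self.end = end
        self.busy = busy
        self.children = []

    def overlaps(self, start, end):
        return self.start <= end and start <= self.end

    def contains(self, start, end):
        return self.start <= start and end <= self.end


def request_region(parent, name, start, end):
    """Claim [start, end] as a busy region; return 0 or -EBUSY.

    A conflict with a busy entry fails. A conflict with a non-busy entry
    is resolved by nesting the new region inside it, provided it fits.
    """
    for child in parent.children:
        if child.overlaps(start, end):
            if child.busy or not child.contains(start, end):
                return -errno.EBUSY
            return request_region(child, name, start, end)
    parent.children.append(Resource(name, start, end, busy=True))
    return 0
```

In this model, a BOOTFB entry created with busy=True blocks any later driver that requests the same range, which mirrors the -EBUSY failure described for pci_request_region(). With busy=False the request succeeds by nesting, but the BOOTFB entry itself is still in the tree.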


Yeah, yeah, most modern drivers ignore the IO resource tree, because
they end up working on another resource level entirely: they work on
not the IO resources, but on the "driver level" instead, and just
attach to PCI devices.

So these days, few enough drivers even care about the IO resource
tree, and it's mostly used for (a) legacy devices (think ISA) and (b)
the actual bus resource handling (so the PCI code itself uses it to
sort out resource use and avoid conflicts, but PCI drivers themselves
generally then don't care, because the bus has "taken care of it").

So that's why the amdgpu driver itself doesn't care about resource
allocations, and we only get a warning for that memory type case, not
for any deeper resource case.

And apparently the vmwgfx driver still uses that legacy "let's claim
all PCI resources in the resource tree" instead of just claiming the
device itself. Which is why it hit this whole BOOTFB resource thing
even harder.

But the real bug is that BOOTFB seems to claim this resource even
after it is done with it and other drivers want to take over.

Once amdgpu wants to take over, it has to remove the platform device that represents the EFI framebuffer. It does so by calling the drm_aperture_ function, which in turn calls platform_device_unregister(). Afterwards, the platform device, driver and BOOTFB range are supposed to be entirely gone.

Unfortunately, this currently only works if a driver is bound to the platform device. Without simpledrm or simplefb, amdgpu won't find the platform device to remove.

I guess what happens on your system is that sysfb creates a device for the EFI framebuffer, and then amdgpu comes along and doesn't find it for removal. Later you see these warnings because BOOTFB is still around.

Javier already provided patches for this scenario, which are in the DRM tree. From drm-next, please cherry-pick

  0949ee75da6c ("firmware: sysfb: Make sysfb_create_simplefb() return a pdev pointer")

  bc824922b264 ("firmware: sysfb: Add sysfb_disable() helper function")

  873eb3b11860 ("fbdev: Disable sysfb device registration when removing conflicting FBs")

for testing. With these patches, amdgpu will find the sysfb device and unregister it.
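
As a rough illustration of the difference, here is a toy model of the hand-over. It is a sketch of the behaviour described in this mail, not of the actual patches: the Sysfb class, its attributes and the `patched` switch are all invented for the example.

```python
class Sysfb:
    """Toy model of the firmware-framebuffer platform device (hypothetical)."""

    def __init__(self):
        self.device = None        # name of the registered platform device
        self.bound_driver = None  # e.g. "simpledrm" or "simplefb"
        self.iomem = set()        # names visible in the resource tree

    def create_simplefb(self):
        self.device = "simple-framebuffer.0"
        self.iomem.add("BOOTFB")

    def unregister(self):
        self.device = None
        self.bound_driver = None
        self.iomem.discard("BOOTFB")


def remove_conflicting_framebuffers(sysfb, patched):
    """Model a native driver (such as amdgpu) taking over the framebuffer.

    Without the patches, the device is only found through a bound driver.
    With them, the sysfb device is found and unregistered directly.
    """
    if sysfb.device is None:
        return
    if patched or sysfb.bound_driver is not None:
        sysfb.unregister()
```

With no driver bound and patched=False, BOOTFB stays in the model's resource set after the take-over, which corresponds to the stale /proc/iomem entry. With patched=True it is removed regardless of whether a driver was bound.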

The patches are queued up for the next merge window. If they resolve the issue, we'll send them earlier, with the next round of fixes.

I was able to reproduce the warning with kernel v5.19-rc4, a radeon GPU and the following config:

CONFIG_SYSFB=y
CONFIG_SYSFB_SIMPLEFB=y
# CONFIG_DRM_SIMPLEDRM is not set
# CONFIG_FB_SIMPLE is not set

After applying the 3 patches you mentioned, the issue is resolved (at least on my setup).
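
In case it helps others check whether their kernel matches this configuration, here is a small sketch that parses .config text and reports whether the sysfb device gets created while neither simpledrm nor simplefb is enabled. Both function names are made up for this example, and it deliberately ignores the case of a driver built as a module that is never loaded.

```python
def parse_kconfig(text):
    """Map CONFIG_* symbols to values; '# CONFIG_X is not set' becomes 'n'."""
    suffix = " is not set"
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            opts[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            key, value = line.split("=", 1)
            opts[key] = value
    return opts


def sysfb_device_without_driver(opts):
    """True if the simple-framebuffer device is created but no driver is enabled."""
    has_device = opts.get("CONFIG_SYSFB_SIMPLEFB", "n") == "y"
    drivers = ("CONFIG_DRM_SIMPLEDRM", "CONFIG_FB_SIMPLE")
    has_driver = any(opts.get(k, "n") in ("y", "m") for k in drivers)
    return has_device and not has_driver
```

Passing the four config lines above through parse_kconfig() and then sysfb_device_without_driver() returns True, which is the situation where the warning was reproduced.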

Best regards,

--

Jocelyn


Best regards
Thomas

[1] https://elixir.bootlin.com/linux/latest/source/drivers/firmware/sysfb_simplefb.c#L115
[2] https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/tiny/simpledrm.c#L544


Not the BUSY bit.

                      Linus




