On 4/4/23 1:18 PM, Daniel Vetter wrote:
Instead of calling aperture_remove_conflicting_devices() to remove the
conflicting devices, just call to aperture_detach_devices() to detach
the device that matches the same PCI BAR / aperture range. Since the
former is just a wrapper of the latter plus a sysfb_disable() call,
and now that's done in this function but only for the primary devices.
This fixes a regression introduced by ee7a69aa38d8 ("fbdev: Disable
sysfb device registration when removing conflicting FBs"), where we
remove the sysfb when loading a driver for an unrelated pci device,
resulting in the user loosing their efifb console or similar.
Note that in practice this only is a problem with the nvidia blob,
because that's the only gpu driver people might install which does not
come with an fbdev driver of it's own. For everyone else the real gpu
driver will restore a working console.
It might be worth noting that this also affects devices that have no
driver installed, or where the driver failed to initialize or was
configured not to set a mode. E.g. I reproduced this problem on a laptop
with i915.modeset=0 and an NVIDIA driver that calls
drm_fbdev_generic_setup. It would also reproduce on a system that sets
modeset=0 (or has a GPU that's too new for its corresponding kernel
driver) and that passes an NVIDIA GPU through to a VM using vfio-pci
since that also calls aperture_remove_conflicting_pci_devices.
I agree that in practice this will mostly affect people with our driver
until I get my changes to add drm_fbdev_generic_setup checked in. But
these other cases don't seem all that unlikely to me.
-- Aaron
Also note that in the referenced bug there's confusion that this same
bug also happens on amdgpu. But that was just another amdgpu specific
regression, which just happened to happen at roughly the same time and
with the same user-observable symptoms. That bug is fixed now, see
https://bugzilla.kernel.org/show_bug.cgi?id=216331#c15
Note that we should not have any such issues on non-pci multi-gpu
issues, because I could only find two such cases:
- SoC with some external panel over spi or similar. These panel
drivers do not use drm_aperture_remove_conflicting_framebuffers(),
so no problem.
- vga+mga, which is a direct console driver and entirely bypasses all
this.
For the above reasons the cc: stable is just notionally, this patch
will need a backport and that's up to nvidia if they care enough.
v2:
- Explain a bit better why other multi-gpu that aren't pci shouldn't
have any issues with making all this fully pci specific.
v3
- polish commit message (Javier)
Fixes: ee7a69aa38d8 ("fbdev: Disable sysfb device registration when removing conflicting FBs")
Tested-by: Aaron Plattner <aplattner@xxxxxxxxxx>
Reviewed-by: Javier Martinez Canillas <javierm@xxxxxxxxxx>
References: https://bugzilla.kernel.org/show_bug.cgi?id=216303#c28
Signed-off-by: Daniel Vetter <daniel.vetter@xxxxxxxxx>
Cc: Aaron Plattner <aplattner@xxxxxxxxxx>
Cc: Javier Martinez Canillas <javierm@xxxxxxxxxx>
Cc: Thomas Zimmermann <tzimmermann@xxxxxxx>
Cc: Helge Deller <deller@xxxxxx>
Cc: Sam Ravnborg <sam@xxxxxxxxxxxx>
Cc: Alex Deucher <alexander.deucher@xxxxxxx>
Cc: <stable@xxxxxxxxxxxxxxx> # v5.19+ (if someone else does the backport)
---
drivers/video/aperture.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c
index 8f1437339e49..2394c2d310f8 100644
--- a/drivers/video/aperture.c
+++ b/drivers/video/aperture.c
@@ -321,15 +321,16 @@ int aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const char *na
primary = pdev == vga_default_device();
+ if (primary)
+ sysfb_disable();
+
for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) {
if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
continue;
base = pci_resource_start(pdev, bar);
size = pci_resource_len(pdev, bar);
- ret = aperture_remove_conflicting_devices(base, size, name);
- if (ret)
- return ret;
+ aperture_detach_devices(base, size);
}
if (primary) {