Re: [PATCH 5/5] PCI: Support ASpeed VGA cards behind a misbehaving bridge

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, May 18, 2021 at 05:30:38PM +0800, suijingfeng wrote:
> On 2021/5/18 上午11:09, Bjorn Helgaas wrote:
> 
> > If VGA Enable is 0 and cannot be set to 1, the bridge should *never*
> > forward VGA accesses to its secondary bus.  The generic VGA driver
> > that uses the legacy [mem 0xa0000-0xbffff] range should not work with
> > the VGA device at 05:00.0, and that device cannot participate in the
> > VGA arbitration scheme, which relies on the VGA Enable bit.
> > 
> > If you have a driver for 05:00.0 that doesn't need the legacy memory
> > range, it's possible that it may work.  But VGA arbitration will be
> > broken, and if 05:00.0 needs to be initialized by an option ROM, that
> > probably won't work either.
> 
> We are not using a "generic VGA driver", in user space, we are using the
> modesetting driver come with X server, and it seems work normally. The real
> problems is VGA arbitration will not set 05:00.0 as the default VGA which
> means that when X server read
> /sys/devices/pci0000:00/0000:00:0c.0/0000:04:00.0/0000:05:00.0/boot_vga will
> get a "0". This break Xorg auto-detection. We want the boot_vga sysfs file
> be "1".

I don't think it's true; I think VGA arbitration *will* set 05:00.0 as
the default VGA, as long as 05:00.0 has been enumerated before
vga_arb_device_init().

As far as I can tell, the problem is not that the bridge is broken or
that vga_arb_device_init() is broken.  The problem is that on your
system, vga_arb_device_init() runs before 05:00.0 has been enumerated,
so it *can't* set 05:00.0 as the default VGA because it doesn't know
about 05:00.0 at all.

> > If the 04:00.0 bridge *always* forwards VGA accesses, even though its
> > VGA Enable bit is always zero, then the bridge is broken.  In that
> > case, the generic VGA driver should work with the 05:00.0 device, but
> > VGA arbitration will be limited.  I'm not sure, but the arbiter
> > *might* be able to use the VGA Enable bit in the 00:0c.0 bridge to
> > control VGA access to 05:00.0.  You wouldn't be able to have more than
> > one VGA device below 00:0c.0, and you may not be able have more than
> > one in the entire system.
> 
> We have only one VGA device(05:00.0) below 00:0c.0, but we do able to have
> more than one in the entire system. We could even mount a AMDGPU
> on this server. But in reality, there is a render only GPU and a
> self-designed display controller integrated in LS7A1000 bridge. Both the
> render only GPU and the display controller is PCI device, they are located
> at PCI root bus directly without a PCI-to-PCI bridge in the middle. The
> display controller is blocked by the firmware if ASPEED BMC card is present,
> it can't be accessed under linux kernel. Let me show you a updated version
> of the PCI topology of our server(machine):
> 
>        /sys/devices/pci0000:00
>        |-- 0000:00:06.0
>        |   | -- class (0x040000)
>        |   | -- vendor (0x0014)
>        |   | -- device (0x7a15)
>        |   | -- drm
>        |   | -- ...
>        |-- 0000:00:0c.0
>        |   |-- class (0x060400)
>        |   |-- vendor (0x0014)
>        |   |-- device (0x7a09)
>        |   |-- ...
>        |   |-- 0000:04:00.0
>        |   |   | -- class (0x060400)
>        |   |   | -- device (0x1150)
>        |   |   | -- vendor (0x1a03)
>        |   |   | -- revision (0x04)
>        |   |   | -- ...
>        |   |   | -- 0000:05:00.0
>        |   |   |    | -- class  (0x030000)
>        |   |   |    | -- device (0x2000)
>        |   |   |    | -- vendor (0x1a03)
>        |   |   |    | -- boot_vga
>        |   |   |    | -- i2c-6
>        |   |   |    | -- drm
>        |   |   |    | -- graphics
>        |   |   |    | -- ...
>        |   `-- uevent
>        `-- ...
> 
> Even through the render only GPU(00:06.0) is not a VGA device, it still can
> disturb X server choose a primary device to use. But the root cause is the
> kernel side does not set 05:00.0 as default VGA. In this case X server will
> fallback to the first device found to use. and 00:06.0 is always found
> before 05:00.0. If kernel side set 05:00.0 as default VGA,
> all other problems is secondary.

The fact that X selects 00:06.0 is a user-level thing and I don't know
what's involved.  Its class code (0x0400) looks like
PCI_CLASS_MULTIMEDIA_VIDEO, so vga_arb_device_init() should completely
ignore it.

vga_arb_device_init() should set 05:00.0 as the *kernel's* default
device as long as 05:00.0 has already been enumerated.

Even if vga_arb_device_init() runs before 05:00.0 has been enumerated,
it looks like vga_arbiter_add_pci_device() should notice when 05:00.0
is enumerated, and you should see the "VGA device added:" message for
it.

vga_arbiter_add_pci_device() looks like it *would* set 05:00.0 as the
default device in that case, except for the fact that 04:00.0 doesn't
support PCI_BRIDGE_CTL_VGA.  That's probably a bug in a37c0f48950b
("vgaarb: Select a default VGA device even if there's no legacy VGA").
I think the logic added by a37c0f48950b probably should be shared
between the vga_arb_device_init() initcall path and the
vga_arbiter_add_pci_device() device-add path.

Can you collect your dmesg output, so we can see the enumeration order
and what vgaarb does with it?

Bjorn



[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux