Re: [PATCH 5/5] PCI: Support ASpeed VGA cards behind a misbehaving bridge

suijingfeng <suijingfeng@xxxxxxxxxxx> · Tue, 18 May 2021 17:30:38 +0800

On 2021/5/18 11:09 AM, Bjorn Helgaas wrote:

If VGA Enable is 0 and cannot be set to 1, the bridge should *never*
forward VGA accesses to its secondary bus.  The generic VGA driver
that uses the legacy [mem 0xa0000-0xbffff] range should not work with
the VGA device at 05:00.0, and that device cannot participate in the
VGA arbitration scheme, which relies on the VGA Enable bit.

If you have a driver for 05:00.0 that doesn't need the legacy memory
range, it's possible that it may work.  But VGA arbitration will be
broken, and if 05:00.0 needs to be initialized by an option ROM, that
probably won't work either.
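For reference, the VGA Enable bit discussed above lives in the Bridge Control register of a PCI-to-PCI bridge (offset 0x3E in the Type 1 header, bit 3). A minimal sketch of how it could be checked from a raw config-space dump follows. The function name is hypothetical, and the sketch assumes the bytes really come from a bridge, for example from the sysfs `config` file of 04:00.0.

```python
PCI_BRIDGE_CONTROL = 0x3E  # Bridge Control register offset (Type 1 header)
PCI_BRIDGE_CTL_VGA = 0x08  # VGA Enable bit within Bridge Control


def vga_enable_set(config: bytes) -> bool:
    """Return True if VGA Enable is set in a bridge's config-space dump.

    Assumes `config` holds at least the first 64 bytes of the config
    space of a PCI-to-PCI bridge (hypothetical helper for illustration).
    """
    if len(config) < PCI_BRIDGE_CONTROL + 2:
        raise ValueError("config space dump too short for a bridge header")
    # Bridge Control is a 16-bit little-endian register.
    raw = config[PCI_BRIDGE_CONTROL:PCI_BRIDGE_CONTROL + 2]
    bridge_ctl = int.from_bytes(raw, "little")
    return bool(bridge_ctl & PCI_BRIDGE_CTL_VGA)


if __name__ == "__main__":
    # Synthetic 64-byte header with only VGA Enable set.
    header = bytearray(64)
    header[PCI_BRIDGE_CONTROL] = PCI_BRIDGE_CTL_VGA
    print(vga_enable_set(bytes(header)))
```

On a bridge like the one described here, where the bit reads back as 0 and cannot be set, this check would keep returning False regardless of what is written.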

We are not using a "generic VGA driver". In user space, we use the
modesetting driver that comes with the X server, and it seems to work
normally. The real problem is that VGA arbitration does not set 05:00.0
as the default VGA device, which means that when the X server reads
/sys/devices/pci0000:00/0000:00:0c.0/0000:04:00.0/0000:05:00.0/boot_vga
it gets "0". This breaks Xorg auto-detection. We want the boot_vga
sysfs file to be "1".
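The check described here can be reproduced from user space with a few lines. This is only a sketch: the helper names are hypothetical, the path is the one quoted above and exists only on the machine under discussion, and treating a missing file as "not the boot VGA device" is an assumption made for the example.

```python
from pathlib import Path

# Path quoted in the message; only present on the server being discussed.
BOOT_VGA = Path(
    "/sys/devices/pci0000:00/0000:00:0c.0/0000:04:00.0/0000:05:00.0/boot_vga"
)


def parse_boot_vga(text: str) -> bool:
    """Interpret the contents of a boot_vga attribute ("0" or "1")."""
    return text.strip() == "1"


def is_boot_vga(path: Path = BOOT_VGA) -> bool:
    """Return True if the kernel marked the device as the boot VGA device.

    A missing or unreadable file is treated as False (assumption made
    for this sketch).
    """
    try:
        return parse_boot_vga(path.read_text())
    except OSError:
        return False


if __name__ == "__main__":
    print(f"{BOOT_VGA}: boot_vga={int(is_boot_vga())}")
```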

If the 04:00.0 bridge *always* forwards VGA accesses, even though its
VGA Enable bit is always zero, then the bridge is broken.  In that
case, the generic VGA driver should work with the 05:00.0 device, but
VGA arbitration will be limited.  I'm not sure, but the arbiter
*might* be able to use the VGA Enable bit in the 00:0c.0 bridge to
control VGA access to 05:00.0.  You wouldn't be able to have more than
one VGA device below 00:0c.0, and you may not be able to have more than
one in the entire system.

We have only one VGA device (05:00.0) below 00:0c.0, but we are able to
have more than one in the entire system. We could even install an AMD GPU
in this server. In reality, though, there is a render-only GPU and a
self-designed display controller integrated in the LS7A1000 bridge. Both
the render-only GPU and the display controller are PCI devices; they are
located directly on the PCI root bus, without a PCI-to-PCI bridge in
between. The display controller is blocked by the firmware if the ASPEED
BMC card is present, so it can't be accessed from the Linux kernel. Let me
show you an updated version of the PCI topology of our server:

       /sys/devices/pci0000:00
       |-- 0000:00:06.0
       |   |-- class (0x040000)
       |   |-- vendor (0x0014)
       |   |-- device (0x7a15)
       |   |-- drm
       |   |-- ...
       |-- 0000:00:0c.0
       |   |-- class (0x060400)
       |   |-- vendor (0x0014)
       |   |-- device (0x7a09)
       |   |-- ...
       |   |-- 0000:04:00.0
       |   |   |-- class (0x060400)
       |   |   |-- device (0x1150)
       |   |   |-- vendor (0x1a03)
       |   |   |-- revision (0x04)
       |   |   |-- ...
       |   |   |-- 0000:05:00.0
       |   |   |   |-- class (0x030000)
       |   |   |   |-- device (0x2000)
       |   |   |   |-- vendor (0x1a03)
       |   |   |   |-- boot_vga
       |   |   |   |-- i2c-6
       |   |   |   |-- drm
       |   |   |   |-- graphics
       |   |   |   |-- ...
       |   `-- uevent
       `-- ...
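The class values in the tree above can be decoded to see which devices count as VGA devices. The sketch below splits the 24-bit class code into base class, subclass, and programming interface; the helper names and the small lookup table are illustrative only and cover just the three codes that appear in this topology.

```python
from typing import Tuple

# Only the (base class, subclass) pairs that appear in the tree above.
KNOWN_CLASSES = {
    (0x03, 0x00): "VGA-compatible controller",
    (0x04, 0x00): "multimedia video controller",
    (0x06, 0x04): "PCI-to-PCI bridge",
}


def decode_class(class_code: int) -> Tuple[int, int, int]:
    """Split a 24-bit PCI class code into (base, subclass, prog_if)."""
    base = (class_code >> 16) & 0xFF
    sub = (class_code >> 8) & 0xFF
    prog_if = class_code & 0xFF
    return base, sub, prog_if


def describe(class_code: int) -> str:
    """Return a short description for the class codes used in this thread."""
    base, sub, _ = decode_class(class_code)
    return KNOWN_CLASSES.get((base, sub), "unknown")


if __name__ == "__main__":
    for addr, code in [("00:06.0", 0x040000),
                       ("00:0c.0", 0x060400),
                       ("04:00.0", 0x060400),
                       ("05:00.0", 0x030000)]:
        print(addr, describe(code))
```

Only 05:00.0 decodes as a VGA-compatible controller; 00:06.0 is a multimedia video controller, and both 00:0c.0 and 04:00.0 are PCI-to-PCI bridges.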

Even though the render-only GPU (00:06.0) is not a VGA device, it can
still interfere with the X server's choice of a primary device. But the
root cause is that the kernel does not set 05:00.0 as the default VGA
device. In this case, the X server falls back to the first device it
finds, and 00:06.0 is always found before 05:00.0. If the kernel sets
05:00.0 as the default VGA device, all the other problems are secondary.
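The fallback behaviour described above can be modelled roughly as follows. This is a simplified sketch of the selection logic as described in this message, not the X server's actual code; the function name and the list-of-tuples input are hypothetical.

```python
from typing import List, Optional, Tuple


def pick_primary(devices: List[Tuple[str, bool]]) -> Optional[str]:
    """Pick a primary device from (address, boot_vga) pairs in probe order.

    Simplified model: prefer the device flagged boot_vga; otherwise fall
    back to the first device found.
    """
    for addr, boot_vga in devices:
        if boot_vga:
            return addr
    return devices[0][0] if devices else None


if __name__ == "__main__":
    # Current situation: the kernel flags neither device as boot VGA.
    print(pick_primary([("00:06.0", False), ("05:00.0", False)]))
    # Desired situation: the kernel flags 05:00.0 as boot VGA.
    print(pick_primary([("00:06.0", False), ("05:00.0", True)]))
```

In this model, the first call selects 00:06.0 and the second selects 05:00.0, which matches the behaviour we see and the behaviour we want.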