Re: [PATCH v3 3/8] PCI: Restore resource alignment

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 8/8/24 17:54, Bjorn Helgaas wrote:
> On Thu, Aug 08, 2024 at 04:28:50PM -0400, Stewart Hildebrand wrote:
>> On 8/8/24 15:28, Bjorn Helgaas wrote:
>>> On Wed, Aug 07, 2024 at 11:17:12AM -0400, Stewart Hildebrand wrote:
>>>> Devices with alignment specified will lose their alignment in cases when
>>>> the bridge resources have been released, e.g. due to insufficient bridge
>>>> window size. Restore the alignment.
>>>
>>> I guess this fixes a problem when the user has specified
>>> "pci=resource_alignment=..." and we've decided to release and
>>> reallocate a bridge window?  Just looking for a bit more concrete
>>> description of what this problem would look like to a user.
>>
>> Yes. When alignment has been specified via pcibios_default_alignment()
>> or by the user with "pci=resource_alignment=...", and the bridge window
>> is being reallocated, the specified alignment is lost and the resource
>> may not be sufficiently aligned after reallocation.
>>
>> I can expand the commit description.
> 
> I think a hint about where the alignment gets lost would be helpful,
> too.
> 
> This seems like a problem users could be seeing today, even
> independent of the device passthrough issue that I think is the main
> thrust of this series.  If there's a problem report or an easy way to
> reproduce this problem, that would be nice, too.

Oh, sorry, I just realized that it's only alignments with
IORESOURCE_STARTALIGN that get lost during bridge window realloc.
Specifically, r->start gets overwritten in __release_child_resources().
There are a few code paths, such as the one in
__release_child_resources(), that assume IORESOURCE_SIZEALIGN. In case
of IORESOURCE_SIZEALIGN, the alignment is restored by a simple
calculation. However, with IORESOURCE_STARTALIGN, we can't use a simple
calculation, instead we need to consult
pci_reassigndev_resource_alignment() to restore the alignment. I'll
update the commit message.

An alternative approach might be to store the alignment in a new member
in struct resource, thus saving us from having to call
pci_reassigndev_resource_alignment() for each PCI device on the bridge
undergoing reallocation.

BTW, this patch and the two "[powerpc,x86]/pci: Preserve
IORESOURCE_STARTALIGN alignment" patches could potentially be folded
into a single patch as they're all dealing with fixing
IORESOURCE_STARTALIGN.

To repro the issue on x86, we will need to apply this series except the
current patch [3/8].

I ran into the issue with the following device. Let's call it example 1.
Scrubbed/partial output from lspci -vv:

	Region 0: Memory at d1924600 (32-bit, non-prefetchable) [size=256]
	Region 1: Memory at d1924400 (32-bit, non-prefetchable) [size=512]
	Region 2: Memory at d1924000 (32-bit, non-prefetchable) [size=1K]
	Region 3: Memory at d1920000 (32-bit, non-prefetchable) [size=16K]
	Region 4: Memory at d1900000 (32-bit, non-prefetchable) [size=128K]
	Region 5: Memory at d1800000 (32-bit, non-prefetchable) [size=1M]
	Capabilities: [b0] MSI-X: Enable- Count=2 Masked-
		Vector table: BAR=0 offset=00000080
		PBA: BAR=0 offset=00000090
	Capabilities: [200 v1] Single Root I/O Virtualization (SR-IOV)
		IOVCap:	Migration-, Interrupt Message Number: 000
		IOVCtl:	Enable- Migration- Interrupt- MSE- ARIHierarchy+
		IOVSta:	Migration-
		Initial VFs: 4, Total VFs: 4, Number of VFs: 0, Function Dependency Link: 00
		VF offset: 6, stride: 1, Device ID: 0100
		Supported Page Size: 00000553, System Page Size: 00000001
		Region 0: Memory at 00000000d0800000 (64-bit, non-prefetchable)
		VF Migration: offset: 00000000, BIR: 0
	Kernel driver in use: pci-endpoint-test
	Kernel modules: pci_endpoint_test

BARs 0, 1, and 2 are small, and the host firmware placed them on the
same page. The host firmware did not page align the BARs. There is also
an SR-IOV BAR that the firmware couldn't fit in the bridge window.

In example 1, I did not specify pci=realloc=on. I used a kernel with
CONFIG_PCI_REALLOC_ENABLE_AUTO=y, and the SR-IOV BAR triggered the
bridge window realloc. Alignment was requested for BARs 0, 1, and 2 of
this device, using IORESOURCE_STARTALIGN, because of patch [8/8]. The
alignment was lost during realloc.

Such a device from example 1 may be hard to come by, so here's a way to
reproduce with qemu. Example 2:

01:00.0 Unclassified device [00ff]: Red Hat, Inc. QEMU PCI Test Device
        Subsystem: Red Hat, Inc. QEMU Virtual Machine
        Flags: fast devsel
        Memory at fe800000 (32-bit, non-prefetchable) [size=4K]
        I/O ports at c000 [size=256]
        Memory at 7050000000 (64-bit, prefetchable) [size=32]

01:01.0 Unclassified device [00ff]: Red Hat, Inc. QEMU PCI Test Device
        Subsystem: Red Hat, Inc. QEMU Virtual Machine
        Flags: fast devsel
        Memory at fe801000 (32-bit, non-prefetchable) [size=4K]
        I/O ports at c100 [size=256]
        Memory at 7050000020 (64-bit, prefetchable) [size=32]

01:02.0 Unclassified device [00ff]: Red Hat, Inc. QEMU PCI Test Device
        Subsystem: Red Hat, Inc. QEMU Virtual Machine
        Flags: fast devsel
        Memory at fe802000 (32-bit, non-prefetchable) [size=4K]
        I/O ports at c200 [size=256]
        Memory at 7050000040 (64-bit, prefetchable) [size=32]

01:03.0 Unclassified device [00ff]: Red Hat, Inc. QEMU PCI Test Device
        Subsystem: Red Hat, Inc. QEMU Virtual Machine
        Flags: fast devsel
        Memory at fe803000 (32-bit, non-prefetchable) [size=4K]
        I/O ports at c300 [size=256]
        Memory at 7040000000 (64-bit, prefetchable) [size=256M]

This example can reproduced with Qemu's pci-testdev and a SeaBIOS hack.
In this case we will need to specify pci=realloc=on to trigger the
realloc because there's no SR-IOV BAR to trigger it automatically. Add
this to your usual qemu-system-x86_64 args:

    -append "console=ttyS0 ignore_loglevel pci=realloc=on" \
    -device pcie-pci-bridge,id=pcie.1 \
    -device pci-testdev,bus=pcie.1,membar=32 \
    -device pci-testdev,bus=pcie.1,membar=32 \
    -device pci-testdev,bus=pcie.1,membar=32 \
    -device pci-testdev,bus=pcie.1,membar=268435456

Apply this SeaBIOS hack:

diff --git a/src/fw/pciinit.c b/src/fw/pciinit.c
index b3e359d7..769007a4 100644
--- a/src/fw/pciinit.c
+++ b/src/fw/pciinit.c
@@ -25,7 +25,7 @@
 #include "util.h" // pci_setup
 #include "x86.h" // outb

-#define PCI_DEVICE_MEM_MIN    (1<<12)  // 4k == page size
+#define PCI_DEVICE_MEM_MIN    (0)
 #define PCI_BRIDGE_MEM_MIN    (1<<21)  // 2M == hugepage size
 #define PCI_BRIDGE_IO_MIN      0x1000  // mandated by pci bridge spec
@@ -1089,6 +1089,7 @@ pci_region_map_one_entry(struct pci_region_entry *entry, u64 addr)
         pci_config_writew(bdf, PCI_MEMORY_LIMIT, limit >> PCI_MEMORY_SHIFT);
     }
     if (entry->type == PCI_REGION_TYPE_PREFMEM) {
+        limit = addr + PCI_BRIDGE_MEM_MIN - 1;
         pci_config_writew(bdf, PCI_PREF_MEMORY_BASE, addr >> PCI_PREF_MEMORY_SHIFT);
         pci_config_writew(bdf, PCI_PREF_MEMORY_LIMIT, limit >> PCI_PREF_MEMORY_SHIFT);
         pci_config_writel(bdf, PCI_PREF_BASE_UPPER32, addr >> 32);





[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux