On 8/8/24 17:54, Bjorn Helgaas wrote: > On Thu, Aug 08, 2024 at 04:28:50PM -0400, Stewart Hildebrand wrote: >> On 8/8/24 15:28, Bjorn Helgaas wrote: >>> On Wed, Aug 07, 2024 at 11:17:12AM -0400, Stewart Hildebrand wrote: >>>> Devices with alignment specified will lose their alignment in cases when >>>> the bridge resources have been released, e.g. due to insufficient bridge >>>> window size. Restore the alignment. >>> >>> I guess this fixes a problem when the user has specified >>> "pci=resource_alignment=..." and we've decided to release and >>> reallocate a bridge window? Just looking for a bit more concrete >>> description of what this problem would look like to a user. >> >> Yes. When alignment has been specified via pcibios_default_alignment() >> or by the user with "pci=resource_alignment=...", and the bridge window >> is being reallocated, the specified alignment is lost and the resource >> may not be sufficiently aligned after reallocation. >> >> I can expand the commit description. > > I think a hint about where the alignment gets lost would be helpful, > too. > > This seems like a problem users could be seeing today, even > independent of the device passthrough issue that I think is the main > thrust of this series. If there's a problem report or an easy way to > reproduce this problem, that would be nice, too. Oh, sorry, I just realized that it's only alignments with IORESOURCE_STARTALIGN that get lost during bridge window realloc. Specifically, r->start gets overwritten in __release_child_resources(). There are a few code paths, such as the one in __release_child_resources(), that assume IORESOURCE_SIZEALIGN. In case of IORESOURCE_SIZEALIGN, the alignment is restored by a simple calculation. However, with IORESOURCE_STARTALIGN, we can't use a simple calculation, instead we need to consult pci_reassigndev_resource_alignment() to restore the alignment. I'll update the commit message. An alternative approach might be to store the alignment in a new member in struct resource, thus saving us from having to call pci_reassigndev_resource_alignment() for each PCI device on the bridge undergoing reallocation. BTW, this patch and the two "[powerpc,x86]/pci: Preserve IORESOURCE_STARTALIGN alignment" patches could potentially be folded into a single patch as they're all dealing with fixing IORESOURCE_STARTALIGN. To repro the issue on x86, we will need to apply this series except the current patch [3/8]. I ran into the issue with the following device. Let's call it example 1. Scrubbed/partial output from lspci -vv: Region 0: Memory at d1924600 (32-bit, non-prefetchable) [size=256] Region 1: Memory at d1924400 (32-bit, non-prefetchable) [size=512] Region 2: Memory at d1924000 (32-bit, non-prefetchable) [size=1K] Region 3: Memory at d1920000 (32-bit, non-prefetchable) [size=16K] Region 4: Memory at d1900000 (32-bit, non-prefetchable) [size=128K] Region 5: Memory at d1800000 (32-bit, non-prefetchable) [size=1M] Capabilities: [b0] MSI-X: Enable- Count=2 Masked- Vector table: BAR=0 offset=00000080 PBA: BAR=0 offset=00000090 Capabilities: [200 v1] Single Root I/O Virtualization (SR-IOV) IOVCap: Migration-, Interrupt Message Number: 000 IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+ IOVSta: Migration- Initial VFs: 4, Total VFs: 4, Number of VFs: 0, Function Dependency Link: 00 VF offset: 6, stride: 1, Device ID: 0100 Supported Page Size: 00000553, System Page Size: 00000001 Region 0: Memory at 00000000d0800000 (64-bit, non-prefetchable) VF Migration: offset: 00000000, BIR: 0 Kernel driver in use: pci-endpoint-test Kernel modules: pci_endpoint_test BARs 0, 1, and 2 are small, and the host firmware placed them on the same page. The host firmware did not page align the BARs. There is also an SR-IOV BAR that the firmware couldn't fit in the bridge window. In example 1, I did not specify pci=realloc=on. I used a kernel with CONFIG_PCI_REALLOC_ENABLE_AUTO=y, and the SR-IOV BAR triggered the bridge window realloc. Alignment was requested for BARs 0, 1, and 2 of this device, using IORESOURCE_STARTALIGN, because of patch [8/8]. The alignment was lost during realloc. Such a device from example 1 may be hard to come by, so here's a way to reproduce with qemu. Example 2: 01:00.0 Unclassified device [00ff]: Red Hat, Inc. QEMU PCI Test Device Subsystem: Red Hat, Inc. QEMU Virtual Machine Flags: fast devsel Memory at fe800000 (32-bit, non-prefetchable) [size=4K] I/O ports at c000 [size=256] Memory at 7050000000 (64-bit, prefetchable) [size=32] 01:01.0 Unclassified device [00ff]: Red Hat, Inc. QEMU PCI Test Device Subsystem: Red Hat, Inc. QEMU Virtual Machine Flags: fast devsel Memory at fe801000 (32-bit, non-prefetchable) [size=4K] I/O ports at c100 [size=256] Memory at 7050000020 (64-bit, prefetchable) [size=32] 01:02.0 Unclassified device [00ff]: Red Hat, Inc. QEMU PCI Test Device Subsystem: Red Hat, Inc. QEMU Virtual Machine Flags: fast devsel Memory at fe802000 (32-bit, non-prefetchable) [size=4K] I/O ports at c200 [size=256] Memory at 7050000040 (64-bit, prefetchable) [size=32] 01:03.0 Unclassified device [00ff]: Red Hat, Inc. QEMU PCI Test Device Subsystem: Red Hat, Inc. QEMU Virtual Machine Flags: fast devsel Memory at fe803000 (32-bit, non-prefetchable) [size=4K] I/O ports at c300 [size=256] Memory at 7040000000 (64-bit, prefetchable) [size=256M] This example can reproduced with Qemu's pci-testdev and a SeaBIOS hack. In this case we will need to specify pci=realloc=on to trigger the realloc because there's no SR-IOV BAR to trigger it automatically. Add this to your usual qemu-system-x86_64 args: -append "console=ttyS0 ignore_loglevel pci=realloc=on" \ -device pcie-pci-bridge,id=pcie.1 \ -device pci-testdev,bus=pcie.1,membar=32 \ -device pci-testdev,bus=pcie.1,membar=32 \ -device pci-testdev,bus=pcie.1,membar=32 \ -device pci-testdev,bus=pcie.1,membar=268435456 Apply this SeaBIOS hack: diff --git a/src/fw/pciinit.c b/src/fw/pciinit.c index b3e359d7..769007a4 100644 --- a/src/fw/pciinit.c +++ b/src/fw/pciinit.c @@ -25,7 +25,7 @@ #include "util.h" // pci_setup #include "x86.h" // outb -#define PCI_DEVICE_MEM_MIN (1<<12) // 4k == page size +#define PCI_DEVICE_MEM_MIN (0) #define PCI_BRIDGE_MEM_MIN (1<<21) // 2M == hugepage size #define PCI_BRIDGE_IO_MIN 0x1000 // mandated by pci bridge spec @@ -1089,6 +1089,7 @@ pci_region_map_one_entry(struct pci_region_entry *entry, u64 addr) pci_config_writew(bdf, PCI_MEMORY_LIMIT, limit >> PCI_MEMORY_SHIFT); } if (entry->type == PCI_REGION_TYPE_PREFMEM) { + limit = addr + PCI_BRIDGE_MEM_MIN - 1; pci_config_writew(bdf, PCI_PREF_MEMORY_BASE, addr >> PCI_PREF_MEMORY_SHIFT); pci_config_writew(bdf, PCI_PREF_MEMORY_LIMIT, limit >> PCI_PREF_MEMORY_SHIFT); pci_config_writel(bdf, PCI_PREF_BASE_UPPER32, addr >> 32);