On 28/05/2019 15:36, Oliver wrote: > On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio <shawn@xxxxxxxxxx> wrote: >> >> Introduce a new pcibios function pcibios_ignore_alignment_request >> which allows the PCI core to defer to platform-specific code to >> determine whether or not to ignore alignment requests for PCI resources. >> >> The existing behavior is to simply ignore alignment requests when >> PCI_PROBE_ONLY is set. This is behavior is maintained by the >> default implementation of pcibios_ignore_alignment_request. >> >> Signed-off-by: Shawn Anastasio <shawn@xxxxxxxxxx> >> --- >> drivers/pci/pci.c | 9 +++++++-- >> include/linux/pci.h | 1 + >> 2 files changed, 8 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c >> index 8abc843b1615..8207a09085d1 100644 >> --- a/drivers/pci/pci.c >> +++ b/drivers/pci/pci.c >> @@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void) >> return 0; >> } >> >> +int __weak pcibios_ignore_alignment_request(void) >> +{ >> + return pci_has_flag(PCI_PROBE_ONLY); >> +} >> + >> #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE >> static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0}; >> static DEFINE_SPINLOCK(resource_alignment_lock); >> @@ -5906,9 +5911,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev, >> p = resource_alignment_param; >> if (!*p && !align) >> goto out; >> - if (pci_has_flag(PCI_PROBE_ONLY)) { >> + if (pcibios_ignore_alignment_request()) { >> align = 0; >> - pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n"); >> + pr_info_once("PCI: Ignoring requested alignments\n"); >> goto out; >> } > > I think the logic here is questionable to begin with. If the user has > explicitly requested re-aligning a resource via the command line then > we should probably do it even if PCI_PROBE_ONLY is set. When it breaks > they get to keep the pieces. > > That said, the real issue here is that PCI_PROBE_ONLY probably > shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM) > hotplugged devices are configured by firmware before it's passed to > the guest and we need to keep the FW assignments otherwise things > break. QEMU however doesn't do any BAR assignments and relies on that > being handled by the guest. At boot time this is done by SLOF, but > Linux only keeps SLOF around until it's extracted the device-tree. > Once that's done SLOF gets blown away and the kernel needs to do it's > own BAR assignments. I'm guessing there's a hack in there to make it > work today, but it's a little surprising that it works at all... The hack is to run a modified qemu-aware "/usr/sbin/rtas_errd" in the guest which receives an event from qemu (RAS_EPOW from /proc/interrupts), fetches device tree chunks (and as I understand it - they come with BARs from phyp but without from qemu) and writes "1" to "/sys/bus/pci/rescan" which calls pci_assign_resource() eventually: [c000000006e6f960] [c0000000005f62d4] pci_assign_resource+0x44/0x360 [c000000006e6fa10] [c0000000005f8b54] assign_requested_resources_sorted+0x84/0x110 [c000000006e6fa60] [c0000000005f9540] __assign_resources_sorted+0xd0/0x750 [c000000006e6fb40] [c0000000005fb2e0] __pci_bus_assign_resources+0x80/0x280 [c000000006e6fc00] [c0000000005fb95c] pci_assign_unassigned_bus_resources+0xbc/0x100 [c000000006e6fc60] [c0000000005e3d74] pci_rescan_bus+0x34/0x60 [c000000006e6fc90] [c0000000005f1ef4] rescan_store+0x84/0xc0 [c000000006e6fcd0] [c00000000068060c] bus_attr_store+0x3c/0x60 [c000000006e6fcf0] [c00000000037853c] sysfs_kf_write+0x5c/0x80 > > IIRC Sam Bobroff was looking at hotplug under pseries recently so he > might have something to add. He's sick at the moment, but I'll ask him > to take a look at this once he's back among the living > >> diff --git a/include/linux/pci.h b/include/linux/pci.h >> index 4a5a84d7bdd4..47471dcdbaf9 100644 >> --- a/include/linux/pci.h >> +++ b/include/linux/pci.h >> @@ -1990,6 +1990,7 @@ static inline void pcibios_penalize_isa_irq(int irq, int active) {} >> int pcibios_alloc_irq(struct pci_dev *dev); >> void pcibios_free_irq(struct pci_dev *dev); >> resource_size_t pcibios_default_alignment(void); >> +int pcibios_ignore_alignment_request(void); >> >> #ifdef CONFIG_HIBERNATE_CALLBACKS >> extern struct dev_pm_ops pcibios_pm_ops; >> -- >> 2.20.1 >> -- Alexey