On 28/05/2019 17:39, Shawn Anastasio wrote: > > > On 5/28/19 1:27 AM, Alexey Kardashevskiy wrote: >> >> >> On 28/05/2019 15:36, Oliver wrote: >>> On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio <shawn@xxxxxxxxxx> >>> wrote: >>>> >>>> Introduce a new pcibios function pcibios_ignore_alignment_request >>>> which allows the PCI core to defer to platform-specific code to >>>> determine whether or not to ignore alignment requests for PCI >>>> resources. >>>> >>>> The existing behavior is to simply ignore alignment requests when >>>> PCI_PROBE_ONLY is set. This is behavior is maintained by the >>>> default implementation of pcibios_ignore_alignment_request. >>>> >>>> Signed-off-by: Shawn Anastasio <shawn@xxxxxxxxxx> >>>> --- >>>> drivers/pci/pci.c | 9 +++++++-- >>>> include/linux/pci.h | 1 + >>>> 2 files changed, 8 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c >>>> index 8abc843b1615..8207a09085d1 100644 >>>> --- a/drivers/pci/pci.c >>>> +++ b/drivers/pci/pci.c >>>> @@ -5882,6 +5882,11 @@ resource_size_t __weak >>>> pcibios_default_alignment(void) >>>> return 0; >>>> } >>>> >>>> +int __weak pcibios_ignore_alignment_request(void) >>>> +{ >>>> + return pci_has_flag(PCI_PROBE_ONLY); >>>> +} >>>> + >>>> #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE >>>> static char >>>> resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0}; >>>> static DEFINE_SPINLOCK(resource_alignment_lock); >>>> @@ -5906,9 +5911,9 @@ static resource_size_t >>>> pci_specified_resource_alignment(struct pci_dev *dev, >>>> p = resource_alignment_param; >>>> if (!*p && !align) >>>> goto out; >>>> - if (pci_has_flag(PCI_PROBE_ONLY)) { >>>> + if (pcibios_ignore_alignment_request()) { >>>> align = 0; >>>> - pr_info_once("PCI: Ignoring requested alignments >>>> (PCI_PROBE_ONLY)\n"); >>>> + pr_info_once("PCI: Ignoring requested alignments\n"); >>>> goto out; >>>> } >>> >>> I think the logic here is questionable to begin with. If the user has >>> explicitly requested re-aligning a resource via the command line then >>> we should probably do it even if PCI_PROBE_ONLY is set. When it breaks >>> they get to keep the pieces. >>> >>> That said, the real issue here is that PCI_PROBE_ONLY probably >>> shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM) >>> hotplugged devices are configured by firmware before it's passed to >>> the guest and we need to keep the FW assignments otherwise things >>> break. QEMU however doesn't do any BAR assignments and relies on that >>> being handled by the guest. At boot time this is done by SLOF, but >>> Linux only keeps SLOF around until it's extracted the device-tree. >>> Once that's done SLOF gets blown away and the kernel needs to do it's >>> own BAR assignments. I'm guessing there's a hack in there to make it >>> work today, but it's a little surprising that it works at all... >> >> >> The hack is to run a modified qemu-aware "/usr/sbin/rtas_errd" in the >> guest which receives an event from qemu (RAS_EPOW from >> /proc/interrupts), fetches device tree chunks (and as I understand it - >> they come with BARs from phyp but without from qemu) and writes "1" to >> "/sys/bus/pci/rescan" which calls pci_assign_resource() eventually: > > Interesting. Does this mean that the PHYP hotplug path doesn't > call pci_assign_resource? I'd expect dlpar_add_slot() to be called under phyp and eventually pci_device_add() which (I think) may or may not trigger later reassignment. > If so it means the patch may not > break that platform after all, though it still may not be > the correct way of doing things. We should probably stop enforcing the PCI_PROBE_ONLY flag - it seems that (unless resource_alignment= is used) the pseries guest should just walk through all allocated resources and leave them unchanged. >> [c000000006e6f960] [c0000000005f62d4] pci_assign_resource+0x44/0x360 >> >> [c000000006e6fa10] [c0000000005f8b54] >> assign_requested_resources_sorted+0x84/0x110 >> [c000000006e6fa60] [c0000000005f9540] >> __assign_resources_sorted+0xd0/0x750 >> [c000000006e6fb40] [c0000000005fb2e0] >> __pci_bus_assign_resources+0x80/0x280 >> [c000000006e6fc00] [c0000000005fb95c] >> pci_assign_unassigned_bus_resources+0xbc/0x100 >> [c000000006e6fc60] [c0000000005e3d74] pci_rescan_bus+0x34/0x60 >> >> [c000000006e6fc90] [c0000000005f1ef4] rescan_store+0x84/0xc0 >> >> [c000000006e6fcd0] [c00000000068060c] bus_attr_store+0x3c/0x60 >> >> [c000000006e6fcf0] [c00000000037853c] sysfs_kf_write+0x5c/0x80 >> >> >> >> >> >>> >>> IIRC Sam Bobroff was looking at hotplug under pseries recently so he >>> might have something to add. He's sick at the moment, but I'll ask him >>> to take a look at this once he's back among the living >>> >>>> diff --git a/include/linux/pci.h b/include/linux/pci.h >>>> index 4a5a84d7bdd4..47471dcdbaf9 100644 >>>> --- a/include/linux/pci.h >>>> +++ b/include/linux/pci.h >>>> @@ -1990,6 +1990,7 @@ static inline void >>>> pcibios_penalize_isa_irq(int irq, int active) {} >>>> int pcibios_alloc_irq(struct pci_dev *dev); >>>> void pcibios_free_irq(struct pci_dev *dev); >>>> resource_size_t pcibios_default_alignment(void); >>>> +int pcibios_ignore_alignment_request(void); >>>> >>>> #ifdef CONFIG_HIBERNATE_CALLBACKS >>>> extern struct dev_pm_ops pcibios_pm_ops; >>>> -- >>>> 2.20.1 >>>> >> -- Alexey