On 31/05/2019 08:49, Shawn Anastasio wrote: > On 5/29/19 10:39 PM, Alexey Kardashevskiy wrote: >> >> >> On 28/05/2019 17:39, Shawn Anastasio wrote: >>> >>> >>> On 5/28/19 1:27 AM, Alexey Kardashevskiy wrote: >>>> >>>> >>>> On 28/05/2019 15:36, Oliver wrote: >>>>> On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio <shawn@xxxxxxxxxx> >>>>> wrote: >>>>>> >>>>>> Introduce a new pcibios function pcibios_ignore_alignment_request >>>>>> which allows the PCI core to defer to platform-specific code to >>>>>> determine whether or not to ignore alignment requests for PCI >>>>>> resources. >>>>>> >>>>>> The existing behavior is to simply ignore alignment requests when >>>>>> PCI_PROBE_ONLY is set. This is behavior is maintained by the >>>>>> default implementation of pcibios_ignore_alignment_request. >>>>>> >>>>>> Signed-off-by: Shawn Anastasio <shawn@xxxxxxxxxx> >>>>>> --- >>>>>> drivers/pci/pci.c | 9 +++++++-- >>>>>> include/linux/pci.h | 1 + >>>>>> 2 files changed, 8 insertions(+), 2 deletions(-) >>>>>> >>>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c >>>>>> index 8abc843b1615..8207a09085d1 100644 >>>>>> --- a/drivers/pci/pci.c >>>>>> +++ b/drivers/pci/pci.c >>>>>> @@ -5882,6 +5882,11 @@ resource_size_t __weak >>>>>> pcibios_default_alignment(void) >>>>>> return 0; >>>>>> } >>>>>> >>>>>> +int __weak pcibios_ignore_alignment_request(void) >>>>>> +{ >>>>>> + return pci_has_flag(PCI_PROBE_ONLY); >>>>>> +} >>>>>> + >>>>>> #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE >>>>>> static char >>>>>> resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0}; >>>>>> static DEFINE_SPINLOCK(resource_alignment_lock); >>>>>> @@ -5906,9 +5911,9 @@ static resource_size_t >>>>>> pci_specified_resource_alignment(struct pci_dev *dev, >>>>>> p = resource_alignment_param; >>>>>> if (!*p && !align) >>>>>> goto out; >>>>>> - if (pci_has_flag(PCI_PROBE_ONLY)) { >>>>>> + if (pcibios_ignore_alignment_request()) { >>>>>> align = 0; >>>>>> - pr_info_once("PCI: Ignoring requested alignments >>>>>> (PCI_PROBE_ONLY)\n"); >>>>>> + pr_info_once("PCI: Ignoring requested alignments\n"); >>>>>> goto out; >>>>>> } >>>>> >>>>> I think the logic here is questionable to begin with. If the user has >>>>> explicitly requested re-aligning a resource via the command line then >>>>> we should probably do it even if PCI_PROBE_ONLY is set. When it breaks >>>>> they get to keep the pieces. >>>>> >>>>> That said, the real issue here is that PCI_PROBE_ONLY probably >>>>> shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM) >>>>> hotplugged devices are configured by firmware before it's passed to >>>>> the guest and we need to keep the FW assignments otherwise things >>>>> break. QEMU however doesn't do any BAR assignments and relies on that >>>>> being handled by the guest. At boot time this is done by SLOF, but >>>>> Linux only keeps SLOF around until it's extracted the device-tree. >>>>> Once that's done SLOF gets blown away and the kernel needs to do it's >>>>> own BAR assignments. I'm guessing there's a hack in there to make it >>>>> work today, but it's a little surprising that it works at all... >>>> >>>> >>>> The hack is to run a modified qemu-aware "/usr/sbin/rtas_errd" in the >>>> guest which receives an event from qemu (RAS_EPOW from >>>> /proc/interrupts), fetches device tree chunks (and as I understand it - >>>> they come with BARs from phyp but without from qemu) and writes "1" to >>>> "/sys/bus/pci/rescan" which calls pci_assign_resource() eventually: >>> >>> Interesting. Does this mean that the PHYP hotplug path doesn't >>> call pci_assign_resource? >> >> >> I'd expect dlpar_add_slot() to be called under phyp and eventually >> pci_device_add() which (I think) may or may not trigger later >> reassignment. >> >> >>> If so it means the patch may not >>> break that platform after all, though it still may not be >>> the correct way of doing things. >> >> >> We should probably stop enforcing the PCI_PROBE_ONLY flag - it seems >> that (unless resource_alignment= is used) the pseries guest should just >> walk through all allocated resources and leave them unchanged. > > If we add a pcibios_default_alignment() implementation like was > suggested earlier, then it will behave as if the user has > specified resource_alignment= by default and SLOF's assignments > won't be honored (I think). I removed pci_add_flags(PCI_PROBE_ONLY) from pSeries_setup_arch and tried booting with and without pci=resource_alignment= and I can see no difference - BARs are still aligned to 64K as programmed in SLOF; if I hack SLOF to align to 4K or 32K - BARs get packed and the guest leaves them unchanged. > I guess it boils down to one question - is it important that we > observe SLOF's initial BAR assignments? It isn't if it's SLOF but it is if it's phyp. It used to not allow/support BAR reassignment and even if it does not, I'd rather avoid touching them. > If not, the device tree > modification that Sam found would work fine here. Otherwise, > we need a way to honor the initial assignments from SLOF while > still allowing custom alignments for hotplugged devices, either > by deferring to the platform code like I do here, unsetting > PCI_PROBE_ONLY in certain cases or by using IORESOURCE_PCI_FIXED > like Bjorn suggested. > >> >> >>>> [c000000006e6f960] [c0000000005f62d4] pci_assign_resource+0x44/0x360 >>>> >>>> [c000000006e6fa10] [c0000000005f8b54] >>>> assign_requested_resources_sorted+0x84/0x110 >>>> [c000000006e6fa60] [c0000000005f9540] >>>> __assign_resources_sorted+0xd0/0x750 >>>> [c000000006e6fb40] [c0000000005fb2e0] >>>> __pci_bus_assign_resources+0x80/0x280 >>>> [c000000006e6fc00] [c0000000005fb95c] >>>> pci_assign_unassigned_bus_resources+0xbc/0x100 >>>> [c000000006e6fc60] [c0000000005e3d74] pci_rescan_bus+0x34/0x60 >>>> >>>> [c000000006e6fc90] [c0000000005f1ef4] rescan_store+0x84/0xc0 >>>> >>>> [c000000006e6fcd0] [c00000000068060c] bus_attr_store+0x3c/0x60 >>>> >>>> [c000000006e6fcf0] [c00000000037853c] sysfs_kf_write+0x5c/0x80 >>>> >>>> >>>> >>>> >>>> >>>>> >>>>> IIRC Sam Bobroff was looking at hotplug under pseries recently so he >>>>> might have something to add. He's sick at the moment, but I'll ask him >>>>> to take a look at this once he's back among the living >>>>> >>>>>> diff --git a/include/linux/pci.h b/include/linux/pci.h >>>>>> index 4a5a84d7bdd4..47471dcdbaf9 100644 >>>>>> --- a/include/linux/pci.h >>>>>> +++ b/include/linux/pci.h >>>>>> @@ -1990,6 +1990,7 @@ static inline void >>>>>> pcibios_penalize_isa_irq(int irq, int active) {} >>>>>> int pcibios_alloc_irq(struct pci_dev *dev); >>>>>> void pcibios_free_irq(struct pci_dev *dev); >>>>>> resource_size_t pcibios_default_alignment(void); >>>>>> +int pcibios_ignore_alignment_request(void); >>>>>> >>>>>> #ifdef CONFIG_HIBERNATE_CALLBACKS >>>>>> extern struct dev_pm_ops pcibios_pm_ops; >>>>>> -- >>>>>> 2.20.1 >>>>>> >>>> >> -- Alexey