On 10/8/20 4:23 AM, Oliver O'Halloran wrote: > On Fri, Sep 25, 2020 at 7:23 PM Cédric Le Goater <clg@xxxxxxxx> wrote: >> >> To fix an issue with PHB hotplug on pSeries machine (HPT/XIVE), commit >> 3a3181e16fbd introduced a PPC specific pcibios_remove_bus() routine to >> clear all interrupt mappings when the bus is removed. This routine >> frees an array allocated in pcibios_scan_phb(). >> >> This broke PHB hotplug on PowerNV because, when a PHB is removed and >> re-scanned through sysfs, the PCI layer un-assigns and re-assigns >> resources to the PHB but does not destroy and recreate the PCI >> controller structure. Since pcibios_remove_bus() does not clear the >> 'irq_map' array pointer, a second removal of the PHB will try to free >> the array a second time and corrupt memory. > > "PHB hotplug" and "hot-plugging devices under a PHB" are different > things. What you're saying here doesn't make a whole lot of sense to > me unless you're conflating the two. The distinction is important > since on pseries we can use DLPAR to add and remove actual PHBs (i.e. > the pci_controller) at runtime, but there's no corresponding mechanism > on PowerNV. And it's even different on QEMU. >> Free the 'irq_map' array in pcibios_free_controller() to fix >> corruption and clear interrupt mapping after it has been >> disposed. This to avoid filling up the array with successive >> remove/rescan of a bus. > > Even with this patch I think we're still broken. With this patch > applied the init path is something like: > > per-phb init: > allocate phb->irq_map array > per-bus init: > nothing > per-device init: > pcibios_bus_add_device() > pci_read_irq_line() > pci_irq_map_register(pci_dev, virq) > *record the device's interrupt in phb->irq_map* > > And the teardown path: > > per-device teardown: > nothing > per-bus teardown: > pcibios_remove_bus() > pci_irq_map_dispose() > *walk phb->irq_map and dispose of each mapped interrupt* > per-phb teardown: > free(phb->irq_map) > > There's a lot of asymmetry here, which is a problem in itself, but the > real issue is that when removing *any* pci_bus under a PHB we dispose > of the LSI\ for *every* device under that PHB. Not good. > > Ideally we should be fixing this by having the per-device teardown > handle disposing the mapping. Unfortunately, there's no pcibios hook > that's called when removing a pci_dev. However, we can register a bus > notifier which will be called when the pci_dev is removed from its bus > and we already do that for the per-device EEH teardown and to handle > IOMMU TCE invalidation when the device is removed. I lack the knowledge here and I think some else should take over, as I am not doing a good job. Michael, can you drop the initial patch again :/ It is better not to merge anything. Thanks, C.