On Mon, Feb 27, 2012 at 10:36 PM, Bjorn Helgaas <bhelgaas@xxxxxxxxxx> wrote: > On Mon, Feb 27, 2012 at 7:09 PM, Yinghai Lu <yinghai@xxxxxxxxxx> wrote: >> Signed-off-by: Yinghai Lu <yinghai@xxxxxxxxxx> >> Cc: Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx> >> Cc: Paul Mackerras <paulus@xxxxxxxxx> >> Cc: linuxppc-dev@xxxxxxxxxxxxxxxx >> --- >> arch/powerpc/kernel/pci-common.c | 7 ++++++- >> 1 files changed, 6 insertions(+), 1 deletions(-) >> >> diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c >> index 910b9de..ae5ae5f 100644 >> --- a/arch/powerpc/kernel/pci-common.c >> +++ b/arch/powerpc/kernel/pci-common.c >> @@ -1660,6 +1660,8 @@ void __devinit pcibios_scan_phb(struct pci_controller *hose) >> bus->secondary = hose->first_busno; >> hose->bus = bus; >> >> + pci_bus_insert_busn_res(bus, hose->first_busno, hose->last_busno); >> + >> /* Get probe mode and perform scan */ >> mode = PCI_PROBE_NORMAL; >> if (node && ppc_md.pci_probe_mode) >> @@ -1670,8 +1672,11 @@ void __devinit pcibios_scan_phb(struct pci_controller *hose) >> of_scan_bus(node, bus); >> } >> >> - if (mode == PCI_PROBE_NORMAL) >> + if (mode == PCI_PROBE_NORMAL) { >> + pci_bus_update_busn_res_end(bus, 255); >> hose->last_busno = bus->subordinate = pci_scan_child_bus(bus); >> + pci_bus_update_busn_res_end(bus, bus->subordinate); >> + } > > There's a lot of powerpc code that does this: > > bus_range = of_get_property(pcictrl, "bus-range", &len); > hose->first_busno = bus_range[0]; > hose->last_busno = bus_range[1]; > > That *looks* like it is discovering the bus number aperture. Is it? > If it is, why are we using the largest bus number found by > pci_scan_child_bus() rather than "last_busno"? Sorry, I missed the earlier hunk of the patch where you *do* use last_busno: >> + pci_bus_insert_busn_res(bus, hose->first_busno, hose->last_busno); I still think this part is wrong: + if (mode == PCI_PROBE_NORMAL) { + pci_bus_update_busn_res_end(bus, 255); hose->last_busno = bus->subordinate = pci_scan_child_bus(bus); + pci_bus_update_busn_res_end(bus, bus->subordinate); I think there are two problems: 1) We can enumerate devices under the wrong PHB. Assume this: PCI host bridge A to [bus 00] pci 0000:00:01.0: PCI bridge PCI host bridge B to [bus 01] pci 0000:01:01.0: PCI endpoint The P2P bridge at 00:01.0 has no devices below it, but of course we can't tell that until we look behind it. To do that, we'll have to assign a bus number, and since we forced the bus number aperture to [bus 00-ff] instead of the correct [bus 00], we'll probably allocate bus number 01 as the secondary bus. Then we'll generate a config cycle for 01:01.0, which discovers a device. But we can't tell that the cycle was actually claimed by host bridge B, not A. So now we wrongly think that 01:01.0 is under A, so we can't handle its resources correctly. I think we should have failed when allocating a secondary bus number for 00:01.0 and just skipped looking behind it. 2) We preclude hot-add in some cases. For example, if we scan this topology: PCI host bridge C to [bus 00-7f] pci 0000:00:01.0: PCI bridge to [bus 01] pci 0000:01:01.0: PCI endpoint we set the root bus's subordinate bus number to 01 (the highest bus number we discovered), so we now think host bridge C leads only to [bus 00-01]. Now let's remove 01:01.0 and plug in a card with a bridge on it, e.g., pci 0000:01:01.0: PCI bridge to ... pci 0000:xx:01.0: PCI endpoint We can't allocate a bus number for 01:01.0's secondary bus because we think we're out of space. But we're really not; the true bus number aperture for C is [bus 00-7f], not [bus 00-01]. We may need mechanism to say "don't trust this info from the firmware," but we should be able to figure out a way that doesn't penalize platforms that do everything correctly. The current patch breaks these scenarios even when the platform firmware is 100% correct. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-arch" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html