Jesse Barnes <jbarnes@xxxxxxxxxxxxxxxx> writes:

> Any update here, Eric? Sounds like you're using hotplug in real environments
> with complex topologies (based on your earlier messages), so we're interested
> in what you're seeing here...

Yes. Currently I have a test system that is a subset of what I'm worried about, and I will shortly have the real hardware, so my immediate goal is to get things working well enough that my internal users won't get blocked by bugs.

Currently I only have the pcie hotplug and pcie hotplug surprise cases. My basic topology is 16 hotplug slots into which I will be plugging pci express switches with a couple of additional hotplug slots. As for the firmware, I will have it reserving bus numbers and mmio space on each of the first 16 slots, and the rest is going to be up to the linux kernel. This is an embedded design, so there is no ACPI; it appears to be more pain than it is worth to implement.

I am also looking at the case of pcie switches with two upstream ports, and switching which cpu they are connected to at runtime. So in some cases I will have devices whose presence is detected but which will not get link for hours or days, as opposed to the 20ms time limit in the pci express specification. Call it a necessary extension.

I need to revisit the pciehp driver, but on my first pass through it, it looked like it got something wrong in every corner case. So I have written myself a little 430-line replacement that handles the case that I currently care about.

Part of what I was seeing before is that we don't clear pending events in the pciehp driver before we enable interrupts. So if booting the system has left some events pending and you have CONFIG_DEBUG_SHIRQ enabled, you get a nice oops, because p_slot has not been initialized and so the interrupts can't be handled.

My little driver is at least good enough to let me start looking at other things.

I found just yesterday that if you mmap a resource in sysfs, hot-remove doesn't complete.
Sysfs issues seem to be the bane of my existence, and I am currently working on a patch for that.

Currently the pcie port driver calls the remove methods of child drivers twice if it is removed (say, because you have hot-unplugged a bridge). That is one of the truly nasty bugs I saw when I was trying to bring up my test system, as things start accessing freed memory and all kinds of silly things happen.

After I get the worst of the problems handled, I intend to do a thorough review and fix everything that I can see.

Eric
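
The pending-event ordering problem described above (stale events left over from boot, interrupts enabled before the slot is set up) can be illustrated with a small userspace mock. This is only a sketch under stated assumptions, not the pciehp code: every name here (mock_ctrl, mock_slot, mock_isr, mock_probe and so on) is hypothetical, the status register is assumed to be write-1-to-clear, and the immediate handler call in mock_enable_irq stands in for the extra handler invocation that CONFIG_DEBUG_SHIRQ triggers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mock types; these are NOT the real pciehp structures. */
struct mock_slot {
	int handled_events;
};

struct mock_ctrl {
	uint16_t status;        /* pending event bits, assumed write-1-to-clear */
	bool irq_enabled;
	struct mock_slot *slot; /* NULL until slot setup has finished */
	int unhandled;          /* events that arrived with no slot to handle them */
};

/* Write-1-to-clear: every bit set in 'bits' is cleared in the status register. */
static void mock_clear_events(struct mock_ctrl *c, uint16_t bits)
{
	assert(c != NULL);
	c->status &= (uint16_t)~bits;
}

/*
 * Mock interrupt handler. Where the real driver would dereference an
 * uninitialized slot pointer and oops, the mock just counts the event.
 */
static void mock_isr(struct mock_ctrl *c)
{
	if (!c->irq_enabled || c->status == 0)
		return;
	if (c->slot == NULL)
		c->unhandled++;
	else
		c->slot->handled_events++;
	mock_clear_events(c, c->status);
}

/* Enabling interrupts runs the handler once, modelling the debug invocation. */
static void mock_enable_irq(struct mock_ctrl *c)
{
	c->irq_enabled = true;
	mock_isr(c);
}

/*
 * Mock probe path: interrupts are enabled before the slot is assigned.
 * Returns how many events hit the handler while the slot was still NULL.
 */
static int mock_probe(uint16_t stale_events, bool clear_first)
{
	struct mock_slot slot = { 0 };
	struct mock_ctrl c = { .status = stale_events };

	if (clear_first)
		mock_clear_events(&c, c.status); /* drop events left over from boot */
	mock_enable_irq(&c);
	c.slot = &slot;                          /* slot setup finishes only now */
	return c.unhandled;
}
```

Calling mock_probe(0x0008, false) models a boot that left one event pending and reports one event that reached the handler with no slot; mock_probe(0x0008, true) clears the stale bits first and reports none. The other obvious ordering, finishing slot setup before enabling interrupts, would avoid the problem as well.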