On Mon, May 06, 2024 at 09:36:44PM +0200, Lukas Wunner wrote: > On Mon, May 06, 2024 at 10:37:01AM +0200, Nam Cao wrote: > > I just discovered a new crash scenario with shpchp: > > > > First, hot-add a bridge: > > (qemu) device_add pci-bridge,id=br2,bus=br1,chassis_nr=1,addr=1 > > > > But with the current patch, the hot-plug is still successful, just that the > > bridge is not properly configured: > > $ lspci -t > > -[0000:00]-+-00.0 > > +-01.0 > > +-02.0 > > +-03.0-[01-02]----00.0-[02]----01.0-- <-- new hot-added bridge > > +-1f.0 > > +-1f.2 > > \-1f.3 > > > > But! Now I leave the hot-added bridge as is, and hot-add an ethernet card: > > (qemu) device_add e1000,bus=br1,id=eth0,addr=2 > > > > this command crashes the kernel (full crash log below). > > > > The problem is because during hot-plugging of this ethernet card, > > pci_bus_add_devices() is invoked and the previously hot-plugged bridge hasn't > > been marked as "added" yet, so the driver attempts to add this bridge > > again, leading to the crash. > > > > Now for pciehp, this scenario cannot happen, because there is only a single > > slot, so we cannot hot-plug another device. But still, this makes me think > > perhaps we shouldn't leave the hot-plugged bridge in a improperly-configured > > state as this patch is doing. I cannot think of a scenario that would crash pciehp > > similarly to shpchp. But that's just me, maybe there is a rare scenario > > that can crash pciehp, but I just haven't seen yet. > > > > My point is: better safe than sorry. I propose going back to the original > > solution: calling pci_stop_and_remove_bus_device() and return a negative > > error, and get rid of this improperly configured bridge. It is useless > > anyways without a subordinate bus number. > > > > What do you think? > > When the kernel runs out of BAR address space for devices downstream of > a bridge, it doesn't de-enumerate the bridge. The devices are just > unusable. > > So my first intuition is that running out of bus numbers shouldn't result > in de-enumeration either. The BAR case is a bit different: it is a legal configuration for a bridge to have no address space downstream: we can configure the bridge with limit<base to indicate that there is no downstream addresses. There is nothing similar for secondary bus number: there is no legal bus number configuration for the bridge in this case. > Remind me, how exactly does the NULL pointer deref occur? I think it's > because no struct pci_bus was allocated for the subordinate bus of the > hot-plugged bridge, right? Because AFAICS that would happen in > > pci_hp_add_bridge() > pci_can_bridge_extend() > pci_add_new_bus() > pci_alloc_child_bus() > > but we never get that far because pci_hp_add_bridge() bails out with -1. > So the subordinate pointer remains a NULL pointer. This is correct. NULL deference happens due to subordinate pointer being NULL. > Perhaps we should allocate an empty struct pci_bus instead. This feels a bit hacky. What bus number could we set to this struct? And I doubt that the PCI subsystem is written with the expectation that struct pci_bus can be a dummy one, so I would guess that the probability of breaking thing is high with this approach. > Or check for a NULL subordinate pointer instead of crashing. I think this is a possible solution, but it is a bit complicated: all usage of subordinate pointers will need to be looked at. Further more, secondary bus number being zero is currently taken as unconfigured/invalid (for example in pci_scan_bridge_extend()), that needs to be changed as well. Doing this has a good chance of breaking things. I don't really see the upside of the above, compared to just deleting this bridge. It is not really counter-intuitive that the OS rejects a hot-plugged device that cannot be configured, right? Also this solution is wayyy simpler. Unless there is problem with this that I am not seeing? Best regards, Nam