[cc += Stuart]

On Wed, Jan 15, 2020 at 12:24:29PM +0100, Lukas Wunner wrote:
> On Wed, Jan 15, 2020 at 11:26:26AM +0100, Oliver Neukum wrote:
> > I got a bug report about some systems generating an NMI and
> > subsequently crashing bisected down to 80696f991424d.
> > Apparently these systems do not react well to __pciehp_enable_slot
> > while no card is present. Restoring the check to __pciehp_enable_slot()
> > removed in 80696f991424d makes the current kernels work.
>
> That's odd, these systems must be setting the Data Link Layer Link Active
> bit in the Link Status Register even though no card is present.

Recent PCIe versions allow turning off in-band presence detect,
in which case the DLLLA bit can be set even though Presence Detect
is not set. You may be dealing with one of those systems but
without full dmesg and lspci output this is just an educated guess.

A series was submitted by Dell last year to support disabling
in-band presence detect, but it hasn't been merged yet by Bjorn:

https://lore.kernel.org/linux-pci/20191025190047.38130-1-stuart.w.hayes@xxxxxxxxx/

You may want to try if that series helps.

Thanks,

Lukas

> > What is to be done? Do you want a special case for the affected
> > systems based on DMI, or should I revert 80696f991424d?
>
> It would be good if we could get a better idea what's going on before
> deciding what action to take. What systems are we talking about exactly?
> Can you provide dmesg and lspci -vvvv output including the NMI, e.g. by
> attaching it to a new bugzilla?
>
> Thanks,
>
> Lukas
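
For illustration, the condition discussed above (DLLLA set in the Link
Status Register while Presence Detect is clear in the Slot Status
Register) can be written as a check on the two raw register values.
This is only a sketch: the bit masks are the ones defined in
include/uapi/linux/pci_regs.h, but the helper name is made up for this
example and it is not the logic pciehp itself uses.

```c
#include <stdint.h>

/* Bit masks as defined in include/uapi/linux/pci_regs.h */
#define PCI_EXP_LNKSTA_DLLLA	0x2000	/* Link Status: Data Link Layer Link Active */
#define PCI_EXP_SLTSTA_PDS	0x0040	/* Slot Status: Presence Detect State */

/*
 * Hypothetical helper (not a kernel function): returns 1 if the link is
 * reported active although no card is reported present, 0 otherwise.
 * lnksta and sltsta are the raw 16-bit Link Status and Slot Status
 * register values of the hotplug port.
 */
static int link_active_without_presence(uint16_t lnksta, uint16_t sltsta)
{
	int link_active = (lnksta & PCI_EXP_LNKSTA_DLLLA) != 0;
	int card_present = (sltsta & PCI_EXP_SLTSTA_PDS) != 0;

	return link_active && !card_present;
}
```

In lspci -vv output the same two bits should show up as DLActive on the
LnkSta line and PresDet on the SltSta line, so "DLActive+" together with
"PresDet-" on an affected port would match the situation guessed at
above.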