On Tue, May 20, 2014 at 03:35:15PM -0700, Francesco Ruggeri wrote:
> Hi Guenter,
> thank you for your reply. I will check out the changes that you pointed to.
> The problem we are seeing is a race condition between for_each_pci_dev
> (or similar) and device_unregister calls. I am not sure if use of the new
> lock should be extended to all code using for_each_pci_dev as well.
>
> pci_scan is a kernel thread that I used for testing purposes, to
> mimic the dynamics that we saw in our crashes in
> edac_pci_clear_parity_errors:
>
> for (;;) {
>         pci_dev = NULL;
>         while ((pci_dev = pci_get_device(PCI_ANY_ID,
>                         PCI_ANY_ID, pci_dev)) != NULL)
>                 ;
> }
>
> It keeps traversing klist_devices in pci_bus_type using
> bus_find_device, constantly resuming its search for the next element
> starting from the one it got in the previous round.
> There are several loops of this kind in Linux. In the case of this thread
> no action is taken on the elements as they are "found".
>
> The race condition occurs when bus_find_device resumes its search from
> a device that has been unregistered. Because device_unregister resets
> knode_bus in the device, bus_find_device cannot resume from where it
> left off in the klist.
> The sequence is device_unregister, device_del, bus_remove_device,
> klist_del(&dev->p->knode_bus).
>

Hmmm ... sounds more like a generic problem, not specifically related to pci.
Essentially it affects everything calling bus_find_device() with a starting
device which has been removed (though only pci and scsi seem to be doing
that in practice).

Can you reproduce the problem with the latest kernel?

Also, can you send me the entire file with the kernel thread you mentioned
above? Maybe I can reproduce the problem here.

Thanks,
Guenter
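
For readers following the thread, the failure mode described in the quoted text (resuming a search from an element that has since been removed) can be modelled in user space. The sketch below is only an illustration under stated assumptions: it is not kernel code, and `struct node`, `struct list`, `list_add_tail_model`, `list_del_model` and `list_find_next_model` are hypothetical names that do not exist in the kernel. Removal clears the node's link, loosely mirroring the reset of knode_bus mentioned above, so a later resume from that node has lost its position in the list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct node {
    struct node *next;   /* next element on the list, NULL at the tail */
    bool on_list;        /* cleared when the node is removed */
    int id;
};

struct list {
    struct node *head;
};

/* Append n at the tail of l. */
static void list_add_tail_model(struct list *l, struct node *n)
{
    struct node **pp = &l->head;

    while (*pp)
        pp = &(*pp)->next;
    n->next = NULL;
    n->on_list = true;
    *pp = n;
}

/* Unlink n from l and reset its link, so n no longer knows its successor. */
static void list_del_model(struct list *l, struct node *n)
{
    struct node **pp = &l->head;

    while (*pp && *pp != n)
        pp = &(*pp)->next;
    if (*pp == n)
        *pp = n->next;
    n->next = NULL;
    n->on_list = false;
}

/*
 * Resume a search after `from`, or start at the head when `from` is NULL.
 * Sets *stale when `from` has been removed, i.e. the cursor is lost and
 * the caller cannot continue from where it left off.
 */
static struct node *list_find_next_model(struct list *l, struct node *from,
                                         bool *stale)
{
    *stale = false;
    if (!from)
        return l->head;
    if (!from->on_list) {
        *stale = true;
        return NULL;
    }
    return from->next;
}
```

In this model the stale cursor is detected and reported to the caller. The kernel thread in the quoted loop has no such check, which is why resuming from an unregistered device is a problem there.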