(2013/01/23 9:47), Thomas Renninger wrote:
> On Monday, January 21, 2013 10:11:04 AM Takao Indoh wrote:
>> (2013/01/08 4:09), Thomas Renninger wrote:
> ...
>>> I tried the provided patches first on 2.6.32, then I verified with 3.8-rc2
>>> and in both cases the disk is not detected anymore in
>>> reset_devices (kexec'ed/kdump) case (but things work fine without these
>>> patches).
>>
>> So the problem that the disk is not detected was caused by exactmap
>> problem you guys are discussing? Or still not detected even if exactmap
>> problem is fixed?
> This problem is related to the 5 PCI resetting patches.
> Dumping worked with a 2.6.32 and a 3.8-rc2 kernel, adding the PCI resetting
> patches broke both. I first tried 2.6.32 and verified with 3.8-rc2 to make sure
> I didn't mess up the backport adjustings of the patches to 2.6.32.
>
> Unfortunately this Dell platform takes really long to boot.
> I can give it the one or other test, but please do not bomb me with patches.
>
> For info:
> About the interrupt remapping error interrupt storm in kdump case I tried to
> reproduce on this machine, but never could: The guys who saw that also cannot
> reproduce this anymore.
>
> Two ideas I had about this:
> - As said already, (also) try to catch the error case and try to reset
>   the device in AER/Specific interrupt remapping error interrupt caught.

I tried this idea, but it did not work on megaraid_sas. I made an
experimental patch that resets a device when a DMAR error is detected
on it. What happened is this:

1) megaraid_sas module is loaded.
2) DMAR error is detected during the driver initialization.
3) The device is reset.
4) kdump fails because the disk is not found.

When I tested the patches which reset all devices early in boot, the
disk was recognized correctly, so it seems that resetting a device while
its driver is loading breaks something. I think we need to reset a
device at least before its driver is loaded.

Thanks,
Takao Indoh

> - Have a look at coreboot, these guys should know how to initialize the PCI
>   subsystem from scratch and might have some well tested PCI resetting
>   code in place already (no idea, just a thought).
>
> Thomas