On Tue, 2013-09-10 at 14:43 +0900, Takao Indoh wrote:
> (2013/09/09 18:07), David Woodhouse wrote:
> > If the driver is so broken that it cannot get the device working again
> > after a fault, surely the driver needs to be fixed?
>
> Yes,this problem may be solved by fixing driver. Actually megaraid sas
> driver is recently fixed for this problem. (See commit 6431f5d7)
>
> But I think root cause of this problem is initializing IOMMU while DMA
> is still working, and I want to solve the root cause rather than
> handling it in each driver, otherwise we have to fix driver each time we
> find this kind of problem.

But if the driver is broken and cannot actually recover from hardware
issues, the driver needs to be fixed *anyway*. We shouldn't be papering
over the problem.

> > For the IOMMU code to reset individual devices, just because they still
> > have an active DMA mapping even if they're not *doing* DMA, seems wrong.
>
> Right, current code is resetting devices which *may* be doing DMA. The
> ideal way is finding devices which are actually doing DMA and reset only
> them but I don't know how we can do this, though I think current code
> is sufficient.

No, that's not the ideal way either. Their DMA will be blocked, and
they'll stop (or at least we'll stop getting an interrupt and reporting
their DMA faults, if the hardware *is* so broken that it keeps trying
over and over again). The new driver will come up and reset the device,
and all will be well.

Do not paper over driver bugs. You are just *encouraging* brokenness.

We need to fix the 'fault storm' issue, by setting the FPD bit in the
context-entry for offending devices when appropriate, and then clearing
it again when appropriate too.

But for the IOMMU code to go out and trigger a PCI reset of random
devices and buses is ABSOLUTELY WRONG.

Do Not Do This.

-- 
dwmw2
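
For illustration only, here is a minimal sketch of what setting and
clearing the FPD bit in a context-entry might look like. The struct and
helper names are hypothetical and are not taken from the kernel source.
The layout (bit 0 = Present, bit 1 = Fault Processing Disable in the low
64-bit word) is assumed from the VT-d context-entry format. Deciding
*when* a device counts as "offending" is deliberately left out, as is the
context-cache invalidation that real code would presumably need after
changing a live entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a VT-d context entry (two 64-bit words). */
struct ctx_entry {
	uint64_t lo;
	uint64_t hi;
};

/* Assumed layout of the low word: bit 0 = Present, bit 1 = FPD. */
#define CTX_PRESENT (UINT64_C(1) << 0)
#define CTX_FPD     (UINT64_C(1) << 1)

/* Set FPD: faults from this device are no longer recorded or reported. */
static inline void ctx_disable_fault_reporting(struct ctx_entry *ce)
{
	ce->lo |= CTX_FPD;
}

/* Clear FPD: fault reporting for this device is enabled again. */
static inline void ctx_enable_fault_reporting(struct ctx_entry *ce)
{
	ce->lo &= ~CTX_FPD;
}

/* Query whether fault reporting is currently suppressed. */
static inline bool ctx_fault_reporting_disabled(const struct ctx_entry *ce)
{
	return (ce->lo & CTX_FPD) != 0;
}
```

The point of the sketch is that quieting a fault storm is a per-device,
reversible bit flip in the IOMMU's own tables. It leaves the Present bit
and the rest of the entry alone and involves no PCI reset.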