(2013/04/03 17:24), David Woodhouse wrote:
> On Wed, 2013-04-03 at 16:11 +0900, Takao Indoh wrote:
>> (2013/04/02 23:05), Joerg Roedel wrote:
>>> On Mon, Apr 01, 2013 at 02:45:18PM +0900, Takao Indoh wrote:
>>>> <Current flow on kdump boot>
>>>> enable_IR
>>>> intel_enable_irq_remapping
>>>> iommu_disable_irq_remapping <== IRES/QIES/TES disabled here
>>>> dmar_disable_qi <== do nothing
>>>> dmar_enable_qi <== QIES enabled
>>>> intel_setup_irq_remapping <== IRES enabled
>>>
>>> But what we want to do here in the kdump case is to disable translation
>>> too, right? Because the former kernel might have translation and
>>> irq-remapping enabled and the kdump kernel might be compiled without
>>> support for dma-remapping. So if we don't disable translation here too
>>> the kdump kernel is unable to do DMA.
>>
>> Yeah, you are right. I forgot such a case.
>
> If you disable translation and there's some device still doing DMA, it's
> going to scribble over random areas of memory. You really want to have
> translation enabled and all the page tables *cleared*, during kexec. I
> think it's fair to insist that the secondary kernel should use the IOMMU
> if the first one did.
>
>> To be honest, I also expected the side effect of this patch. As I wrote
>> in the previous mail, I'm working on kdump problem with iommu, that is,
>> ongoing DMA causes DMAR fault in 2nd kernel and sometimes kdump fails
>> due to this fault.
>
> Here you've lost me. The DMAR fault is caught and reported, and how does
> this lead to a kdump failure? Are you using dodgy hardware that just
> keeps *trying* after an abort, and floods the system with a storm of
> DMAR faults? We've occasionally spoken about working around such a
> problem by setting a bit to make subsequent faults *silent*. Would that
> work?

There are several cases.

- DMAR fault messages flood the system and the second kernel does not
  boot. Recently I saw a similar report:
  https://lkml.org/lkml/2013/3/8/120
- The igb driver detects an error on link-up, and kdump via the network
  fails.
- On a certain platform, kdump itself works, but a PCIe error such as
  Unexpected Completion is detected and the hardware ends up in a
  degraded state.

Thanks,
Takao Indoh

>
>> What we have to do is stopping DMA transaction
>> before DMA-remapping is disabled in 2nd kernel.
>
> The IOMMU is there to stop DMA transactions. That is its *job*. :)
>