On Tue, Sep 30, 2014 at 12:15 AM, Li, ZhenHua <zhen-hual@xxxxxx> wrote:
> Adding Joerg to the CC list, since this is also related to the IOMMU module.
>
> Joerg,
> There was an earlier attempt to fix this DMAR fault:
> https://lkml.org/lkml/2014/8/18/118
>
> This patch tries to fix the same problem.
>
>
> Zhenhua
>
> On 09/30/2014 02:09 PM, Li, Zhen-Hua wrote:
>>
>> On an HP system with an Intel Corporation 82599 Ethernet adapter, when the
>> kernel crashes and the kdump kernel boots with intel_iommu=on, there may be
>> unexpected DMA requests from this adapter, which cause DMA Remapping
>> faults like:
>> dmar: DRHD: handling fault status reg 102
>> dmar: DMAR:[DMA Read] Request device [41:00.0] fault addr fff81000
>> DMAR:[fault reason 01] Present bit in root entry is clear
>>
>> Analysis for this bug:
>>
>> The present bit is set in this function:
>>
>> static struct context_entry * device_to_context_entry(
>>                 struct intel_iommu *iommu, u8 bus, u8 devfn)
>> {
>>     ......
>>     set_root_present(root);
>>     ......
>> }
>>
>> Calling tree:
>>     ixgbe_open
>>       ixgbe_setup_tx_resources
>>         intel_alloc_coherent
>>           __intel_map_single
>>             domain_context_mapping
>>               domain_context_mapping_one
>>                 device_to_context_entry
>>
>> This means the present bit in the root entry will not be set until the
>> device driver is loaded.
>>
>> But in the kdump kernel, some hardware devices do not know that the OS is
>> now a second kernel and that their drivers will be loaded again, so they
>> issue unexpected DMA requests before they have been re-initialized, and
>> the DMA Remapping errors follow.
>>
>> To fix this DMAR fault, we need to reset the bus that this device is on.
>> Resetting the device itself does not work.

This seems like something that could happen with *any* device, not just
the 82599 NIC.  Or is there something in the "kernel crash -> kexec ->
kdump kernel" path that stops DMA for most devices, but not for the 82599?
>> There also was a discussion:
>> https://lkml.org/lkml/2013/5/14/9
>>
>> Signed-off-by: Li, Zhen-Hua <zhen-hual@xxxxxx>
>> ---
>>  drivers/pci/quirks.c | 11 +++++++++++
>>  1 file changed, 11 insertions(+)
>>
>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>> index 80c2d01..5198af3 100644
>> --- a/drivers/pci/quirks.c
>> +++ b/drivers/pci/quirks.c
>> @@ -25,6 +25,7 @@
>>  #include <linux/sched.h>
>>  #include <linux/ktime.h>
>>  #include <asm/dma.h>	/* isa_dma_bridge_buggy */
>> +#include <linux/crash_dump.h>
>>  #include "pci.h"
>>
>>  /*
>> @@ -3832,3 +3833,13 @@ void pci_dev_specific_enable_acs(struct pci_dev *dev)
>>  		}
>>  	}
>>  }
>> +
>> +#ifdef CONFIG_CRASH_DUMP
>> +void quirk_reset_buggy_devices(struct pci_dev *dev)
>> +{
>> +	if (unlikely(is_kdump_kernel()))
>> +		pci_try_reset_bus(dev->bus);
>> +}
>> +DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_VENDOR_ID_INTEL, 0x10f8,
>> +	PCI_CLASS_NETWORK_ETHERNET, 8, quirk_reset_buggy_devices);
>> +#endif
>>
>