(2013/04/04 23:24), David Woodhouse wrote:
> On Thu, 2013-04-04 at 14:48 +0900, Takao Indoh wrote:
>>
>> - DMAR fault messages floods and second kernel does not boot. Recently I
>>   saw similar report. https://lkml.org/lkml/2013/3/8/120
>
> Right. So the fix for that is to make the subsequent errors silent,
> until/unless we actually get a request to create a mapping for the given
> device.
>
>> - igb driver detectes error on linkup and kdump via network fails.
>
> That's a driver bug, IIRC. It was failing to completely reset the
> hardware. It's fixed now, isn't it?

No, it can still be reproduced with the latest kernel (3.9.0-rc6).

>
>> - On a certain platform, though kdump itself works, PCIe error like
>>   Unexpected Completion is detected and it gets hardware degraded.
>
> More information required.

When I tested intel_iommu on a certain machine, the following error
message was logged in its firmware, and the I/O board went into an
abnormal state. 05:00.0 is igb, so I think this was caused by a DMA
error on igb. This occurs before the igb driver is loaded, so it cannot
be fixed in the driver.

    PCI: Unexpected Completion Bus: 5 Device: 0x00 Function: 0x00

Anyway, I think we should introduce some framework that cleans up all
devices and stops DMA at boot time, rather than dealing with the problem
in each driver. One way I found is resetting the devices in the PCIe
layer. If DMAR is disabled in init_dmars(), we get a chance to stop DMA
on the devices in the PCI layer, like a PCI quirk. This is one of the
reasons why I propose this patch.

Thanks,
Takao Indoh