On Wed, Jun 12, 2013 at 8:44 PM, Takao Indoh <indou.takao at jp.fujitsu.com> wrote:
> (2013/06/12 13:45), Bjorn Helgaas wrote:
>> [+cc Vivek, Haren; sorry I didn't think to add you earlier]
>>
>> On Tue, Jun 11, 2013 at 12:08 AM, Takao Indoh
>> <indou.takao at jp.fujitsu.com> wrote:
>>> (2013/06/11 11:20), Bjorn Helgaas wrote:
>>
>>>> I'm not sure you need to reset legacy devices (or non-PCI devices)
>>>> yet, but the current hook isn't anchored anywhere -- it's just an
>>>> fs_initcall() that doesn't give the reader any clue about the
>>>> connection between the reset and the problem it's solving.
>>>>
>>>> If we do something like this patch, I think it needs to be done at the
>>>> point where we enable or disable the IOMMU. That way, it's connected
>>>> to the important event, and there's a clue about how to make
>>>> corresponding fixes for other IOMMUs.
>>>
>>> Ok. pci_iommu_init() is appropriate place to add this hook?
>>
>> I looked at various IOMMU init places today, and it's far more
>> complicated and varied than I had hoped.
>>
>> This reset scheme depends on enumerating PCI devices before we
>> initialize the IOMMU used by those devices. x86 works that way today,
>> but not all architectures do (see the sparc pci_fire_pbm_init(), for
>
> Sorry, could you tell me which part depends on architecture?

Your patch works if PCIe devices are reset before the kdump kernel
enables the IOMMU. On x86, this is possible because PCI enumeration
happens before the IOMMU initialization. On sparc, the IOMMU is
initialized before PCI devices are enumerated, so there would still be
a window where ongoing DMA could cause an IOMMU error.

Of course, it might be possible to reorganize the sparc code to do the
IOMMU init *after* it enumerates PCI devices. But I think that change
would be hard to justify.

And I think even on x86, it would be better if we did the IOMMU init
before PCI enumeration -- the PCI devices depend on the IOMMU, so
logically the IOMMU should be initialized first so the PCI devices can
be associated with it as they are enumerated.

>> example). And I think conceptually, the IOMMU should be enumerated
>> and initialized *before* the devices that use it.
>>
>> So I'm uncomfortable with that aspect of this scheme.
>>
>> It would be at least conceivable to reset the devices in the system
>> kernel, before the kexec. I know we want to do as little as possible
>> in the crashing kernel, but it's at least a possibility, and it might
>> be cleaner.
>
> I bet this will be not accepted by kdump maintainer. Everything in panic
> kernel is unreliable.

kdump is inherently unreliable. The kdump kernel doesn't start from
an arbitrary machine state. We don't expect it to tolerate all CPUs
running, for example. Maybe it shouldn't be expected to tolerate PCI
devices running, either.

Bjorn