On 04/07/15 at 05:55pm, Li, ZhenHua wrote:
> On 04/07/2015 05:08 PM, Dave Young wrote:
> >On 04/07/15 at 11:46am, Dave Young wrote:
> >>On 04/05/15 at 09:54am, Baoquan He wrote:
> >>>On 04/03/15 at 05:21pm, Dave Young wrote:
> >>>>On 04/03/15 at 05:01pm, Li, ZhenHua wrote:
> >>>>>Hi Dave,
> >>>>>
> >>>>>There may be some possibilities that the old iommu data is corrupted by
> >>>>>some other modules. Currently we do not have a better solution for the
> >>>>>dmar faults.
> >>>>>
> >>>>>But I think when this happens, we need to fix the module that corrupted
> >>>>>the old iommu data. I once met a similar problem in normal kernel, the
> >>>>>queue used by the qi_* functions was written again by another module.
> >>>>>The fix was in that module, not in iommu module.
> >>>>
> >>>>It is too late, there will be no chance to save vmcore then.
> >>>>
> >>>>Also if it is possible to continue corrupting other areas of oldmem because
> >>>>of using old iommu tables then it will cause more problems.
> >>>>
> >>>>So I think the tables at least need some verification before being used.
> >>>>
> >>>
> >>>Yes, it's good to think about this, and verification is also an
> >>>interesting idea. kexec/kdump do a sha256 calculation on the loaded kernel
> >>>and then verify this again when panic happens in purgatory. This checks
> >>>whether any code stomps into the region reserved for kexec/kernel and corrupts
> >>>the loaded kernel.
> >>>
> >>>If it is decided to do this, it should be an enhancement to the current
> >>>patchset but not an approach change. Since this patchset is getting very
> >>>close to the point the maintainers expected, maybe this can be merged first,
> >>>then think about enhancement. After all, without this patchset vt-d often
> >>>raised error messages and hung.
> >>
> >>It does not convince me, we should do it right at the beginning instead of
> >>introducing something wrong.
> >>
> >>I wonder why the old dma can not be remapped to a specific page in kdump kernel
> >>so that it will not corrupt more memory. But I may have missed something, I will
> >>look for old threads and catch up.
> >
> >I have read the old discussion, the above way was dropped because it could corrupt
> >the filesystem. Apologies for the late comment.
> >
> >But the current solution sounds bad to me because it uses old memory, which is not
> >reliable.
> >
> >Thanks
> >Dave
> >
> Seems we do not have a better solution for the dmar faults. But I believe
> we can find out how to verify the iommu data which is located in old memory.

That will be great, thanks.

So there are two things:
1) make sure the old pg tables are right, this is what we were talking about.
2) avoid writing to old memory. I suppose only dma reads could corrupt the
   filesystem, right? So how about this: for any dma write, just create a
   scratch page in the 2nd kernel's memory, and only use the old page table
   for dma reads.

Thanks
Dave
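
To make 1) and 2) a bit more concrete, here is a rough user-space C sketch, not
actual intel-iommu code. Everything in it is hypothetical and only for
illustration: the simplified entry layout (PTE_READ, PTE_WRITE, PAGE_MASK),
struct oldmem_range, old_pte_is_sane() and kdump_fixup_pte() are made-up names.
It also assumes that an entry allowing both read and write is redirected to the
scratch page as a whole, since a single entry has only one target address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified page-table entry: the low bits are permissions,
 * the remaining bits are a page-aligned target address. */
#define PTE_READ   (UINT64_C(1) << 0)
#define PTE_WRITE  (UINT64_C(1) << 1)
#define PAGE_SHIFT 12
#define PAGE_MASK  (~((UINT64_C(1) << PAGE_SHIFT) - 1))

/* Hypothetical description of the memory the crashed kernel owned. */
struct oldmem_range {
    uint64_t start; /* inclusive */
    uint64_t end;   /* exclusive */
};

/* Point 1): a minimal sanity check on an entry taken from the old kernel.
 * A real check would need to be much stricter than this. */
bool old_pte_is_sane(uint64_t pte, const struct oldmem_range *old)
{
    uint64_t addr = pte & PAGE_MASK;

    if (!(pte & (PTE_READ | PTE_WRITE)))
        return false; /* not present, nothing to reuse */

    return addr >= old->start && addr < old->end;
}

/* Point 2): decide what the kdump kernel installs for one old entry.
 *  - insane entry        -> 0 (not present; the device faults instead)
 *  - entry allows write  -> same permissions, target is the scratch page
 *  - read-only entry     -> kept as is, still pointing at old memory */
uint64_t kdump_fixup_pte(uint64_t old_pte, const struct oldmem_range *old,
                         uint64_t scratch_page)
{
    /* The scratch page must be page aligned and live in 2nd kernel memory. */
    assert((scratch_page & ~PAGE_MASK) == 0);

    if (!old_pte_is_sane(old_pte, old))
        return 0;

    if (old_pte & PTE_WRITE)
        return scratch_page | (old_pte & (PTE_READ | PTE_WRITE));

    return old_pte;
}
```

With this, in-flight dma writes land in a page owned by the 2nd kernel and can
no longer scribble on oldmem, while read-only mappings keep returning the data
the device expects.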