On 11/12/17 at 04:04pm, Baoquan He wrote:
> On 11/07/17 at 04:34pm, Jiri Bohac wrote:
> > On Tue, Nov 07, 2017 at 02:42:12PM +0100, Jiri Bohac wrote:
> > > On Tue, Nov 07, 2017 at 07:39:56PM +0800, Baoquan He wrote:
> > > > don't worry about the user space kexec utility either.
> > >
> > > What's the problem with the userspace kexec? The bug is in
> > > reading /proc/vmcore by makedumpfile. kexec would only operate
> > > within the preallocated crashkernel area, right?
> >
> > right, I see it (without -s the kexec userspace creates the ELF header
> > later used by the second kernel for /proc/vmcore).
>
> Yes, I meant this. In the kernel, you can define global variables to store
> the start and end addresses of the GART aperture, while in user space
> there's no way to know them. And the non '-s' kexec is still what most
> people use.
>
> I roughly went through the AGP 3.0 doc and the GART code; the root cause
> of this issue should be:
>
> On an AMD system, GART needs to be enabled in the BIOS in principle. The
> firmware then arranges a hole in the system address space for the GART
> aperture mapping, 64MB by default, usually below and close to 4G. GART
> stands for Graphics Address Remapping Table; each of its entries can be
> used to refer to an address region in the 64MB aperture for IOMMU usage.
>
> However, on your AMD test system, GART IOMMU support is not enabled in
> the BIOS settings. In that case the current kernel implementation finds a
> region occupied by system RAM and configures its starting address and
> size into the GART config registers AMD64_GARTAPERTURECTL and
> AMD64_GARTAPERTUREBASE. This happens in the first kernel. I believe that
> in the kdump kernel, since only the reserved crashkernel region is taken
> as available system RAM, the rest of the original RAM space is seen as a
> hole. So the kdump kernel still uses the 1st kernel's aperture region for
> GART, which is already set in the GART registers; the kdump kernel
> treats it as if the BIOS had reserved a hole for the GART aperture.
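
To make the register part concrete, here is a minimal sketch of how the aperture range could be recovered from the two registers and checked against a physical range (for example, a RAM range about to be exported through /proc/vmcore). It assumes the layout as I understand it from arch/x86/kernel/aperture_64.c: bits [3:1] of AMD64_GARTAPERTURECTL hold the size order (32MB << order), and bits [14:0] of AMD64_GARTAPERTUREBASE hold physical address bits [39:25]. The struct and function names are made up for illustration and are not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type for this sketch, not a kernel structure. */
struct gart_aperture {
    uint64_t base; /* physical start address of the aperture */
    uint64_t size; /* aperture length in bytes */
};

/*
 * Decode raw register values (assumed layout, see the text above):
 *   ctl:      AMD64_GARTAPERTURECTL, bits [3:1] = size order
 *   base_reg: AMD64_GARTAPERTUREBASE, bits [14:0] = address bits [39:25]
 */
static struct gart_aperture gart_decode(uint32_t ctl, uint32_t base_reg)
{
    struct gart_aperture ap;
    uint32_t order = (ctl >> 1) & 0x7;

    ap.size = (UINT64_C(32) << 20) << order;        /* 32MB << order */
    ap.base = (uint64_t)(base_reg & 0x7fff) << 25;  /* 32MB granularity */
    return ap;
}

/* True if the half-open range [start, end) touches the aperture. */
static bool gart_range_overlaps(const struct gart_aperture *ap,
                                uint64_t start, uint64_t end)
{
    /* The smallest encodable aperture is 32MB, so size is never zero. */
    assert(ap->size != 0 && start < end);
    return start < ap->base + ap->size && ap->base < end;
}
```

With order 1 and a base register value of 0x7A, this decodes to a 64MB aperture at 0xF4000000, which matches the "below and close to 4G" placement described above. The kernel can do this lookup because it can read the registers; the non '-s' kexec in user space has no equivalent source for these two values.
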
>
> Now the problem is that the pages reserved for the GART aperture have
> been added into the mm subsystem. GART is located on the North Bridge.
> When the CPU tries to access these pages, the access is checked by the
> North Bridge chip first, and a hardware error occurs since that region
> has been set in the GART registers, which are located in the NB.
>
> Solution:
> 1) Remove the code which supports GART IOMMU when it's not enabled in
> the BIOS. This is already the case for the new generation of hardware
> IOMMUs such as the Intel VT-d IOMMU and the AMD-Vi IOMMU. We should not
> make GART IOMMU an exception.
>
> 2) Remove those pages from the mm subsystem: although they have been
> added to it, they are no longer usable because the CPU can't access
> them.
>
> 3) Remove the aperture region from /proc/iomem so that pages in that
> region can't be seen by the kdump kernel. This is easier, but just a
> workaround.
>
> Hi Yinghai, Joerg, and Bjorn
>
> I found patches you contributed to GART IOMMU. Do you have any
> suggestions about this issue? Or any comments on these 3 options?
>
> I personally prefer the 1st one.
>
> Thanks
> Baoquan
>
> >
> > No idea how to fix that nicely...
> >
> > --
> > Jiri Bohac <jbohac at suse.cz>
> > SUSE Labs, Prague, Czechia
> >
> >
> > _______________________________________________
> > kexec mailing list
> > kexec at lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/kexec