On 09/12/19 at 12:23am, Kairui Song wrote: > On Wednesday, September 11, 2019, Ingo Molnar <mingo@xxxxxxxxxx> wrote: > > > > * Kairui Song <kasong@xxxxxxxxxx> wrote: > > > >> Since commit c7753208a94c ("x86, swiotlb: Add memory encryption > support"), > >> SWIOTLB will be enabled even if there is less than 4G of memory when SME > >> is active, to support DMA of devices that not support address with the > >> encrypt bit. > >> > >> And commit aba2d9a6385a ("iommu/amd: Do not disable SWIOTLB if SME is > >> active") make the kernel keep SWIOTLB enabled even if there is an IOMMU. > >> > >> Then commit d7b417fa08d1 ("x86/mm: Add DMA support for SEV memory > >> encryption") will always force SWIOTLB to be enabled when SEV is active > >> in all cases. > >> > >> Now, when either SME or SEV is active, SWIOTLB will be force enabled, > >> and this is also true for kdump kernel. As a result kdump kernel will > >> run out of already scarce pre-reserved memory easily. > >> > >> So when SME/SEV is active, reserve extra memory for SWIOTLB to ensure > >> kdump kernel have enough memory, except when "crashkernel=size[KMG],high" > >> is specified or any offset is used. As for the high reservation case, an > >> extra low memory region will always be reserved and that is enough for > >> SWIOTLB. Else if the offset format is used, user should be fully aware > >> of any possible kdump kernel memory requirement and have to organize the > >> memory usage carefully. > >> > >> Signed-off-by: Kairui Song <kasong@xxxxxxxxxx> > >> --- > >> arch/x86/kernel/setup.c | 20 +++++++++++++++++--- > >> 1 file changed, 17 insertions(+), 3 deletions(-) > >> > >> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c > >> index 71f20bb18cb0..ee6a2f1e2226 100644 > >> --- a/arch/x86/kernel/setup.c > >> +++ b/arch/x86/kernel/setup.c > >> @@ -530,7 +530,7 @@ static int __init crashkernel_find_region(unsigned > long long *crash_base, > >> unsigned long long *crash_size, > >> bool high) > >> { > >> - unsigned long long base, size; > >> + unsigned long long base, size, mem_enc_req = 0; > >> > >> base = *crash_base; > >> size = *crash_size; > >> @@ -561,11 +561,25 @@ static int __init crashkernel_find_region(unsigned > long long *crash_base, > >> if (high) > >> goto high_reserve; > >> > >> + /* > >> + * When SME/SEV is active and not using high reserve, > >> + * it will always required an extra SWIOTLB region. > >> + */ > >> + if (mem_encrypt_active()) > >> + mem_enc_req = ALIGN(swiotlb_size_or_default(), SZ_1M); > >> + > >> base = memblock_find_in_range(CRASH_ALIGN, > >> - CRASH_ADDR_LOW_MAX, size, > >> + CRASH_ADDR_LOW_MAX, > >> + size + mem_enc_req, > >> CRASH_ALIGN); > > > > What sizes are we talking about here? > > > > - What is the possible size range of swiotlb_size_or_default() > > > > - What is the size of CRASH_ADDR_LOW_MAX (the old limit)? > > > > - Why do we replace one fixed limit with another fixed limit instead of > > accurately sizing the area, with each required feature adding its own > > requirement to the reservation size? > > > > I.e. please engineer this into a proper solution instead of just > > modifying it around the edges. > > > > For example have you considered adding some sort of > > kdump_memory_reserve(size) facility, which increases the reservation size > > as something like SWIOTLB gets activated? That would avoid the ugly > > mem_encrypt_active() flag, it would just automagically work. > > Hi, thanks for the suggestions, actually I did try to workout a better > resolution, at least for SWIOTLB and crashkernel memory, like make > crashkernel reserve more memory as SWIOTLB get activated to be more > flexible and generic. > > There are some problems: > > - Usually, for SWIOTLB, even if the first booting kernel have SWIOTLB > activated, second kernel will most likely not need it. Currently, only the > high reservation case will still need SWIOTLB in second kernel, and it's > already reserving extra low memory automatically. SME/SEV systems is the > only other special case that will force both kernel to use it. > > - There is a little complex procedure to judge whether SWIOTLB is required > (Depends on whether the system have >4G, if there is an iommu want to shut > down SWIOTLB, and some times iommu still expect SWIOTLB to exist, and > SWIOTLB could be activated first, then got closed later etc...). The crash > kernel reserve should happen very early to ensure the region is usable, but > kernel is not aware of if SWIOTLB is needed. Have to either move the > reservation to some later stage or always reserve extra memories early, > then release if not needed later. Neither sound a good solution, and after > all, as mentioned above, currently kernel need it doesn't mean kdump kernel > needs it. > > - Also tried to reuse and improve the currently crashk_low_res facility to > reserve the memory in a different block at first, make the code simpler. > Didn't work well due to some other issue with all current version of user > space kexec-tools, which never expect there will be an extra memory region > for non-high reservation case, and failed to load the kernel for kdump. > Have to always make the another block in a lower position or rename the > memory resource. I think this will either break user space tools heavily, > or make the implementation even more complex and confusing. Kairui, I remember you tried to reuse the swiotlb regions across kexec reboot, are you saying this as the 3rd problem above? It is not clear how you use crashk_low_res, can you elaborate it I remember you mentioned some problem with this approach, maybe we can re-explore it. Thanks Dave _______________________________________________ kexec mailing list kexec@xxxxxxxxxxxxxxxxxxx http://lists.infradead.org/mailman/listinfo/kexec