On 10/14/19 at 07:05pm, Dave Young wrote:
> On 10/12/19 at 05:24pm, Kairui Song wrote:
> > On 9/27/19 1:42 PM, Dave Young wrote:
> > > On 09/25/19 at 06:36pm, Kairui Song wrote:
> > > > On Wed, Sep 11, 2019 at 1:56 PM Ingo Molnar <mingo@xxxxxxxxxx> wrote:
> > > > > * Kairui Song <kasong@xxxxxxxxxx> wrote:
> > > > >
> > > > > > Since commit c7753208a94c ("x86, swiotlb: Add memory encryption support"),
> > > > > > SWIOTLB will be enabled even if there is less than 4G of memory when SME
> > > > > > is active, to support DMA for devices that do not support addresses with
> > > > > > the encrypt bit.
> > > > > >
> > > > > > And commit aba2d9a6385a ("iommu/amd: Do not disable SWIOTLB if SME is
> > > > > > active") makes the kernel keep SWIOTLB enabled even if there is an IOMMU.
> > > > > >
> > > > > > Then commit d7b417fa08d1 ("x86/mm: Add DMA support for SEV memory
> > > > > > encryption") always forces SWIOTLB to be enabled when SEV is active,
> > > > > > in all cases.
> > > > > >
> > > > > > Now, when either SME or SEV is active, SWIOTLB will be force enabled,
> > > > > > and this is also true for the kdump kernel. As a result, the kdump kernel
> > > > > > will easily run out of the already scarce pre-reserved memory.
> > > > > >
> > > > > > So when SME/SEV is active, reserve extra memory for SWIOTLB to ensure the
> > > > > > kdump kernel has enough memory, except when "crashkernel=size[KMG],high"
> > > > > > is specified or any offset is used. In the high reservation case, an
> > > > > > extra low memory region will always be reserved and that is enough for
> > > > > > SWIOTLB. If the offset format is used, the user should be fully aware
> > > > > > of any possible kdump kernel memory requirement and has to organize the
> > > > > > memory usage carefully.
> > > > > >
> > > > > > Signed-off-by: Kairui Song <kasong@xxxxxxxxxx>
> > > > > > ---
> > > > > >  arch/x86/kernel/setup.c | 20 +++++++++++++++++---
> > > > > >  1 file changed, 17 insertions(+), 3 deletions(-)
> > > > > >
> > > > > > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > > > > > index 71f20bb18cb0..ee6a2f1e2226 100644
> > > > > > --- a/arch/x86/kernel/setup.c
> > > > > > +++ b/arch/x86/kernel/setup.c
> > > > > > @@ -530,7 +530,7 @@ static int __init crashkernel_find_region(unsigned long long *crash_base,
> > > > > >                                            unsigned long long *crash_size,
> > > > > >                                            bool high)
> > > > > >  {
> > > > > > -        unsigned long long base, size;
> > > > > > +        unsigned long long base, size, mem_enc_req = 0;
> > > > > >
> > > > > >          base = *crash_base;
> > > > > >          size = *crash_size;
> > > > > > @@ -561,11 +561,25 @@ static int __init crashkernel_find_region(unsigned long long *crash_base,
> > > > > >          if (high)
> > > > > >                  goto high_reserve;
> > > > > >
> > > > > > +        /*
> > > > > > +         * When SME/SEV is active and not using high reserve,
> > > > > > +         * it will always required an extra SWIOTLB region.
> > > > > > +         */
> > > > > > +        if (mem_encrypt_active())
> > > > > > +                mem_enc_req = ALIGN(swiotlb_size_or_default(), SZ_1M);
> > > > > > +
> > > > > >          base = memblock_find_in_range(CRASH_ALIGN,
> > > > > > -                                      CRASH_ADDR_LOW_MAX, size,
> > > > > > +                                      CRASH_ADDR_LOW_MAX,
> > > > > > +                                      size + mem_enc_req,
> > > > > >                                        CRASH_ALIGN);
> > > > >
> > > >
> > > > Hi Ingo,
> > > >
> > > > I re-read my previous reply, it's long and tedious, let me try to make
> > > > a more effective reply:
> > > >
> > > > > What sizes are we talking about here?
> > > >
> > > > The size here is how much memory will be reserved for kdump kernel, to
> > > > ensure kdump kernel and userspace can run without OOM.
> > > >
> > > > >
> > > > > - What is the possible size range of swiotlb_size_or_default()
> > > >
> > > > swiotlb_size_or_default() returns the swiotlb size. It is either
> > > > specified by the user with swiotlb=<size>, or the default size (64MB).
> > > >
> > > > >
> > > > > - What is the size of CRASH_ADDR_LOW_MAX (the old limit)?
> > > >
> > > > It's 4G.
> > > >
> > > > >
> > > > > - Why do we replace one fixed limit with another fixed limit instead of
> > > > > accurately sizing the area, with each required feature adding its own
> > > > > requirement to the reservation size?
> > > >
> > > > It's quite hard to "accurately size the area".
> > > >
> > > > There is no way to tell the exact amount of memory kdump needs, we can
> > > > only estimate. The kdump kernel uses a different cmdline, drivers and
> > > > components will have special handling for kdump, and userspace is
> > > > totally different.
> > >
> > > Agreed about the above, but specific to the problem in this patch,
> > > there should be other ways.
> > >
> > > My first thought was to do generic handling in the swiotlb part, with
> > > something like the kdump_memory_reserve(size) Ingo suggested, but
> > > according to you swiotlb init is late, so it cannot increase the size.
> > > OTOH, reserving another region for kdump in swiotlb will cause other
> > > issues.
> > >
> > > So let's think about other improvements, for example seeing if you can
> > > call kdump_memory_reserve(size) in the AMD SME init path, for example in
> > > mem_encrypt_init(). Is it before the crashkernel reservation?
> > >
> > > If doable it will be at least cleaner than the code in this patch.
> > >
> > > Thanks
> > > Dave
> > >
> >
> > How about something as simple as the following code? The logic and the new
> > function are as simple as possible: just always reserve extra low memory
> > when SME/SEV is active, ignoring the high/low reservation case. It will
> > waste some memory with SME and high reservation though.
> >
> > I was hesitating a lot about this series. One thing I keep thinking about
> > is the point of the "crashkernel=" argument: if the crashkernel value can
> > be adjusted accordingly, the value specified will seem more meaningless
> > or confusing...
> >
> > And currently there isn't anything like crashkernel=auto or anything
> > similar to let the kernel calculate the value automatically, so maybe the
> > admin should be aware of the value or be informed about the suitable
> > crashkernel value after all?
> Hmm, it is reasonable that a user defined value should be used as is,
> without any change by the kernel. So it is a good reason to introduce
> a crashkernel=auto so that the kernel can tune the crashkernel size
> accordingly on top of some base value which can be configurable by
> kernel configs (arch dependent).
>

Here are some old patches I posted for some default crashkernel values,
maybe you can try to do something like that with a crashkernel=auto:
https://lkml.org/lkml/2018/5/20/262

Thanks
Dave

_______________________________________________
kexec mailing list
kexec@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/kexec
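
For readers following the size discussion in this thread, the arithmetic in
the quoted patch can be sketched as a small userspace C fragment. This is only
an illustration under stated assumptions, not the kernel code: align_up(),
swiotlb_size_or_default_sketch() and crash_low_request() are hypothetical
names, the 64MB default is the value given in the thread, and the swiotlb
size is assumed to be passed in bytes for simplicity.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants; the 64MB default is the value quoted in the thread. */
#define SZ_1M                (1ULL << 20)
#define SWIOTLB_DEFAULT_SIZE (64ULL << 20)

/* Round x up to a multiple of a; a must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/*
 * Hypothetical stand-in for swiotlb_size_or_default(): returns the
 * user-specified size (assumed to be in bytes here), or the default
 * when no size was given (user_size == 0).
 */
static uint64_t swiotlb_size_or_default_sketch(uint64_t user_size)
{
	return user_size ? user_size : SWIOTLB_DEFAULT_SIZE;
}

/*
 * Size to search for in low memory: the requested crashkernel size,
 * plus a 1M-aligned SWIOTLB allowance when memory encryption is active.
 * Mirrors "size + mem_enc_req" in the quoted diff.
 */
static uint64_t crash_low_request(uint64_t crash_size, int mem_encrypt_active,
				  uint64_t swiotlb_user_size)
{
	uint64_t mem_enc_req = 0;

	if (mem_encrypt_active)
		mem_enc_req = align_up(
			swiotlb_size_or_default_sketch(swiotlb_user_size),
			SZ_1M);

	return crash_size + mem_enc_req;
}
```

Under these assumptions, crash_low_request(256ULL << 20, 1, 0) gives 320MB:
the 256MB requested with crashkernel= plus the 64MB default SWIOTLB size.
With memory encryption inactive the request stays at 256MB.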