Add Kairui to CC since he is taking care of the crashkernel=auto code in our Distros. On 05/08/21 at 11:13am, Baoquan He wrote: > Hi Linus, > > On 05/07/21 at 12:25am, Linus Torvalds wrote: > > On Thu, May 6, 2021 at 6:04 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote: > > > > > > From: Saeed Mirzamohammadi <saeed.mirzamohammadi@xxxxxxxxxx> > > > Subject: kernel/crash_core: add crashkernel=auto for vmcore creation > > > > > > This adds crashkernel=auto feature to configure reserved memory for vmcore > > > creation. CONFIG_CRASH_AUTO_STR is defined to be set for different kernel > > > distributions and different archs based on their needs. > > > > > > Ugh. I didn't realize how nasty this was until after I'd applied this patch. > > > > I'm going to drop this patch, because the Kconfig thing for it is an > > unmitigated mess. I was confused by the question, and then the help > > message was actively misleading. > > > > This is wrong for so many reasons: > > > > - this is a classic case of "you shouldn't ask a user this". > > > > The question makes no sense to any normal person, it certainly > > didn't to me. Don't ask questions that don't have sane answers. > > > > - the config help text is actively misleading, and claims that the > > option is about how much memory is reserved for a crash kernel > > > > Not so. It's the default string for when somebody uses "crashkernel=auto" > > Sorry for the confusion, we should have been more careful to reivew and > add the commit log and kernel config description. > > > > - this shouldn't be a config option at all, it's clearly a distro > > setting, and should be on the kernel command line with the other > > distro settings. > > Don't know kernel config is disliked sometime, will remember it in the > future and more cautiously to add. > > Crashkernel=auto exists in our distros for many years, and as David > mentioned in other thread, we have been trying to adding rashkernel=auto > support into upstream. We pursue crashkernel=auto being added to upstream > because: > > 1) Empirical value is given to user by default; > > It was required by customer originally, now has been an important part > of kdump feature and supported in several main ARCHes. With crashkernel=auto, > people w/o much knowledge of kdump details can use kdump to debug. Distros > can provide the suggested values with crashkernel=auto which are got by > investigation, analysis and tested widely on test environment. > > 2) Cover corner case/special case; > > In some cases, kernel may need extra memory to handle, kdump kernel is > not exceptional. E.g when sme/sev enabled, SWIOTLB will be enabled > necessarily, even in kdump kernel. (Below sme/sev related commits for > reference). Then extra 64M need be reserved for crashkernel. User > doesn't need to know this, we already have done it for them. > > commit c7753208a94c ("x86, swiotlb: Add memory encryption support") > commit aba2d9a6385a ("iommu/amd: Do not disable SWIOTLB if SME is active") > commit d7b417fa08d1 ("x86/mm: Add DMA support for SEV memory encryption") > > We are eager to push crashkernel=auto to upstream becasue of our > UPSTREAM FIRST rule. Since it has been in RHEL for many years, each time > a new RHEL main release anchor a upstream kernel release and is prepared, > these crashkernel=auto RHEL-only patches need be reviewed inside Redhat, > then we will be questioned and challenged why they are not in upstream. > > As for how to implement crashkernel=auto, we have tried several ways. > > 1) Add into kernel command line > > The suggested value need be stored in user space if added into kernel > command line, then added into kernel. This makes the suggested value > separated from kernel itself. It's not what we expect to see. Because > the suggested crashkernel value is strongly related to distros release. > We could adjust the value between sub-releases of kernel because of > of kernel change. Adding them into kernel command line make us lose the > track of them in kernel. > > 2) Add a weak generic function and several arch dependent functions > 3) Hardcode values in __parse_crashkernel() > > Method 2) is taken in our RHEL7, 3) is used in RHEL8, RHEL-only patches > add them. If we try to push them into upstram, any later value > adjustment need a upstream patch posting. Otherwise, RHEL-only patch > need be introduced again, Redhat internal reviewer will challenge us > again. (Put the value hard coding pieces at bottom for reference). > > 4) Add kernel config to add default value > > It's done in this patch. With the kernel config CRASH_AUTO_STR, Distros can > add default value, and adjust it anytime in the future w/o bothering > upstream. If crashkernel=auto is specified, only below 3 LOC added, to > go to parse the CONFIG_CRASH_AUTO_STR directly. > > @@ -250,6 +251,12 @@ static int __init __parse_crashkernel(ch > if (suffix) > return parse_crashkernel_suffix(ck_cmdline, crash_size, > suffix); > +#ifdef CONFIG_CRASH_AUTO_STR > + if (strncmp(ck_cmdline, "auto", 4) == 0) { > + ck_cmdline = CONFIG_CRASH_AUTO_STR; > + pr_info("Using crashkernel=auto, the size chosen is a best effort estimation.\n"); > + } > +#endif > > > Before this, we don't know Saeed Mirzamohammadi, the patch author. He > could experience the same torture. We were wild with joy when noticing > his patch. We were planning to launch new round of post to add > crashkernel=auto, kernel config is our final option too. We could be too > happy to forget polishing the commit log. > > Not sure if I make myself clear. Basically, we expect crashkernel=auto > to be added in upstream kernel. About how to implement it in kernel, we would > like to hear upstream people's suggestion. > > Thanks > Baoquan > > > Hard code crashkernel=auto values in __parse_crashkernel() > =========================================================== > static int __init __parse_crashkernel(char *cmdline, > unsigned long long system_ram, > unsigned long long *crash_size, > unsigned long long *crash_base, > const char *name, > const char *suffix) > { > ...... > if (strncmp(ck_cmdline, "auto", 4) == 0) { > #if defined(CONFIG_X86_64) || defined(CONFIG_S390) > ck_cmdline = "1G-4G:160M,4G-64G:192M,64G-1T:256M,1T-:512M"; > #elif defined(CONFIG_ARM64) > ck_cmdline = "2G-:448M"; > #elif defined(CONFIG_PPC64) > char *fadump_cmdline; > > fadump_cmdline = get_last_crashkernel(cmdline, "fadump=", NULL); > fadump_cmdline = fadump_cmdline ? > fadump_cmdline + strlen("fadump=") : NULL; > if (!fadump_cmdline || (strncmp(fadump_cmdline, "off", 3) == 0)) > ck_cmdline = "2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G"; > else > ck_cmdline = "4G-16G:768M,16G-64G:1G,64G-128G:2G,128G-1T:4G,1T-2T:6G,2T-4T:12G,4T-8T:20G,8T-16T:36G,16T-32T:64G,32T-64T:128G,64T-:180G"; > #endif > pr_info("Using crashkernel=auto, the size chosen is a best effort estimation.\n"); > } > > ...... > } > ================================================================== > > > >