Hi Linus, On 05/07/21 at 12:25am, Linus Torvalds wrote: > On Thu, May 6, 2021 at 6:04 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote: > > > > From: Saeed Mirzamohammadi <saeed.mirzamohammadi@xxxxxxxxxx> > > Subject: kernel/crash_core: add crashkernel=auto for vmcore creation > > > > This adds crashkernel=auto feature to configure reserved memory for vmcore > > creation. CONFIG_CRASH_AUTO_STR is defined to be set for different kernel > > distributions and different archs based on their needs. > > Ugh. I didn't realize how nasty this was until after I'd applied this patch. > > I'm going to drop this patch, because the Kconfig thing for it is an > unmitigated mess. I was confused by the question, and then the help > message was actively misleading. > > This is wrong for so many reasons: > > - this is a classic case of "you shouldn't ask a user this". > > The question makes no sense to any normal person, it certainly > didn't to me. Don't ask questions that don't have sane answers. > > - the config help text is actively misleading, and claims that the > option is about how much memory is reserved for a crash kernel > > Not so. It's the default string for when somebody uses "crashkernel=auto" Sorry for the confusion, we should have been more careful to reivew and add the commit log and kernel config description. > > - this shouldn't be a config option at all, it's clearly a distro > setting, and should be on the kernel command line with the other > distro settings. Don't know kernel config is disliked sometime, will remember it in the future and more cautiously to add. Crashkernel=auto exists in our distros for many years, and as David mentioned in other thread, we have been trying to adding rashkernel=auto support into upstream. We pursue crashkernel=auto being added to upstream because: 1) Empirical value is given to user by default; It was required by customer originally, now has been an important part of kdump feature and supported in several main ARCHes. With crashkernel=auto, people w/o much knowledge of kdump details can use kdump to debug. Distros can provide the suggested values with crashkernel=auto which are got by investigation, analysis and tested widely on test environment. 2) Cover corner case/special case; In some cases, kernel may need extra memory to handle, kdump kernel is not exceptional. E.g when sme/sev enabled, SWIOTLB will be enabled necessarily, even in kdump kernel. (Below sme/sev related commits for reference). Then extra 64M need be reserved for crashkernel. User doesn't need to know this, we already have done it for them. commit c7753208a94c ("x86, swiotlb: Add memory encryption support") commit aba2d9a6385a ("iommu/amd: Do not disable SWIOTLB if SME is active") commit d7b417fa08d1 ("x86/mm: Add DMA support for SEV memory encryption") We are eager to push crashkernel=auto to upstream becasue of our UPSTREAM FIRST rule. Since it has been in RHEL for many years, each time a new RHEL main release anchor a upstream kernel release and is prepared, these crashkernel=auto RHEL-only patches need be reviewed inside Redhat, then we will be questioned and challenged why they are not in upstream. As for how to implement crashkernel=auto, we have tried several ways. 1) Add into kernel command line The suggested value need be stored in user space if added into kernel command line, then added into kernel. This makes the suggested value separated from kernel itself. It's not what we expect to see. Because the suggested crashkernel value is strongly related to distros release. We could adjust the value between sub-releases of kernel because of of kernel change. Adding them into kernel command line make us lose the track of them in kernel. 2) Add a weak generic function and several arch dependent functions 3) Hardcode values in __parse_crashkernel() Method 2) is taken in our RHEL7, 3) is used in RHEL8, RHEL-only patches add them. If we try to push them into upstram, any later value adjustment need a upstream patch posting. Otherwise, RHEL-only patch need be introduced again, Redhat internal reviewer will challenge us again. (Put the value hard coding pieces at bottom for reference). 4) Add kernel config to add default value It's done in this patch. With the kernel config CRASH_AUTO_STR, Distros can add default value, and adjust it anytime in the future w/o bothering upstream. If crashkernel=auto is specified, only below 3 LOC added, to go to parse the CONFIG_CRASH_AUTO_STR directly. @@ -250,6 +251,12 @@ static int __init __parse_crashkernel(ch if (suffix) return parse_crashkernel_suffix(ck_cmdline, crash_size, suffix); +#ifdef CONFIG_CRASH_AUTO_STR + if (strncmp(ck_cmdline, "auto", 4) == 0) { + ck_cmdline = CONFIG_CRASH_AUTO_STR; + pr_info("Using crashkernel=auto, the size chosen is a best effort estimation.\n"); + } +#endif Before this, we don't know Saeed Mirzamohammadi, the patch author. He could experience the same torture. We were wild with joy when noticing his patch. We were planning to launch new round of post to add crashkernel=auto, kernel config is our final option too. We could be too happy to forget polishing the commit log. Not sure if I make myself clear. Basically, we expect crashkernel=auto to be added in upstream kernel. About how to implement it in kernel, we would like to hear upstream people's suggestion. Thanks Baoquan Hard code crashkernel=auto values in __parse_crashkernel() =========================================================== static int __init __parse_crashkernel(char *cmdline, unsigned long long system_ram, unsigned long long *crash_size, unsigned long long *crash_base, const char *name, const char *suffix) { ...... if (strncmp(ck_cmdline, "auto", 4) == 0) { #if defined(CONFIG_X86_64) || defined(CONFIG_S390) ck_cmdline = "1G-4G:160M,4G-64G:192M,64G-1T:256M,1T-:512M"; #elif defined(CONFIG_ARM64) ck_cmdline = "2G-:448M"; #elif defined(CONFIG_PPC64) char *fadump_cmdline; fadump_cmdline = get_last_crashkernel(cmdline, "fadump=", NULL); fadump_cmdline = fadump_cmdline ? fadump_cmdline + strlen("fadump=") : NULL; if (!fadump_cmdline || (strncmp(fadump_cmdline, "off", 3) == 0)) ck_cmdline = "2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G"; else ck_cmdline = "4G-16G:768M,16G-64G:1G,64G-128G:2G,128G-1T:4G,1T-2T:6G,2T-4T:12G,4T-8T:20G,8T-16T:36G,16T-32T:64G,32T-64T:128G,64T-:180G"; #endif pr_info("Using crashkernel=auto, the size chosen is a best effort estimation.\n"); } ...... } ==================================================================