Re: [patch 48/91] kernel/crash_core: add crashkernel=auto for vmcore creation


 



On 05/11/21 at 07:31pm, Mike Rapoport wrote:
> Hi Baoquan,
> 
> On Tue, May 11, 2021 at 09:36:41PM +0800, Baoquan He wrote:
> > On 05/10/21 at 01:56pm, David Hildenbrand wrote:
> > > On 10.05.21 13:44, Dave Young wrote:
> > > > Hi David,
> > > 
> > > Hi Dave,
> > > 
> > > > On 05/10/21 at 01:01pm, David Hildenbrand wrote:
> > > > [snip]
> > > > > It also bugged me for quite a bit that we don't have a sane way to achieve
> > > > > what we're doing here upstream. It somewhat feels like "this doesn't belong
> > > > > in the kernel and is user policy" but then, the existing kernel support is
> > > > > suboptimal.
> > > > > 
> > > > > Maybe reserving some "maybe too big but okayish to boot the system in a sane
> > > > > environment -- e.g., X% of system RAM and at least Y" size first and
> > > > > shrinking it later as triggered by user space early (where we do seem to
> > > > > have a way to pre-calculate things now) might actually be a good direction
> > > > > to look into.
> > > > 
> > > > Hmm, that is also an option we considered before.  Even for your
> > > > suggestion we still need a kernel option to set the default ratio/value.
> > > > and the ratio/value should be another patch which expands crashkernel
> > > > syntax.
> > > 
> > > Right.
> > > 
> > > > 
> > > > Actually the kconfig help text in this patch is indeed misleading, it is
> > > > not introducing crashkernel=a:b... and no need to explain about the
> > > > crashkernel syntax, the config option is actually just some interface we
> > > > can add any valid crashkernel settings to be used by default. So current
> > > > patch help text describes the default value of crash auto str, instead
> > > > of describes what crash auto str is.
> > > 
> > > Right. And I would much rather prefer either
> > > 
> > > a) handling "auto" completely in the kernel, not just setting some
> > > questionable default at compile time
> > 
> > Thanks for the suggestions.
> > 
> > If the way adding default value into kernel config is disliked,
> > this a) option looks good. We can get value with x% of system RAM, but
> > clamp it with CRASH_KERNEL_MIN/MAX. The CRASH_KERNEL_MIN/MAX may need be
> > defined with a default value for different ARCHes. It's very close to
> > our current implementation, and handling 'auto' in kernel.
> > 
> > And kernel config provided so that people can tune the MIN/MAX value,
> > but no need to post patch to do the tuning each time if have to?
>  
> Maybe I'm missing something, but the whole point is to avoid kernel
> configuration option at all. If the crashkernel=auto works good for 99% of
> the cases, there is no need to provide build time configuration along with
> it. There are plenty of ways users can control crashkernel reservations
> with the existing 2-4 (depending on architecture) command line options.
> 
> Simply hard coding reasonable defaults (e.g.
> "1G-64G:128M,64G-1T:256M,1T-:512M"), and using these defaults when
> crashkernel=auto is set would cover the same 99% of users you referred to.

Thanks for looking into this, Mike.

crashkernel=auto works well for 99% of systems, with the prerequisite
that the values behind 'auto' correspond to a specific kernel, e.g. a
distro kernel. I say so because the kernel config of a distro kernel
determines the kernel size, and also the initrd size. A generic default
value for crashkernel=auto doesn't make much sense once it is carried into
distros. That's why we originally wanted to add the default value to the
kernel config. Asking for a minimal size with a kernel config tunable is
then the second-best choice, when 'auto' is handled in the kernel as
David's option a) suggested.
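
As a rough illustration of option a), here is a minimal user-space sketch
of "x% of system RAM, clamped with MIN/MAX". The function name and the
bound values are hypothetical, chosen only for the example; real bounds
would be per-architecture defaults or config tunables.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bounds, for illustration only. */
#define CRASH_KERNEL_MIN_BYTES (160ULL << 20)	/* 160 MiB */
#define CRASH_KERNEL_MAX_BYTES (1ULL << 30)	/* 1 GiB */

/*
 * Return `percent` of system RAM, clamped to
 * [CRASH_KERNEL_MIN_BYTES, CRASH_KERNEL_MAX_BYTES].
 */
static uint64_t crash_auto_size(uint64_t system_ram_bytes, unsigned int percent)
{
	uint64_t size;

	assert(percent <= 100);

	/* Divide first so the multiplication cannot overflow. */
	size = system_ram_bytes / 100 * percent;

	if (size < CRASH_KERNEL_MIN_BYTES)
		size = CRASH_KERNEL_MIN_BYTES;
	if (size > CRASH_KERNEL_MAX_BYTES)
		size = CRASH_KERNEL_MAX_BYTES;
	return size;
}
```

With these example bounds, a 2 GiB machine at 5% is raised to the 160 MiB
minimum, and a 1 TiB machine is capped at 1 GiB.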

It's still not quite clear to me why a kernel config has to be avoided.
We already have this kind of tunable, e.g. CONFIG_CMA_SIZE_SEL_MBYTES.

> 
> If we can resize the reservation later during boot this will also address
> David's concern about the wasted memory.

We can't grow the reservation; currently we can only shrink it.
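
To make the shrink-only semantics concrete, here is a small user-space
model. It is only a sketch: struct crash_region and crash_region_resize()
are hypothetical names, not kernel code. On a running system the
reservation size is exposed through /sys/kernel/kexec_crash_size, and
writing a smaller value there releases part of the region.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a crash kernel reservation. */
struct crash_region {
	uint64_t start;	/* physical start address */
	uint64_t size;	/* current size in bytes */
};

/*
 * Shrink-only resize: a request larger than the current size is
 * rejected with -EINVAL, because memory released to the system
 * cannot be taken back for the reservation.
 */
static int crash_region_resize(struct crash_region *r, uint64_t new_size)
{
	assert(r != NULL);

	if (new_size > r->size)
		return -EINVAL;

	r->size = new_size;
	return 0;
}
```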

> 
> You mentioned that amount of memory that is required for crash kernel
> reservation depends on the devices present on the system. Is it possible to
> detect how much memory is required at late stages of boot?

It may be doable to detect this at a late stage of boot, but that needs
investigation; for now we are working on doing it after the system has
booted. The thing is, the detection is very coarse-grained. We count all
loaded kernel modules. But in the kdump kernel, only the strictly
necessary modules are included in our distros. E.g. if we dump over the
network, the NIC modules are included; otherwise we filter them out to
reduce memory usage in the kdump kernel. For most normal systems with
dozens of devices, the memory required by device drivers in the kdump
kernel is limited. On VM guests it's even less, since only the strictly
necessary devices are added, e.g. disk/NIC/serial.
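
To show what "count all loaded kernel modules" can look like, here is a
minimal sketch that sums the size column of a stream in /proc/modules
format ("name size refcount deps state address"). This is an assumption
about one possible coarse estimate, not the code our kdump tooling uses,
and sum_module_sizes() is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Sum the second column (module size in bytes) of every line read
 * from `f`. Lines that do not start with "<name> <size>" are skipped.
 */
static unsigned long long sum_module_sizes(FILE *f)
{
	char line[1024];
	char name[64];
	unsigned long long size, total = 0;

	assert(f != NULL);

	while (fgets(line, sizeof(line), f)) {
		if (sscanf(line, "%63s %llu", name, &size) == 2)
			total += size;
	}
	return total;
}
```

Calling it on fopen("/proc/modules", "r") gives an upper bound that
includes every loaded module, which is exactly why the result
overestimates what the much smaller kdump initrd needs.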

So when I said 99% of systems can be covered by a default value, that was
based on a specific kernel with a fixed kernel config, mainly a distro
kernel. Adding a permanent default value to the upstream kernel doesn't
make much sense if no tunable is provided for distros to adjust it.
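
For reference, a default such as the "1G-64G:128M,64G-1T:256M,1T-:512M"
string quoted above maps system RAM to a reservation size by range. The
sketch below is a simplified user-space model of that lookup, assuming a
range matches when start <= RAM < end and an empty end means "no upper
limit". parse_size() and crash_size_for() are hypothetical helpers; the
optional "@offset" part of the real syntax is not handled.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Parse "<num>[K|M|G|T]" and advance *s past it. */
static unsigned long long parse_size(const char **s)
{
	char *end;
	unsigned long long v = strtoull(*s, &end, 10);

	switch (*end) {
	case 'T': v <<= 10; /* fall through */
	case 'G': v <<= 10; /* fall through */
	case 'M': v <<= 10; /* fall through */
	case 'K': v <<= 10; end++; break;
	default: break;
	}
	*s = end;
	return v;
}

/*
 * Return the size of the first range containing system_ram,
 * or 0 if no range matches or the string is malformed.
 */
static unsigned long long crash_size_for(const char *spec,
					 unsigned long long system_ram)
{
	const char *p = spec;

	assert(spec != NULL);

	while (*p) {
		unsigned long long start = parse_size(&p);
		unsigned long long end = ULLONG_MAX;	/* open-ended range */
		unsigned long long size;

		if (*p++ != '-')
			return 0;
		if (*p != ':')
			end = parse_size(&p);
		if (*p++ != ':')
			return 0;
		size = parse_size(&p);

		if (system_ram >= start && system_ram < end)
			return size;

		if (*p == ',')
			p++;
		else if (*p)
			return 0;
	}
	return 0;
}
```

With the string above, an 8G system gets 128M, a 64G system falls into the
second range and gets 256M, and a system below 1G gets no reservation at
all. Whatever numbers are hard-coded in such a table is what a distro
would need to be able to tune.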

> 
> > > b) passing it explicitly in via the cmdline
> > > 
> > > > 
> > > > And crashkernel=auto makes this more flexible. We can tune the values
> > > > easily when upgrading.  But if we pass a fixed value in userspace we
> > > > can not know if the value is set by distribution automatically or by user
> > > > manually thus we can not blindly update it.
> > > 
> > > I think there are two different cases:
> > > 
> > > 
> > > 1. kernel space updates the value later during boot. "crashkernel=auto"
> > > really does the right thing, meaning
> > > 
> > > a) allocate something reasonable and safe during early boot
> > > b) update the allocation during late boot when we know what kind of system
> > > we're running on
> > > 
> > > Then, we indeed care about "crashkernel=auto" in the kernel and I think it
> > > would be a nice thing to have. The only question is on how to make that a
> > > little configurable, depending on different thingies we might want to run in
> > > the crashkernel (assuming someone doesn't want kdump).
> > > 
> > > 
> > > 2. user space updates the value later during boot
> > > 
> > > IMHO we don't really care who decided on the value as we do the update from
> > > user space. If an admin messes with crashkernel=, the admin can also mess
> > > with kdump not doing any overwrites (e.g., make that configurable, or detect
> > > the overwrite in kdump somehow).
> > > 
> > > -- 
> > > Thanks,
> > > 
> > > David / dhildenb
> > > 
> > 
> > 
> 
> -- 
> Sincerely yours,
> Mike.
> 



