Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

On 12/07/23 at 12:52pm, Michal Hocko wrote:
> On Thu 07-12-23 12:13:14, Philipp Rudo wrote:
> > On Thu, 7 Dec 2023 09:55:20 +0100
> > Michal Hocko <mhocko@xxxxxxxx> wrote:
> > 
> > > On Thu 07-12-23 12:23:13, Baoquan He wrote:
> > > [...]
> > > > We can't guarantee how swift the DMA transfer could be in the cma case;
> > > > it will be a venture.
> > > 
> > > We can't guarantee this of course but AFAIK the DMA shouldn't take
> > > minutes, right? While not perfect, waiting for some time before jumping
> > > into the crash kernel should be acceptable from user POV and it should
> > > work around most of those potential lingering programmed DMA transfers.
> > 
> > I don't think that simply waiting is acceptable. For one it doesn't
> > guarantee that there is no corruption (please also see below) but only
> > reduces its probability. Furthermore, how long would you wait?
> 
> I would like to talk to storage experts to have some ballpark idea about
> worst case scenario, but waiting for 1 minute shouldn't terribly
> influence downtime and remember this is an opt-in feature. If that
> doesn't fit your use case, do not use it.
> 
> > Thing is that users don't only want to reduce the memory usage but also
> > the downtime of kdump. In the end I'm afraid that "simply waiting" will
> > make things unnecessarily more complex without really solving any issue.
> 
> I am not sure I see the added complexity. Something as simple as
> __crash_kexec:
> 	if (crashk_cma_cnt) 
> 		mdelay(TIMEOUT)
> 
> should do the trick.

I would say please don't do this. Jumping into kdump is a very quick
action after a crash, usually taking several seconds. I can't see
anything meaningful being gained by a delay of one minute or several
minutes. Most importantly, the 1st kernel is corrupted and in a very
unpredictable state.
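
Even spelled out in full, that proposal is just a blind wait dropped into
__crash_kexec() right before the jump into the capture kernel, roughly
like the sketch below (CRASH_CMA_DMA_TIMEOUT_MS is a made-up placeholder,
not a constant from this series):

	/* untested sketch of the suggested delay in __crash_kexec() */
	if (crashk_cma_cnt) {
		/*
		 * Blindly wait and hope that any DMA the 1st kernel
		 * programmed into the CMA region has already finished.
		 */
		mdelay(CRASH_CMA_DMA_TIMEOUT_MS);
	}

As Philipp said above, that only lowers the probability of corruption, it
does not rule it out.
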
... 
> > Finally, let me question whether the whole approach actually solves
> > anything. For me the difficulty in determining the correct crashkernel
> > memory is only a symptom. The real problem is that most developers
> > don't expect their code to run outside their typical environment.
> > Especially not in a memory-constrained environment like kdump. But that
> > problem won't be solved by throwing more memory at it as this
> > additional memory will eventually run out as well. In the end we are
> > back at the point where we are today but with more memory.
> 
> I disagree with you here. While the kernel is really willing to cache
> objects into memory I do not think that any particular subsystem is
> super eager to waste memory.
> 
> The thing we should keep in mind is that the memory sitting aside is not
> used in majority of time. Crashes (luckily/hopefully) do not happen very
> often. And I can really see why people are reluctant to waste it. Every
> MB of memory has an operational price tag on it. And let's just be
> really honest, a simple reboot without a crash dump is very likely
> a cheaper option than wasting productive memory as long as the issue
> happens very seldom.

In all this time, I have never heard people say they don't want to
"waste" the memory. E.g., for more than 90% of systems on x86, 256M is
enough. The rare exceptions are noted once recognized and documented in
the product release.

And ,cma is not a silver bullet; see this OOM issue caused by i40e and
its fix, where your crashkernel=1G,cma won't help either:

[v1,0/3] Reducing memory usage of i40e for kdump
https://patchwork.ozlabs.org/project/intel-wired-lan/cover/20210304025543.334912-1-coxu@xxxxxxxxxx/

======Abstracted from the above cover letter======================
After reducing the allocation of tx/rx/arg/asq ring buffers to the
minimum, the memory consumption is significantly reduced,
    - x86_64: 85.1MB to 1.2MB 
    - POWER9: 15368.5MB to 20.8MB
==================================================================
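
For reference, the general pattern that kind of driver fix uses is to
detect the capture kernel with is_kdump_kernel() and shrink the driver's
own allocations; a rough sketch of the idea (the struct and constant
names here are made up, this is not the actual i40e code):

	#include <linux/crash_dump.h>	/* is_kdump_kernel() */

	/* hypothetical helper showing the pattern, not real i40e code */
	static void foo_tune_for_kdump(struct foo_priv *priv)
	{
		if (!is_kdump_kernel())
			return;

		/* capture kernel: one queue pair, smallest legal ring size */
		priv->num_queues = 1;
		priv->ring_size  = FOO_MIN_RING_SIZE;
	}

That is where the real savings come from, not from handing the capture
kernel more memory to burn.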

And to say more about it: this is not the first attempt to make use of a
,cma area for crashkernel=. At Red Hat, at least 5 people have tried to
add this, and we finally gave up after long discussion and investigation.
This year, one kernel developer on our team raised this again with a very
long mail after his own analysis, and we pointed him to the discussion
and the attempts we had made in the past.


_______________________________________________
kexec mailing list
kexec@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/kexec


