On Fri 08-12-23 09:55:39, Baoquan He wrote:
> On 12/07/23 at 12:52pm, Michal Hocko wrote:
> > On Thu 07-12-23 12:13:14, Philipp Rudo wrote:
[...]
> > > Thing is that users don't only want to reduce the memory usage but also
> > > the downtime of kdump. In the end I'm afraid that "simply waiting" will
> > > make things unnecessarily more complex without really solving any issue.
> >
> > I am not sure I see the added complexity. Something as simple as
> > __crash_kexec:
> >     if (crashk_cma_cnt)
> >         mdelay(TIMEOUT)
> >
> > should do the trick.
>
> I would say please don't do this. kdump jumping is a very quick
> behaviour after corruption, usually in several seconds. I can't see any
> meaningful stuff with the delay of one minute or several minutes.

Well, I've been told that DMA should complete within seconds after the
controller is programmed (if it took much longer, short-term pinning would
not really be appropriate, because it would block memory movability for far
too long and therefore result in failures). This is something we can tune
for.

But if that sounds like a completely wrong approach, then I think an
alternative would be to live with potential in-flight DMAs and simply avoid
using that memory in the kdump kernel until the DMA controllers (PCI bus)
have been reinitialized by the kdump kernel. That should happen early in the
boot process IIRC, and the CMA-backed memory could be added after that
moment. We already have the means to defer memory initialization, so an
extension shouldn't be hard to do. It would be a slightly more involved
patch touching core MM, which we have tried to avoid so far. Does that sound
like something acceptable?

[...]
> > The thing we should keep in mind is that the memory sitting aside is not
> > used in majority of time. Crashes (luckily/hopefully) do not happen very
> > often. And I can really see why people are reluctant to waste it. Every
> > MB of memory has an operational price tag on it.
> > And let's just be
> > really honest, a simple reboot without a crash dump is very likely
> > a cheaper option than wasting a productive memory as long as the issue
> > happens very seldom.
>
> All the time, I have never heard people don't want to "waste" the
> memory. E.g., for more than 90% of system on x86, 256M is enough. The
> rare exceptions will be noted once recognized and documented in product
> release.
>
> And ,cma is not silver bullet, see this oom issue caused by i40e and its
> fix, your crashkernel=1G,cma won't help either.
>
> [v1,0/3] Reducing memory usage of i40e for kdump
> https://patchwork.ozlabs.org/project/intel-wired-lan/cover/20210304025543.334912-1-coxu@xxxxxxxxxx/
>
> ======Abstracted from above cover letter==========================
> After reducing the allocation of tx/rx/arg/asq ring buffers to the
> minimum, the memory consumption is significantly reduced,
> - x86_64: 85.1MB to 1.2MB
> - POWER9: 15368.5MB to 20.8MB
> ==================================================================

Nice to see memory consumption reduction fixes. But, honestly, this should
happen regardless of kdump. CMA-backed kdump is not meant to work around
excessive kernel memory consumers.

It seems I am failing to get the message through :( but I do not know how
else to express that the pressure to reduce wasted memory is real. It is not
important whether 256MB is enough for everybody. Even that adds up to a
non-trivial cost in data centers with many machines.

> And say more about it. This is not the first time of attempt to make use
> of ,cma area for crashkernel=. In redhat, at least 5 people have tried
> to add this, finally we gave up after long discussion and investigation.
> This year, one kernel developer in our team raised this again with a
> very long mail after his own analysis, we told him the discussion and
> trying we have done in the past.

This is really hard to comment on without any references to those
discussions.
From this particular email thread I have the perception that you guys focus
much more on provable correctness than on feasibility. If we applied the
same approach universally, then many other features, e.g. kexec, couldn't
have been merged, for the reasons you have mentioned in the email thread.

Anyway, thanks for pointing out the regular DMA via gup case, which we were
obviously not aware of. I personally considered this to be a marginal
problem compared to RDMA, which is unpredictable wrt timing. But we believe
that this could be worked around. Now it would be really valuable to know
whether somebody has _tried_ that and it turned out not to work because of
XYZ reasons. That would be a solid base to re-evaluate and think of
different approaches.

Look, this will be your call as maintainers in the end. If you have decided,
then fair enough. We might end up trying this feature downstream and maybe
come back in the future with experience which we currently do not have. But
it seems we are not alone in seeing that the existing state is insufficient
(http://lkml.kernel.org/r/20230719224821.GC3528218@xxxxxxxxxx).

Thanks!
--
Michal Hocko
SUSE Labs

_______________________________________________
kexec mailing list
kexec@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/kexec