On Thu 30-11-23 20:31:44, Baoquan He wrote:
[...]
> > > which doesn't use the proper pinning API (which would migrate away from
> > > the CMA) then what is the worst case? We will get crash kernel corrupted
> > > potentially and fail to take a proper kernel crash, right? Is this
> > > worrisome? Yes. Is it a real roadblock? I do not think so. The problem
>
> We may fail to take a proper kernel crash, why isn't it a roadblock?

It would be if the threat was practical. So far I only see very
theoretical what-if concerns. And I do not mean to downplay those at
all. As already explained, proper CMA users shouldn't ever leak out any
writes across kernel reboot.

> We
> have stable way with a little more memory, why would we take risk to
> take another way, just for saving memory? Usually only high end server
> needs the big memory for crashkernel and the big end server usually have
> huge system ram. The big memory will be a very small percentage relative
> to huge system RAM.

Jiri will likely be more specific about that, but our experience tells
us that proper crashkernel memory scaling has turned out to be a real
maintainability problem because existing setups tend to break with major
kernel version upgrades or non-trivial changes.

> > > seems theoretical to me and it is not CMA usage at fault here IMHO. It
> > > is the said theoretical driver that needs fixing anyway.
>
> Now, what we want to make clear is if it's a theoretical possibility, or
> very likely happen. We have met several on-flight DMA stomping into
> kexec kernel's initrd in the past two years because device driver didn't
> provide shutdown() methor properly. For kdump, once it happen, the pain
> is we don't know how to debug. For kexec reboot, customer allows to
> login their system to reproduce and figure out the stomping. For kdump,
> the system corruption rarely happend, and the stomping could rarely
> happen too.

Yes, this is understood.

> The code change looks simple and the benefit is very attractive.
> I surely like it if finally people confirm there's no risk. As I said, we
> can't afford to take the risk if it possibly happen. But I don't object
> if other people would rather take risk, we can let it land in kernel.

I think it is fair to be cautious and I wouldn't impose the new method
as a default. Only time can tell how safe this really is. It is hard to
protect against theoretical issues though. Bugs should be fixed.
I believe this option would make configuring kdump much easier and
less fragile.

> My personal opinion, thanks for sharing your thought.

Thanks for sharing.
-- 
Michal Hocko
SUSE Labs