On Wed, 6 Dec 2023 16:19:51 +0100
Michal Hocko <mhocko@xxxxxxxx> wrote:

> On Wed 06-12-23 14:49:51, Michal Hocko wrote:
> > On Wed 06-12-23 12:08:05, Philipp Rudo wrote:
> [...]
> > > If I understand Documentation/core-api/pin_user_pages.rst correctly you
> > > missed case 1 Direct IO. In that case "short term" DMA is allowed for
> > > pages without FOLL_LONGTERM. Meaning that there is a way you can
> > > corrupt the CMA and with that the crash kernel after the production
> > > kernel has panicked.
> >
> > Could you expand on this? How exactly direct IO request survives across
> > into the kdump kernel? I do understand the RDMA case because the IO is
> > async and out of control of the receiving end.
>
> OK, I guess I get what you mean. You are worried that there is
> DIO request
> program DMA controller to read into CMA memory
> <panic>
> boot into crash kernel backed by CMA
> DMA transfer is done.
>
> DIO doesn't migrate the pinned memory because it is considered a very
> quick operation which doesn't block the movability for too long. That is
> why I have considered that a non-problem. RDMA on the other hand might pin
> memory for transfer for much longer but that case is handled by
> migrating the memory away.

Right, that is the scenario we need to prevent.

> Now I agree that there is a chance of the corruption from DIO. The
> question I am not entirely clear about right now is how big of a real
> problem that is. DMA transfers should be a very swift operation. Would
> it help to wait for a grace period before jumping into the kdump kernel?

Please see my other mail.

> > Also if direct IO is a problem how come this is not a problem for kexec
> > in general. The new kernel usually shares all the memory with the 1st
> > kernel.
>
> This is also more clear now. Pure kexec is shutting down all the devices
> which should terminate the in-flight DMA transfers.

Right, it _should_ terminate all transfers.
But here we are back at the shitty device drivers that don't have a
working shutdown method. That's why we have already seen the problem you
describe above with kexec. And please believe me that debugging such a
scenario is an absolute pain. Especially when it's a proprietary,
out-of-tree driver that caused the mess.

Thanks
Philipp

_______________________________________________
kexec mailing list
kexec@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/kexec