On Thu, Nov 30, 2023 at 9:29 PM Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Thu 30-11-23 20:04:59, Baoquan He wrote:
> > On 11/30/23 at 11:16am, Michal Hocko wrote:
> > > On Thu 30-11-23 11:00:48, Baoquan He wrote:
> > > [...]
> > > > Now, we are worried about the risk if the CMA area is taken back into
> > > > the kdump kernel as system RAM. E.g. is it possible that the 1st
> > > > kernel's ongoing RDMA or DMA will interfere with the kdump kernel's
> > > > normal memory accesses? The kdump kernel usually only resets and
> > > > initializes the devices it needs, e.g. the dump target. The unneeded
> > > > devices are not shut down and are left alone.
> > >
> > > I do not really want to discount your concerns but I am a bit confused
> > > why this matters so much. First of all, if there is a buggy RDMA driver
> > > which doesn't use the proper pinning API (which would migrate away from
> > > the CMA) then what is the worst case? We will potentially get the crash
> > > kernel corrupted and fail to take a proper kernel crash dump, right? Is
> > > this worrisome? Yes. Is it a real roadblock? I do not think so. The
> > > problem seems theoretical to me and it is not CMA usage at fault here
> > > IMHO. It is the said theoretical driver that needs fixing anyway.
> > >
> > > Now, it is really fair to mention that CMA backed crash kernel memory
> > > has some limitations
> > >   - CMA reservation can only be used by the userspace in the
> > >     primary kernel. If the size is overshot this might have
> > >     negative impact on kernel allocations
> > >   - userspace memory dumping in the crash kernel is fundamentally
> > >     incomplete.
> >
> > I am not sure if we are talking about the same thing.
> > My concern is:
> > ====================================================================
> > 1) system corruption happened, crash dumping is prepared, cpu and
> >    interrupt controllers are shut down;
> > 2) all pci devices are kept alive;
> > 3) kdump kernel boots up, initialization is only done on those devices
> >    whose drivers are added into kdump kernel's initrd;
> > 4) those in-flight DMA engines could still be working if their kernel
> >    module is not loaded;
> >
> > In this case, if the DMA's destination is located in the crashkernel=,cma
> > region, the DMA writing could continue even when the kdump kernel has put
> > important kernel data into the area. Is this possible or absolutely not
> > possible with DMA, RDMA, or any other stuff which could keep accessing
> > that area?
>
> I do understand your concern. But as already stated if anybody uses
> movable memory (CMA including) as a target of {R}DMA then that memory
> should be properly pinned. That would mean that the memory will be
> migrated to somewhere outside of movable (CMA) memory before the
> transfer is configured. So modulo bugs this shouldn't really happen.
> Are there {R}DMA drivers that do not pin memory correctly? Possibly. Is
> that a roadblock to using CMA to back crash kernel memory? I do
> not think so. Those drivers should be fixed instead.
>

I think that is our concern. Is there any method to guarantee that this
will not happen, rather than 'should be'? Any static analysis at compile
time or dynamic checking method?

If this can be resolved, I think this method is promising.

Thanks,

Pingfan

> > The existing crashkernel= syntax can guarantee the reserved crashkernel
> > area for kdump kernel is safe.
>
> I do not think this is true. If a DMA is misconfigured it can still
> target crash kernel memory even if it is not mapped AFAICS. But those
> are theoreticals. Or am I missing something?
> --
> Michal Hocko
> SUSE Labs
>

_______________________________________________
kexec mailing list
kexec@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/kexec
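
To make the pinning rule and the "dynamic checking" question from the thread concrete, here is a minimal toy model in Python. It is only a sketch, not kernel code: `ToyMemory`, `pin_longterm`, `check_dma_target` and `dma_targets_in_cma` are hypothetical names invented for illustration. In the real kernel the long-term pin is requested through pin_user_pages() with FOLL_LONGTERM. The model assumes a correct driver pins before setting up DMA, that pinning migrates the page out of CMA first, and that a debug check could reject any DMA target that is unpinned or still in CMA.

```python
from dataclasses import dataclass, field
from enum import Enum


class Zone(Enum):
    NORMAL = "normal"  # unmovable memory, acceptable for long-term DMA
    CMA = "cma"        # movable memory that a kdump kernel might reuse


@dataclass
class Page:
    pfn: int
    zone: Zone
    pinned: bool = False


@dataclass
class ToyMemory:
    """Hypothetical model: a flat pool of pages, some of them in CMA."""
    pages: dict = field(default_factory=dict)
    next_pfn: int = 0

    def alloc(self, zone: Zone) -> Page:
        page = Page(self.next_pfn, zone)
        self.pages[page.pfn] = page
        self.next_pfn += 1
        return page

    def pin_longterm(self, page: Page) -> Page:
        """Pin for DMA, migrating out of CMA first as a correct driver would."""
        if page.zone is Zone.CMA:
            del self.pages[page.pfn]        # the CMA page is released
            page = self.alloc(Zone.NORMAL)  # data now lives in a normal page
        page.pinned = True
        return page


def check_dma_target(page: Page) -> None:
    """Hypothetical dynamic check performed when a transfer is configured."""
    if page.zone is Zone.CMA or not page.pinned:
        raise RuntimeError(
            f"unsafe DMA target: pfn {page.pfn} in {page.zone.value}"
        )


def dma_targets_in_cma(targets) -> list:
    """Return pfns of DMA targets that would overlap CMA-backed crash memory."""
    return [p.pfn for p in targets if p.zone is Zone.CMA]


mem = ToyMemory()
good = mem.pin_longterm(mem.alloc(Zone.CMA))  # migrated, then pinned
check_dma_target(good)                        # passes silently
buggy = mem.alloc(Zone.CMA)                   # a driver that skipped pinning
```

In this model only the `buggy` page would still be a live DMA target inside the CMA region after a crash, which is the case Baoquan's steps 1) to 4) describe. Calling `check_dma_target(buggy)` raises, which is the kind of runtime guarantee Pingfan is asking about. Whether such a check can be placed on every real DMA setup path is exactly the open question.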