On 04/14/20 at 04:49pm, David Hildenbrand wrote:
> >>>>> The root cause is kexec-ed kernel is targeted at hotpluggable memory
> >>>>> region. Just avoiding the movable area can fix it. In kexec_file_load(),
> >>>>> just checking or picking those unmovable region to put kernel/initrd in
> >>>>> function locate_mem_hole_callback() can fix it. The page or pageblock's
> >>>>> zone is movable or not, it's easy to know. This fix doesn't need to
> >>>>> bother other component.
> >>>>
> >>>> I don't fully agree. E.g., just because memory is onlined to ZONE_NORMAL
> >>>> does not imply that it cannot get offlined and removed e.g., this is
> >>>> heavily used on ppc64, with 16MB sections.
> >>>
> >>> Really? I just know there are two kinds of mem hotplug in ppc, but don't
> >>> know the details. So in this case, is there any flag or a way to know
> >>> those memory block are hotpluggable? I am curious how those kernel data
> >>> is avoided to be put in this area. Or ppc just freely uses it for kernel
> >>> data or user space data, then try to migrate when hot remove?
> >>
> >> See
> >> arch/powerpc/platforms/pseries/hotplug-memory.c:dlpar_memory_remove_by_count()
> >>
> >> Under DLPAR, it can remove memory in LMB granularity, which is usually
> >> 16MB (== single section on ppc64). DLPAR will directly online all
> >> hotplugged memory (LMBs) from the kernel using device_online(), which
> >> will go to ZONE_NORMAL.
> >>
> >> When trying to remove memory, it simply scans for offlineable 16MB
> >> memory blocks (==section == LMB), offlines and removes them. No need for
> >> the movable zone and all the involved issues.
> >
> > Yes, this is a different one, thanks for pointing it out. It sounds like
> > balloon driver in virt platform, doesn't it?
>
> With DLPAR there is a hypervisor involved (which manages the actual HW
> DIMMs), so yes.
>
> >
> > Avoiding to put kexec kernel into movable zone can't solve this DLPAR
> > case as you said.
> >
> >>
> >> Now, the interesting question is, can we have LMBs added during boot
> >> (not via add_memory()), that will later be removed via remove_memory().
> >> IIRC, we had BUGs related to that, so I think yes. If a section contains
> >> no unmovable allocations (after boot), it can get removed.
> >
> > I do want to ask this question. If we can add LMB into system RAM, then
> > reload kexec can solve it.
> >
> > Another better way is adding a common function to filter out the
> > movable zone when search position for kexec kernel, use an arch specific
> > function to filter out DLPAR memory blocks for ppc only. Over there,
> > we can simply use for_each_drmem_lmb() to do that.
>
> I was thinking about something similar. Maybe something like a notifier
> that can be used to test if selected memory can be used for kexec

I'm not sure I understand the notifier idea correctly. If you mean:

1) Add a common function to pick memory in an unmovable zone;
2) Let DLPAR and the balloon drivers register with the notifier;
3) In the common function, ask the registered parties to check whether
   the picked unmovable memory is usable for locating the kexec kernel;

then it sounds doable to me, and not complicated.

> images. It would apply to
>
> - arm64 and filter out all hotadded memory (IIRC, only boot memory can
>   be used).

Do you mean that memory hot added after boot can't be recognized and
added into system RAM on arm64?

> - powerpc to filter out all LMBs that can be removed (assuming not all
>   memory corresponds to LMBs that can be removed, otherwise we're in
>   trouble ... :) )
> - virtio-mem to filter out all memory it added.
> - hyper-v to filter out partially backed memory blocks (esp. the last
>   memory block it added and only partially backed it by memory).
>
> This would make it work for kexec_file_load(), however, I do wonder how
> we would want to approach that from userspace kexec-tools when handling
> it from kexec_load().

Let's make kexec_file_load() work first.
This work is only a first step toward keeping a kexec-ed kernel from
breaking memory hotplug. After the kexec reboot, KASLR may also place the
kernel in a hotpluggable area.
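
To check that I read the three steps above the way you meant them, here
is a rough userspace sketch in plain C, not kernel code. Every name in it
(struct mem_range, kexec_register_veto(), kexec_pick_range(),
removable_block_veto()) is made up for illustration and does not exist in
the tree. A real version would presumably hook into
locate_mem_hole_callback() and use a proper notifier chain. To keep it
short, it only tries the start of each candidate range.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one candidate region of system RAM. */
struct mem_range {
	uint64_t start;		/* inclusive */
	uint64_t end;		/* exclusive */
	bool movable;		/* true if the region is in a movable zone */
};

/* Veto callback: return true if [start, end) must not hold a kexec image. */
typedef bool (*kexec_veto_fn)(uint64_t start, uint64_t end, void *data);

#define MAX_VETOERS 8

struct vetoer {
	kexec_veto_fn fn;
	void *data;
};

static struct vetoer vetoers[MAX_VETOERS];
static size_t nr_vetoers;

/* Step 2: DLPAR, balloon drivers, etc. would register a callback here. */
static int kexec_register_veto(kexec_veto_fn fn, void *data)
{
	if (nr_vetoers == MAX_VETOERS)
		return -1;
	vetoers[nr_vetoers].fn = fn;
	vetoers[nr_vetoers].data = data;
	nr_vetoers++;
	return 0;
}

/*
 * Steps 1 and 3: skip movable regions, then ask every registered party
 * whether the first 'size' bytes of the region are acceptable.
 * Returns the first accepted region, or NULL if none qualifies.
 */
static const struct mem_range *kexec_pick_range(const struct mem_range *r,
						size_t n, uint64_t size)
{
	for (size_t i = 0; i < n; i++) {
		bool vetoed = false;

		if (r[i].movable || r[i].end - r[i].start < size)
			continue;

		for (size_t j = 0; j < nr_vetoers; j++) {
			if (vetoers[j].fn(r[i].start, r[i].start + size,
					  vetoers[j].data)) {
				vetoed = true;
				break;
			}
		}
		if (!vetoed)
			return &r[i];
	}
	return NULL;
}

/* Example vetoer: rejects anything overlapping one removable block. */
struct removable_block {
	uint64_t start;
	uint64_t end;
};

static bool removable_block_veto(uint64_t start, uint64_t end, void *data)
{
	const struct removable_block *b = data;

	return start < b->end && b->start < end;
}
```

The point of the split is that the common code only knows about zones,
while each hotplug mechanism keeps its own idea of "removable" behind its
callback.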