On Mon, Jan 27, 2025 at 08:12:37AM -0800, Alexander Graf wrote:

> I agree with the simplifications you're proposing; not using the purgatory
> would be a great property to have.
>
> The reason why KHO doesn't do it yet is that I wanted to keep it simple from
> the other end. The big problem with going A/B is that if done the simple
> way, you only map B as MOVABLE while running in A. That means A could
> accidentally allocate persistent memory from A's memory region. When A then
> switches to B, B can no longer make all of A MOVABLE.

But you have this basic problem no matter what? kexec requires a
pretty big region of linear memory to boot a kernel into. Even with
purgatory and copying you still have to ensure there is a free linear
space that has no KHO pages in it.

This seems impossible to really guarantee unless you have a special
KHO allocator that happens to guarantee available linear memory, or
are doing tricks like we are discussing to use the normal allocator to
keep allocations out of some linear memory.

> So we need to ensure that *both* regions are MOVABLE, and the system is
> always fully aware of both.

I imagined the kernel would boot with only the A or B area of memory
available during early boot, and then in later boot phases it would
set up the additional memory that has a mix of KHO and free pages.

This feels easier to do once the allocators are all fully started up,
i.e. you can deal with KHO pages by just allocating them. [*]

IOW each A/B area should be large enough to complete a lot of boot and
would end up naturally containing GFP_KERNEL allocations during this
process, as it is the only memory available.

If you have a special KHO allocator (GFP_KHO?) then it can simply be
aware of this and avoid allocating from the A/B zone.
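
A rough userspace sketch of that GFP_KHO idea, not kernel code: memory
is modelled as an array of page frames with one fixed A/B area, and
allocations flagged as possibly-preserved skip that area. Every name in
it (TOY_GFP_KHO, toy_alloc_page(), scratch_is_kho_free(), the page
counts) is made up for illustration and is not an existing or proposed
kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; every name below is hypothetical, not kernel API. */
#define NR_PAGES      16u
#define SCRATCH_START 4u   /* first page frame of the current A/B area */
#define SCRATCH_END   8u   /* one past its last page frame */

#define TOY_GFP_KERNEL 0x0u
#define TOY_GFP_KHO    0x1u /* caller may later hand this page to KHO */

static bool page_used[NR_PAGES];
static bool page_kho[NR_PAGES];

static bool in_scratch(size_t pfn)
{
	return pfn >= SCRATCH_START && pfn < SCRATCH_END;
}

/* First-fit allocation; returns a page frame number or -1 when full. */
static int toy_alloc_page(unsigned int gfp)
{
	for (size_t pfn = 0; pfn < NR_PAGES; pfn++) {
		if (page_used[pfn])
			continue;
		/* KHO-capable allocations are kept out of the A/B area. */
		if ((gfp & TOY_GFP_KHO) && in_scratch(pfn))
			continue;
		page_used[pfn] = true;
		page_kho[pfn] = (gfp & TOY_GFP_KHO) != 0;
		return (int)pfn;
	}
	return -1;
}

/* The property kexec needs: no preservable page inside the A/B area. */
static bool scratch_is_kho_free(void)
{
	for (size_t pfn = SCRATCH_START; pfn < SCRATCH_END; pfn++) {
		assert(pfn < NR_PAGES);
		if (page_kho[pfn])
			return false;
	}
	return true;
}
```

In this model plain TOY_GFP_KERNEL allocations may still land in the A/B
area, while anything flagged TOY_GFP_KHO never does, so
scratch_is_kho_free() holds by construction. The cost is that the
caller has to decide at allocation time whether a page might be
preserved.
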

However, it would be much nicer to avoid having to mark possible KHO
allocations in code at the allocation point. This would be nicer:

   p = alloc_pages(GFP_KERNEL)
   // time passes
   to_kho(p)

So I agree there is an appeal to somehow using the existing allocators
to stop taking unmovable pages from the A/B region after some point, so
that no to_kho() will ever get a page that is in A/B.

Can you take a ZONE_NORMAL, use it for booting, and then switch it to
ZONE_MOVABLE, keeping all the unmovable memory? Something else?

[*] - For drivers I'm imagining that we can do:

   p = alloc_pages(GFP_KERNEL|GFP_KHO|GFP_COMP, order);
   to_kho(p);
   // kexec
   from_kho(p);
   folio_put(p);

Meaning KHO has to preserve the folio, keep the KVA the same, manage
the refcount, and restore the GFP_COMP.

I think if you have this as the basic primitive you can build
everything else on top of it.

Jason