Kent asked:

> I ran into a major roadblock when I tried converting buddy allocator
> freelists to radix trees: freeing a page may require allocating a new
> page for the radix tree freelist, which is fine normally - we're freeing
> a page after all - but not if it's highmem. So right now I'm not sure
> if getting struct page down to two words is even possible. Oh well.

I don't think I can answer this without explaining the whole design I
have in mind, so here goes ... this is far more complicated than I
would like it to be, but I think it *works*.

Assume that we have got the kernel to the point where no code uses
struct page as a memory descriptor; ie struct page is now:

struct page {
	unsigned long flags;
	unsigned long compound_head;
	unsigned long _pad[N];	/* for a sufficiently large value of N */
};

and all code uses helpers like slab_page() and page_slab() instead of
casting to and from struct page.  Let's further assume that we have:

struct buddy {
	unsigned long flags;
	struct list_head list;
	unsigned int order;
};

The next step is to transition to:

struct page {
	unsigned long mem_desc;
};

struct buddy {
	unsigned long flags;
	struct list_head list;
	unsigned int order;
	struct page *page;
};

(and similar changes for other memory descriptors, but I don't think we
need to discuss those at this time).

mem_desc is really a pointer, but the bottom few bits are the type id
of the memory descriptor (eg buddy, slab, folio, anonmem, filemem,
pgtable, vmalloc ... whatever types we decide to create).

We must never allocate memory when freeing memory.  So we must allocate
the necessary struct buddy at the time we allocate the memory.  If we
cannot do so, we simply fail the allocation.

The simplest case to handle is something like __get_free_page().  If we
already have an order-0 page available, we take the page off the free
list, mark the page as allocated (perhaps we have different type ids
for allocated-buddy and free-buddy) and return its virtual address.

If we don't have an order-0 page available when we call
__get_free_page(), we have to split a higher-order page.  Worst case,
that's an order-11 page.  So we'll need to allocate 11 buddy structs to
describe two order-0 pages, one order-1 page, one order-2 page, ... one
order-9 page.  The new order-10 page can be described by the former
order-11 buddy descriptor.  This means changing the mem_desc pointers
for 1024 struct pages, which isn't ideal, but also not the end of the
world.  We don't often have order-11 pages around!

For something like slab, which requires that struct page point to a
different memory descriptor, it's the job of slab to keep track of the
buddy struct and return it at free time.  So struct slab gets a
'struct buddy' pointer.  I think that slub's alloc_slab_page() looks
like:

static inline struct slab *alloc_slab_page(struct kmem_cache *s,
		gfp_t flags, int node, struct kmem_cache_order_objects oo)
{
	struct slab *slab;
	unsigned long desc;
	struct buddy *buddy;
	unsigned int order = oo_order(oo);

	/* The recursion stops here */
	if (s == kmem_cache_slab)
		return __alloc_slab(flags);

	slab = kmem_cache_alloc(kmem_cache_slab, flags);
	if (!slab)
		return NULL;

	/* Tell the page allocator which descriptor these pages belong to */
	desc = MEM_DESC(slab, MEM_DESC_SLAB);
	if (node == NUMA_NO_NODE)
		buddy = alloc_pages_desc(flags, order, desc);
	else
		buddy = __alloc_pages_node_desc(node, flags, order, desc);
	if (!buddy) {
		kmem_cache_free(kmem_cache_slab, slab);
		return NULL;
	}

	slab->buddy = buddy;
	slab->flags = buddy->flags;
	slab->order = order;
	slab->virt = page_address(buddy->page);

	return slab;
}
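As a rough illustration of the encoding, MEM_DESC() could be something
like the following.  MEM_DESC() and MEM_DESC_SLAB appear in the code
above; the widths and helper names here (MEM_DESC_TYPE_BITS,
page_mem_desc_type() and so on) are invented purely for the sketch.
The only real requirement is that every descriptor is aligned far
enough to leave the low bits free for the type id.

/*
 * Sketch only: one possible encoding of the type id in the low bits of
 * page->mem_desc.  Assumes every memory descriptor is at least 16-byte
 * aligned, leaving 4 low bits free.
 */
#define MEM_DESC_TYPE_BITS	4
#define MEM_DESC_TYPE_MASK	((1UL << MEM_DESC_TYPE_BITS) - 1)

enum mem_desc_type {
	MEM_DESC_BUDDY,
	MEM_DESC_SLAB,
	MEM_DESC_FOLIO,
	/* ... anonmem, filemem, pgtable, vmalloc, ... */
};

#define MEM_DESC(ptr, type)	((unsigned long)(ptr) | (type))

static inline unsigned long page_mem_desc_type(const struct page *page)
{
	return page->mem_desc & MEM_DESC_TYPE_MASK;
}

static inline void *page_mem_desc_ptr(const struct page *page)
{
	return (void *)(page->mem_desc & ~MEM_DESC_TYPE_MASK);
}

/* eg the page_slab() helper mentioned above would become: */
static inline struct slab *page_slab(const struct page *page)
{
	if (page_mem_desc_type(page) != MEM_DESC_SLAB)
		return NULL;
	return page_mem_desc_ptr(page);
}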
Allocating a slab from the slab allocator is slightly tricky.  Usually,
this will work fine, but at some point we need to allocate a struct
slab in order to allocate a struct slab, and that's the job of
__alloc_slab().  It needs a new interface to the page allocator to
return the smallest buddy currently available (or that can be made
available given the GFP flags passed).  If it is order-0, we initialise
the first few bytes of it to be a struct slab and make it
self-describing.  If it is order-1 or higher, we add the second page to
kmem_cache_buddy (and allocate a second struct slab on the first page
to describe it).  We can then allocate sufficient struct buddy objects
on this second page to describe all the buddy pages (a sketch of this
bootstrap follows at the end of this mail).

Here's how I see the worst case call chain going (using slub):

alloc_pages(order = 0)
 kmem_cache_alloc_bulk(kmem_cache_buddy)
  ___slab_alloc(kmem_cache_buddy)
   new_slab(kmem_cache_buddy)
    allocate_slab(kmem_cache_buddy)
     alloc_slab_page(kmem_cache_buddy)
      kmem_cache_alloc(kmem_cache_slab)
       ___slab_alloc(kmem_cache_slab)
        new_slab(kmem_cache_slab)
         allocate_slab(kmem_cache_slab)
          alloc_slab_page(kmem_cache_slab)
           __alloc_slab()
            alloc_pages()
             kmem_cache_alloc_bulk(kmem_cache_buddy)
              ___slab_alloc(kmem_cache_buddy)
               new_slab(kmem_cache_buddy)
                allocate_slab(kmem_cache_buddy)
                 alloc_slab_page(kmem_cache_buddy)
                  kmem_cache_alloc(kmem_cache_slab)
                   ___slab_alloc(kmem_cache_slab)
                    new_slab(kmem_cache_slab)
                     allocate_slab(kmem_cache_slab)
                      alloc_slab_page(kmem_cache_slab)
                       __alloc_slab()

To actually hit this depth of nesting, other processes/interrupts/threads
must be simultaneously consuming both slab structs and buddy structs,
as well as conspiring to keep the low-order buddy lists empty while not
depleting all the high-order buddy lists.
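To make the bootstrap a little more concrete, here is a rough sketch of
the order-0 case of __alloc_slab().  alloc_smallest_buddy() is a
stand-in name for the new "smallest buddy currently available"
interface described above and doesn't exist anywhere; the bookkeeping
for order-1 and higher buddies is only hinted at in the comment.  The
point is just that the struct slab lives inside the page it describes,
so no further allocation is needed.

/* Sketch, not real code: the self-describing bootstrap slab. */
static struct slab *__alloc_slab(gfp_t flags)
{
	struct buddy *buddy;
	struct slab *slab;

	/*
	 * Hypothetical interface: return the smallest free buddy,
	 * splitting a larger one if the GFP flags allow it.
	 */
	buddy = alloc_smallest_buddy(flags);
	if (!buddy)
		return NULL;

	/* Place the struct slab at the start of the slab itself. */
	slab = page_address(buddy->page);
	slab->buddy = buddy;
	slab->flags = buddy->flags;
	slab->order = buddy->order;
	slab->virt = slab;

	/* Make the page self-describing. */
	buddy->page->mem_desc = MEM_DESC(slab, MEM_DESC_SLAB);

	/*
	 * Omitted: if the buddy is order-1 or higher, hand the second
	 * page to kmem_cache_buddy (with a second struct slab on the
	 * first page to describe it), and mark the space occupied by
	 * this struct slab as in use so objects aren't handed out over
	 * the top of it.
	 */
	return slab;
}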