On Mon, Oct 25, 2021 at 08:55:21PM +0100, Matthew Wilcox wrote: > Kent asked: > > I ran into a major roadblock when I tried converting buddy allocator > > freelists to radix trees: freeing a page may require allocating a new > > page for the radix tree freelist, which is fine normally - we're freeing > > a page after all - but not if it's highmem. So right now I'm not sure > > if getting struct page down to two words is even possible. Oh well. > > I don't think I can answer this without explaining the whole design > I have in mind, so here goes ... this is far more complicated than > I would like it to be, but I think it *works*. So you've got two separately allocated structs per compound page - struct buddy, for allocator/freelist state, and struct folio or slab or whatever, for allocatee state. This lets you get struct page - our 4k page tax - down to a single pointer. But the shenanigans required for separately allocating struct buddy make me want to go back to my proposal :) The difference between your proposal and mine is that in mine, we don't separately allocate struct buddy, instead we only shrink struct page down to two words/pointers, not one. We can get the state for a free page down to two words if we replace the doubly linked freelists with a dequeue implemented as a radix tree: the second word in struct page will be a pointer to allocatee state for allocated pages, but for free pages it will be an index onto the freelist. As you also noted, splitting page->flags up between allocator state and allocatee state (i.e. moving some of it to the folio) means we'll be able to fit compound/buddy order in page->flags; that becomes the allocator state word in my model. The issue I ran into was where we have to allocate new pages for the freelist radix tree: normally there's no issue here because we can just consume the page we're trying to free. But if the page is highmem - oof. So I've been kicking around the idea of implementing a version of my lib/generic-radix-tree.c code where we use the low bit of pointers to nodes to indicate when they're highmem pages that need to be kmap_local()'d. I think done this way the performance overhead will be negligable, in practice. So, I'm gonna cook this up and see how it comes out...