On Mon, Mar 29, 2021 at 05:58:32PM +0100, Matthew Wilcox wrote: > In broad strokes, I think that having a Power Of Two Allocator > with Descriptor (POTAD) is a useful foundational allocator to have. > The specific allocator that we call the buddy allocator is very clever for > the 1990s, but touches too many cachelines to be good with today's CPUs. > The generalisation of the buddy allocator to the POTAD lets us allocate > smaller quantities (eg a 512 byte block) and allocate descriptors which > differ in size from a struct page. For an extreme example, see xfs_buf > which is 360 bytes and is the descriptor for an allocation between 512 > and 65536 bytes. > > There are times when we need to get from the physical address to > the descriptor, eg memory-failure.c or get_user_pages(). This is the > equivalent of phys_to_page(), and it's going to have to be a lookup tree. > I think this is a role for the Maple Tree, but it's not ready yet. > I don't know if it'll be fast enough for this case. There's also the > need (particularly for memory-failure) to determine exactly what kind > of descriptor we're dealing with, and also its size. Even its owner, > so we can notify them of memory failure. A couple of things I forgot to mention ... I'd like the POTAD to be not necessarily tied to allocating memory. For example, I think it could be used to allocate swap space. eg the swap code could register the space in a swap file as allocatable through the POTAD, and then later ask the POTAD to allocate a POT from the swap space. The POTAD wouldn't need to be limited to MAX_ORDER. It should be perfectly capable of allocating 1TB if your machine has 1.5TB of RAM in it (... and things haven't got too fragmented) I think the POTAD can be used to replace the CMA. The CMA supports weirdo things like "Allocate 8MB of memory at a 1MB alignment", and I think that's doable within the data structures that I'm thinking about for the POTAD. It'd first try to allocate an 8MB chunk at 8MB alignment, and then if that's not possible, try to allocate two adjacent 4MB chunks; continuing down until it finds that there aren't 8x1MB chunks, at which point it can give up.