On 03/11/2016 17:51, Andrew Jones wrote:
> On Thu, Nov 03, 2016 at 05:00:01PM +0100, Paolo Bonzini wrote:
>> On 03/11/2016 15:58, Andrew Jones wrote:
>>> Actually I envisioned it the other way around. phys_alloc should
>>> only be for early allocations, before enabling the MMU (which is
>>> why it's called "phys"_alloc).
>>
>> Yes, I agree. Putting your other suggestions together, you'd have
>> for early allocation:
>>
>>        phys       // provide contiguous physical addresses
>>         |
>>     morecore      // provide contiguous RAM
>>         |
>>      malloc       // full API
>>
>> Later, you'd switch morecore to also take care of installing PTEs:
>>
>>     page   virt
>>       \     /
>>      morecore
>>         |
>>       malloc
>
> Yup, I think we're on the same page (no pun intended). However, as
> morecore needs to take both size and alignment arguments in order to
> cope with any offsets that a memalign call would create, I was
> planning to just keep it named early_memalign.

You're right, it needs alignment too. So yeah, early_memalign's code
can be repurposed as the early allocator's morecore callback. For the
current trivial malloc implementation, malloc would be implemented as
ops->morecore(size, align_min). I think we agree on the overall idea
and on having only one function pointer.

>> Now morecore talks to both page and virt. page (alloc_page) is
>> initialized with whatever memory wasn't allocated by phys, and
>> takes up its job. virt (alloc_vpages) allocates virtual address
>> space. morecore is now taking not necessarily consecutive physical
>> memory at page granularity, assigning consecutive virtual addresses
>> to it, and if needed slicing pages into smaller allocations.
>
> vmalloc does all that except the slicing-up feature, which is left
> for a rainy day... So I think the implementation I have here for
> vmemalign is still valid for x86's "morecore".

Fair enough, the slicing can come later. Can you do a v3 that (on top
of what you have in this series):

1) changes alloc_ops to only have the .memalign callback, with fixed
   implementations of the malloc/memalign/free functions;

2) makes vmalloc & vmap use alloc_vpages (i.e. use the "virt"
   allocator);

3) makes alloc_vpages go bottom to top (not strictly necessary, but
   nice);

4) replaces vmalloc/vfree with memalign(align=4096)/free in x86/;

5) moves the virt allocator and the vmalloc ops to a new file
   lib/vmalloc.c (with install_page() remaining as the sole hook into
   arch-specific code), removing what's now dead from lib/x86/vm.c.

I think your current patches fit between (3) and (4), since they end
with the vmemalign function and the ops struct in lib/x86/vm.c.

Thanks,
Paolo

>> morecore is not an external API, it's just malloc's pluggable
>> interface. page/virt/malloc are all external APIs. phys could be,
>> but nothing
>
> Right, both vmemalign and early_memalign are internal only (static),
> and alloc_page, alloc_vpages, and malloc/calloc/memalign/free are
> external. x86 also has vmalloc/vfree (the building blocks for
> vmemalign), which allow callers to explicitly ask for virtual
> addresses corresponding to a new allocation. We should keep those
> external, because malloc doesn't imply you're getting PTEs set; that
> depends on the stage/degree of test init.
>
>> uses phys_alloc_aligned and phys_zalloc_aligned so we might as well
>> zap it.
>
> These two are useful for finding/getting the "whatever memory wasn't
> allocated by phys" that you mention above. To do that, you simply
> allocate, with one of them, one more region containing the amount of
> memory you want the new allocator to have. I'll zap phys_alloc and
> phys_zalloc though.
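
For concreteness, that "one more region" step could look roughly like
the sketch below. It assumes phys_alloc_aligned() hands back a usable
pointer and reuses x86's free_pages() to populate the page allocator's
free list; the function name and pool_size parameter are illustrative,
not taken from the series:

    /*
     * Claim the RAM that phys_alloc hasn't handed out yet and seed
     * the page allocator with it. After this, phys is done and
     * alloc_page()/free_page() take over.
     */
    static void page_alloc_init(phys_addr_t pool_size)
    {
        void *base = phys_alloc_aligned(pool_size, PAGE_SIZE);

        free_pages(base, pool_size);
    }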
>> Note that you can choose separately at each layer whether to
>> implement freeing or not. I think we only need to have it in "page"
>> and "malloc" (though in the latter it can be a dummy, as it is now,
>> since "malloc" is just a small veneer around morecore).
>
> Yup, check.
>
>>>> addr = malloc_free;
>>>> available = malloc_top - addr;
>>>> if (available < size) {
>>>>     if (vfree_bottom == malloc_top + 1) {
>>>>         // can reuse top of previous segment
>>>>         vsize = PAGE_ALIGN(size - available);
>>>>         vaddr = alloc_vpages(vsize >> PAGE_SHIFT);
>>>>         assert(vaddr == malloc_top + 1);
>>>>     } else {
>>>>         // ignore free space at end of previous segment
>>>>         vsize = PAGE_ALIGN(size);
>>>>         addr = vaddr = alloc_vpages(vsize >> PAGE_SHIFT);
>>>>     }
>>>>     malloc_top = vaddr + vsize - 1;
>>>> }
>>>> malloc_free = addr + size;
>>>> return addr;
>>>
>>> I agree lib/x86/vm.c needs to change its virtual address
>>> allocation scheme. It allocates downward and never allows an
>>> address to be reused (but maybe that's by design, to avoid TLB
>>> flushes?). I didn't feel like tackling that right now, though. I'm
>>> not sure what the function you've written should be used for. To
>>> me, it looks like an extension of the virtual address management
>>> already available. Where do alloc_page() and install_page() come
>>> in, like they do in vmalloc?
>>
>> The alloc_page()/install_page() loop would be before setting
>> malloc_top.
>
> Ok, so if we don't want to rework vmalloc/alloc_vpages in this
> series to use a different virtual address allocation scheme (I don't
> ;-), then I think this part can be left for another time.
>
> Thanks,
> drew
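
To make that last quoted point concrete: in the pseudocode above, the
alloc_page()/install_page() loop would slot in right before malloc_top
is updated, e.g. in the else branch. A rough rendering, assuming the
install_page()/read_cr3() usage from lib/x86/vm.c at the time and
identity-mapped alloc_page() pages:

    // new virtual range; back it with fresh physical pages before
    // publishing it via malloc_top
    vsize = PAGE_ALIGN(size);
    addr = vaddr = alloc_vpages(vsize >> PAGE_SHIFT);
    for (unsigned long off = 0; off < vsize; off += PAGE_SIZE)
        install_page(phys_to_virt(read_cr3()),
                     virt_to_phys(alloc_page()),
                     vaddr + off);
    malloc_top = vaddr + vsize - 1;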