On 03/11/2016 17:51, Andrew Jones wrote:
> On Thu, Nov 03, 2016 at 05:00:01PM +0100, Paolo Bonzini wrote:
>> On 03/11/2016 15:58, Andrew Jones wrote:
>>> Actually I envisioned it the other way around. phys_alloc should
>>> only be for early allocations, before enabling the MMU (which is
>>> why it's called "phys"_alloc).
>>
>> Yes, I agree. Putting your other suggestions together, you'd have
>> for early allocation:
>>
>>        phys       // provide contiguous physical addresses
>>         |
>>     morecore      // provide contiguous RAM
>>         |
>>      malloc       // full API
>>
>> Later, you'd switch morecore to also take care of installing PTEs:
>>
>>     page   virt
>>       \     /
>>      morecore
>>         |
>>       malloc
>
> Yup, I think we're on the same page (no pun intended). However, as
> morecore needs to take both size and alignment arguments in order to
> cope with any offsets that a memalign call would create, I was
> planning to just keep it named early_memalign.

You're right, it needs alignment too. So yeah, early_memalign's code
can be repurposed as the early allocator's morecore callback. For the
current trivial malloc implementation, malloc would be implemented as
ops->morecore(size, align_min). I think we agree on the overall idea
and on having only one function pointer.

>> Now morecore talks to both page and virt. page (alloc_page) is
>> initialized with whatever memory wasn't allocated by phys, and
>> takes up its job. virt (alloc_vpages) allocates virtual address
>> space. morecore is now taking not necessarily consecutive physical
>> memory at page granularity, assigning consecutive virtual addresses
>> to it, and if needed slicing pages into smaller allocations.
>
> vmalloc does all that except the slicing-up feature, which is left
> for a rainy day... So I think the implementation I have here for
> vmemalign is still valid for x86's "morecore".

Fair enough, the slicing can come later. Can you do a v3 that (on top
of what you have in this series):

1) changes alloc_ops to only have the .memalign callback, with fixed
   implementations of the malloc/memalign/free functions;

2) makes vmalloc & vmap use alloc_vpages (i.e. use the "virt"
   allocator);

3) makes alloc_vpages go bottom to top (not strictly necessary, but
   nice);

4) replaces vmalloc/vfree with memalign(align=4096)/free in x86/;

5) moves the virt allocator and the vmalloc ops to a new file
   lib/vmalloc.c (with install_page() remaining as the sole hook into
   arch-specific code), removing what's now dead from lib/x86/vm.c.

I think your current patches fit between (3) and (4), since they end
with the vmemalign function and the ops struct in lib/x86/vm.c.

Thanks,
Paolo

>> morecore is not an external API, it's just malloc's pluggable
>> interface. page/virt/malloc are all external APIs. phys could be,
>> but nothing
>
> Right, both vmemalign and early_memalign are internal only (static),
> and alloc_page, alloc_vpages, and malloc/calloc/memalign/free are
> external. x86 also has vmalloc/vfree (the building blocks for
> vmemalign), which allow callers to explicitly ask for virtual
> addresses corresponding to a new allocation. We should keep those
> external, because malloc doesn't imply you're getting PTEs set; that
> depends on the stage/degree of test init.
>
>> uses phys_alloc_aligned and phys_zalloc_aligned so we might as well
>> zap it.
>
> These two are useful for finding/getting the "whatever memory wasn't
> allocated by phys" that you mention above. To do that, you simply
> allocate, with one of them, one more region containing the amount of
> memory you want the new allocator to have. I'll zap phys_alloc and
> phys_zalloc though.
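
For concreteness, that "one more region" step could look roughly like
the sketch below. It assumes phys_alloc_aligned() hands back a usable
pointer and reuses x86's free_pages() to populate the page allocator's
free list; the function name and pool_size parameter are illustrative,
not taken from the series:

    /*
     * Claim the RAM that phys_alloc hasn't handed out yet and seed
     * the page allocator with it. After this, phys is done and
     * alloc_page()/free_page() take over.
     */
    static void page_alloc_init(phys_addr_t pool_size)
    {
        void *base = phys_alloc_aligned(pool_size, PAGE_SIZE);

        free_pages(base, pool_size);
    }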
>> Note that you can choose separately at each layer whether to
>> implement freeing or not. I think we only need to have it in "page"
>> and "malloc" (though in the latter it can be a dummy, as it is now,
>> since "malloc" is just a small veneer around morecore).
>
> Yup, check.
>
>>>> addr = malloc_free;
>>>> available = malloc_top - addr;
>>>> if (available < size) {
>>>>     if (vfree_bottom == malloc_top + 1) {
>>>>         // can reuse top of previous segment
>>>>         vsize = PAGE_ALIGN(size - available);
>>>>         vaddr = alloc_vpages(vsize >> PAGE_SHIFT);
>>>>         assert(vaddr == malloc_top + 1);
>>>>     } else {
>>>>         // ignore free space at end of previous segment
>>>>         vsize = PAGE_ALIGN(size);
>>>>         addr = vaddr = alloc_vpages(vsize >> PAGE_SHIFT);
>>>>     }
>>>>     malloc_top = vaddr + vsize - 1;
>>>> }
>>>> malloc_free = addr + size;
>>>> return addr;
>>>
>>> I agree lib/x86/vm.c needs to change its virtual address
>>> allocation scheme. It allocates downward and never allows an
>>> address to be reused (but maybe that's by design, to avoid TLB
>>> flushes?). I didn't feel like tackling that right now, though. I'm
>>> not sure what the function you've written should be used for. To
>>> me, it looks like an extension of the virtual address management
>>> already available. Where do alloc_page() and install_page() come
>>> in, like they do in vmalloc?
>>
>> The alloc_page()/install_page() loop would be before setting
>> malloc_top.
>
> Ok, so if we don't want to rework vmalloc/alloc_vpages in this
> series to use a different virtual address allocation scheme (I don't
> ;-), then I think this part can be left for another time.
>
> Thanks,
> drew
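
To make that last quoted point concrete: in the pseudocode above, the
alloc_page()/install_page() loop would slot in right before malloc_top
is updated, e.g. in the else branch. A rough rendering, assuming the
install_page()/read_cr3() usage from lib/x86/vm.c at the time and
identity-mapped alloc_page() pages:

    // new virtual range; back it with fresh physical pages before
    // publishing it via malloc_top
    vsize = PAGE_ALIGN(size);
    addr = vaddr = alloc_vpages(vsize >> PAGE_SHIFT);
    for (unsigned long off = 0; off < vsize; off += PAGE_SIZE)
        install_page(phys_to_virt(read_cr3()),
                     virt_to_phys(alloc_page()),
                     vaddr + off);
    malloc_top = vaddr + vsize - 1;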