>>>> There are two main ways to avoid allocation:
>>>>
>>>> 1. do not add extra data on top of each chunk passed
>>>
>>> If I am not wrong then this is close to what we have right now.
>>
>> Yes, minus the kthread(s) and eventually with some sort of memory
>> allocation for the request. Once you're asynchronous via a notification
>> mechanism, there is no real need for a thread anymore, hopefully.
>
> Whether we should go with a kthread or without it, I would like to do
> some performance comparisons before commenting on this.
>
>>> One issue I see right now is that I am polling while the host is
>>> freeing the memory.
>>> In the next version I could tie the logic which returns pages to the
>>> buddy and resets the per-cpu array index value to 0 with the callback.
>>> (i.e., it happens once we receive a response from the host)
>>
>> The question is, what happens when freeing pages and the array is not
>> ready to be reused yet. In that case, you want to somehow continue
>> freeing pages without busy waiting or eventually not reporting pages.
>
> This is what happens right now.
> Having a kthread or not should not affect this behavior.
> When the array is full, the current approach simply skips collecting
> the free pages.

Well, it somehow does affect your implementation. If you have a kthread,
you always have to synchronize against the VCPU: "Is the pcpu array
ready to be used again?"

Once you do it asynchronously from your VCPU without another thread
being involved, such synchronization is not required. Simply prepare a
request and send it off. Reuse the pcpu array instantly. At least that's
the theory :)

If you have a guest bulk freeing a lot of memory, I guess temporarily
dropping free page hints could be counter-productive. It really depends
on how fast the thread gets scheduled and how long the hinting process
takes. Having another thread involved might add a lot of latency to
that formula.
We'll have to measure, but my gut feeling is that once we do stuff
asynchronously, there is no need for a thread anymore.

>> The callback should put the pages back to the buddy and free the
>> request eventually to have a fully asynchronous mechanism.
>>
>>> Other change which I am testing right now is to only capture 'MAX_ORDER
>>
>> I am not sure if this is an arbitrary number we came up with here. We
>> should really play with different orders to find a hot spot. I wouldn't
>> consider this high priority, though. Getting the whole concept right to
>> be able to deal with any magic number we come up with should be the
>> ultimate goal. (stuff that only works with huge pages I consider not
>> future proof, especially regarding fragmented guests, which can happen
>> easily)
>
> It's quite possible that when we are only capturing MAX_ORDER - 1 and
> run a specific workload we don't get any memory back until we re-run
> the program and the buddy finally starts merging pages of order
> MAX_ORDER - 1. This is why I think we may make this configurable at
> compile time and keep capturing MAX_ORDER - 1 so that we don't end up
> breaking anything.

Eventually pages will never get merged. Assume you have 1 page of a
MAX_ORDER - 1 chunk still allocated somewhere (e.g. !movable via
kmalloc). You skip reporting that chunk completely. Roughly
1 MB/2 MB/4 MB wasted (depending on the arch). This stuff can add up.

-- 

Thanks,

David / dhildenb