>>>> There are two main ways to avoid allocation:
>>>>
>>>> 1. do not add extra data on top of each chunk passed
>>>
>>> If I am not wrong then this is close to what we have right now.
>>
>> Yes, minus the kthread(s) and eventually with some sort of memory
>> allocation for the request. Once you're asynchronous via a notification
>> mechanism, there is no real need for a thread anymore, hopefully.
>
> Whether we should go with a kthread or without it, I would like to do
> some performance comparisons before commenting on this.
>
>>> One issue I see right now is that I am polling while the host is
>>> freeing the memory.
>>> In the next version I could tie the logic which returns pages to the
>>> buddy and resets the per-cpu array index value to 0 with the callback.
>>> (i.e., it happens once we receive a response from the host)
>>
>> The question is, what happens when freeing pages and the array is not
>> ready to be reused yet. In that case, you want to somehow continue
>> freeing pages without busy waiting or eventually not reporting pages.
>
> This is what happens right now.
> Having a kthread or not should not affect this behavior.
> When the array is full, the current approach simply skips collecting
> the free pages.

Well, it somehow does affect your implementation. If you have a kthread,
you always have to synchronize against the VCPU: "Is the pcpu array
ready to be used again?"

Once you do it asynchronously from your VCPU without another thread
being involved, such synchronization is not required. Simply prepare a
request and send it off. Reuse the pcpu array instantly. At least that's
the theory :)

If you have a guest bulk freeing a lot of memory, I guess temporarily
dropping free page hints could be counter-productive. It really depends
on how fast the thread gets scheduled and how long the hinting process
takes. Having another thread involved might add a lot of latency to
that formula.
We'll have to measure, but my gut feeling is that once we do stuff
asynchronously, there is no need for a thread anymore.

>> The callback should put the pages back to the buddy and free the
>> request eventually to have a fully asynchronous mechanism.
>>
>>> Other change which I am testing right now is to only capture 'MAX_ORDER
>>
>> I am not sure if this is an arbitrary number we came up with here. We
>> should really play with different orders to find a hot spot. I wouldn't
>> consider this high priority, though. Getting the whole concept right to
>> be able to deal with any magic number we come up with should be the
>> ultimate goal. (stuff that only works with huge pages I consider not
>> future proof, especially regarding fragmented guests, which can happen
>> easily)
>
> It's quite possible that when we are only capturing MAX_ORDER - 1 and
> run a specific workload we don't get any memory back until we re-run
> the program and the buddy finally starts merging pages of order
> MAX_ORDER - 1. This is why I think we may make this configurable at
> compile time and keep capturing MAX_ORDER - 1 so that we don't end up
> breaking anything.

Eventually pages will never get merged. Assume you have 1 page of a
MAX_ORDER - 1 chunk still allocated somewhere (e.g. !movable via
kmalloc). You skip reporting that chunk completely. Roughly
1 MB/2 MB/4 MB wasted (depending on the arch). This stuff can add up.

-- 

Thanks,

David / dhildenb