On 2/7/19 12:43 PM, Alexander Duyck wrote:
> On Tue, Feb 5, 2019 at 3:21 PM Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:
>> On Tue, Feb 05, 2019 at 04:54:03PM -0500, Nitesh Narayan Lal wrote:
>>> On 2/5/19 3:45 PM, Michael S. Tsirkin wrote:
>>>> On Mon, Feb 04, 2019 at 03:18:53PM -0500, Nitesh Narayan Lal wrote:
>>>>> This patch enables the kernel to scan the per-CPU array and
>>>>> compress it by removing the repetitive/re-allocated pages.
>>>>> Once the per-CPU array is completely filled with pages that are
>>>>> in the buddy, it wakes up the per-CPU kernel thread, which
>>>>> re-scans the entire array, acquiring the zone lock corresponding
>>>>> to the page being scanned. If the page is still free and present
>>>>> in the buddy, the thread tries to isolate it and adds it to
>>>>> another per-CPU array.
>>>>>
>>>>> Once this scanning process is complete, and if any isolated
>>>>> pages were added to the new per-CPU array, the kernel thread
>>>>> invokes hyperlist_ready().
>>>>>
>>>>> In hyperlist_ready() a hypercall is made to report these pages to
>>>>> the host using the virtio-balloon framework. In order to do so,
>>>>> another virtqueue 'hinting_vq' is added to the balloon framework.
>>>>> After the host frees all the reported pages, the kernel thread
>>>>> returns them to the buddy.
>>>>>
>>>>> Signed-off-by: Nitesh Narayan Lal <nitesh@xxxxxxxxxx>
>>>> This looks kind of like what early iterations of Wei's patches did.
>>>>
>>>> But this has lots of issues, for example you might end up with
>>>> a hypercall per 4K page.
>>>> So in the end, he switched over to reporting only
>>>> MAX_ORDER - 1 pages.
>>> You mean that I should only capture/attempt to isolate pages of
>>> order MAX_ORDER - 1?
>>>> Would that be a good idea for you too?
>>> Would it help if we had a threshold based on the amount of memory
>>> captured instead of the number of entries/pages in the array?
>> This is what Wei's patches do, at least.
> So in the solution I had posted, I was looking more at
> HUGETLB_PAGE_ORDER and above as the size of pages to provide the hints
> on [1]. The advantage of doing that is that you can also avoid
> fragmenting huge pages, which in turn can cause what looks like a
> memory leak as the memory subsystem attempts to reassemble huge
> pages [2]. In my mind a 2MB page makes good sense as the granularity
> at which to perform hints, since anything smaller just ends up being
> a bunch of extra work and causes a lot of fragmentation.

In my opinion, the page size at which any implementation should capture
pages before reporting them depends on the allocation pattern of the
workload running in the guest.

I am also planning to try Michael's suggestion of using MAX_ORDER - 1.
However, I am still thinking about a workload I can use to test its
effectiveness.

>
> The only issue with limiting things on an arbitrary boundary like that
> is that you have to hook into the buddy allocator to catch the cases
> where a page has been merged up into that range.

I don't think I understood your comment completely. In any case, we
have to rely on the buddy allocator for merging the pages.

>
> [1] https://lkml.org/lkml/2019/2/4/903
> [2] https://blog.digitalocean.com/transparent-huge-pages-and-alternative-memory-allocators/

--
Regards
Nitesh
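
For illustration, here is a minimal userspace sketch of the
capture/compress scheme discussed above, combined with the
MAX_ORDER - 1 / HUGETLB_PAGE_ORDER threshold idea. Every name in it
(hint_capture, hint_compress, HINT_MIN_ORDER, CAPTURE_LEN) is a
hypothetical stand-in, not the actual patch code:

#include <stdbool.h>
#include <stdio.h>

#define HINT_MIN_ORDER	9	/* 2MB with 4K pages, i.e. x86 HUGETLB_PAGE_ORDER */
#define CAPTURE_LEN	16	/* length of the per-CPU capture array */

struct hint_entry {
	unsigned long pfn;
	unsigned int order;
};

static struct hint_entry capture[CAPTURE_LEN];
static unsigned int captured;

/* Skip pages below the threshold; return true once the array is full. */
static bool hint_capture(unsigned long pfn, unsigned int order)
{
	if (order >= HINT_MIN_ORDER && captured < CAPTURE_LEN) {
		capture[captured].pfn = pfn;
		capture[captured].order = order;
		captured++;
	}
	return captured == CAPTURE_LEN;	/* full: wake the hinting thread */
}

/* Drop duplicate pfns (pages freed, re-allocated, and freed again). */
static void hint_compress(void)
{
	unsigned int i, j, out = 0;

	for (i = 0; i < captured; i++) {
		bool dup = false;

		for (j = 0; j < out; j++)
			if (capture[j].pfn == capture[i].pfn)
				dup = true;
		if (!dup)
			capture[out++] = capture[i];
	}
	captured = out;
}

int main(void)
{
	hint_capture(0x1000, 9);	/* captured */
	hint_capture(0x1000, 9);	/* duplicate: same page freed again */
	hint_capture(0x2000, 4);	/* below threshold: ignored */
	hint_compress();
	printf("entries to report: %u\n", captured);	/* prints 1 */
	return 0;
}

This prints "entries to report: 1": the order-4 free is filtered out by
the threshold, and the duplicate pfn is removed by the compress pass, so
only one entry would reach the hypercall.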
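
Alexander's merge-hook point can be pictured the same way: if hints only
apply at or above a threshold order, the buddy merge path is the one
place where a page can newly cross that boundary and become a hinting
candidate. Again, all names here (toy_free_page, buddy_is_free,
MAX_ORDER_TOY) are hypothetical illustrations, not the real
__free_one_page() logic:

#include <stdbool.h>
#include <stdio.h>

#define HINT_MIN_ORDER	9
#define MAX_ORDER_TOY	11

/* Pretend every buddy happens to be free, so merging always succeeds. */
static bool buddy_is_free(unsigned long pfn, unsigned int order)
{
	(void)pfn;
	(void)order;
	return true;
}

static void toy_free_page(unsigned long pfn, unsigned int order)
{
	/* Merge with the buddy block for as long as it is also free. */
	while (order < MAX_ORDER_TOY - 1 &&
	       buddy_is_free(pfn ^ (1UL << order), order)) {
		pfn &= ~((1UL << (order + 1)) - 1);	/* align to merged block */
		order++;
	}

	/* The hook: a block crossing the boundary becomes a candidate. */
	if (order >= HINT_MIN_ORDER)
		printf("hint candidate: pfn %#lx, order %u\n", pfn, order);
}

int main(void)
{
	toy_free_page(0x1234, 0);	/* merges up, then crosses the boundary */
	return 0;
}

Without a hook like this in the merge path, an order-0 free that
coalesces into an order-9 block would never be noticed by
threshold-based hinting.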