Re: [RFC] Another Para-Virtualization page recycler. Empty Guest OS free pages every few seconds

On 2017-09-20 05:05, XaviLi wrote:
> PPR (Per Page Recycler) is a para-virtualization driver currently available for KVM hosts and Linux/Windows guests. With PPR, every page freed to the guest OS can be recycled by the hypervisor within seconds, so VMs can dynamically allocate and free pages from the hypervisor according to application demand. This can result in a memory density boost in many cases with little CPU cost. PPR is a component of an open source solution named Dynamic VM:
> https://github.com/baibantech/dynamic_vm.git
> 
> The patch can be found here: https://github.com/baibantech/dynamic_vm/tree/master/dynamic_vm_0.5
> 
> The guest side is very simple. PPR hooks the entry of the guest OS's page-free interface to mark every freed page. Marked pages can be recycled and replaced by the read-only zero page within one scan period. If a page is allocated again before recycling happens, the mark is simply canceled by PPR's hook in the page-allocation interface.
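> 
> Roughly, the guest-side hooks look like the sketch below (illustrative only, not the actual patch code; the bitmap and function names are made up for this example):
> 
>     #include <linux/bitops.h>
>     #include <linux/mm.h>
> 
>     /* One bit per guest page frame; the host-side scanner reads this
>      * bitmap, so no call or interrupt is needed when a page is freed. */
>     static unsigned long *ppr_freed_bitmap;
> 
>     /* Hooked at the entry of the guest's page-free path: mark the page
>      * as recyclable. */
>     static inline void ppr_mark_page_freed(struct page *page)
>     {
>             set_bit(page_to_pfn(page), ppr_freed_bitmap);
>     }
> 
>     /* Hooked in the page-allocation path: cancel the mark if the page
>      * is reused before the next scan recycles it. */
>     static inline void ppr_cancel_mark(struct page *page)
>     {
>             clear_bit(page_to_pfn(page), ppr_freed_bitmap);
>     }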
> 
> The guest driver works page by page, unlike VirtIO balloon's batch mode. Recycling is based on a lockless operation; the guest and host do not send any interrupts to each other for this.
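> 
> On the host side, the lockless handshake can be illustrated as below (again hypothetical names, and a plain userspace view of the same shared bitmap; the actual patch may recycle pages through a KVM-internal path. madvise(MADV_DONTNEED) on the anonymous guest-RAM mapping is used here only because it gives later reads the zero page, as described above):
> 
>     #include <stdatomic.h>
>     #include <sys/mman.h>
> 
>     #define GUEST_PAGE_SIZE 4096UL
>     #define BITS_PER_LONG   (8 * sizeof(unsigned long))
> 
>     static _Atomic unsigned long *ppr_freed_bitmap;   /* shared with the guest */
> 
>     /* Atomically clear a mark and report whether it was still set.
>      * Whoever clears it first wins the race: no locks, no interrupts. */
>     static int ppr_test_and_clear(unsigned long pfn)
>     {
>             unsigned long mask = 1UL << (pfn % BITS_PER_LONG);
>             unsigned long old =
>                     atomic_fetch_and(&ppr_freed_bitmap[pfn / BITS_PER_LONG], ~mask);
>             return (old & mask) != 0;
>     }
> 
>     /* One scan pass: every page still marked as freed is dropped, so the
>      * next guest access faults in the shared read-only zero page. */
>     static void ppr_scan_once(char *guest_ram_base, unsigned long nr_pages)
>     {
>             for (unsigned long pfn = 0; pfn < nr_pages; pfn++)
>                     if (ppr_test_and_clear(pfn))
>                             madvise(guest_ram_base + pfn * GUEST_PAGE_SIZE,
>                                     GUEST_PAGE_SIZE, MADV_DONTNEED);
>     }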
> 
> We ran a test on our machine. The host uses about 6500+ CPU cycles on average to recycle each page, of which about 3900 cycles are spent in the Linux kernel page-table operations. The guest hook costs about 300 cycles to mark or unmark each page, which is cheap enough in many cases. The daemon costs 5% of one CPU when a scan finds nothing to recycle; the other threads only run when there is work to do.
> 
> Saving memory usually implies a CPU-memory tradeoff, but here things are different. In memory-intensive deployments, page deduplication is often applied, and it is much cheaper to recycle a page than to deduplicate it. In that case PPR wins both CPU and memory.
> 
> We ran a test to demonstrate how density is boosted. In it, PPR works with a multithreaded page-deduplication engine named PageONE rather than KSM. PageONE will be discussed in a separate topic.
> 
> Test environment
> DELL R730
> Intel® Xeon® E5-2650 v4 (2.20 GHz, 12 cores, 24 threads)
> 256GB RAM
> Host OS: Ubuntu server 14.04 Host kernel: 4.4.1
> Qemu: 2.9.0
> Guest OS: Ubuntu server 16.04 Guest kernel: 4.4.76
> 
> Each test VM's memory size was configured to 4GB. Each VM ran an application that allocates and frees memory in a 30-minute cycle; the peak allocation in each period is 3GB and the average is 150MB. We ran many such VMs at the same time on the same machine to observe the total host-side memory cost.
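> 
> (For reference, the workload has roughly the shape of the sketch below; this is not the actual test program, and the hold time is only indicative.)
> 
>     #include <stdlib.h>
>     #include <string.h>
>     #include <unistd.h>
> 
>     #define MB          (1024UL * 1024UL)
>     #define PEAK_MB     3072UL            /* 3GB peak per period */
>     #define PERIOD_SECS (30 * 60)         /* 30-minute cycle */
>     #define HOLD_SECS   60                /* hold the peak briefly */
> 
>     int main(void)
>     {
>             for (;;) {
>                     char *buf = malloc(PEAK_MB * MB);
>                     if (buf) {
>                             memset(buf, 0x5a, PEAK_MB * MB);  /* touch every page */
>                             sleep(HOLD_SECS);
>                             free(buf);       /* pages go back to the guest OS */
>                     }
>                     sleep(PERIOD_SECS - HOLD_SECS);           /* idle near baseline */
>             }
>             return 0;
>     }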
> 
> 
> Raw KVM
> Result: 67 VMs used 230GB memory. 
> 
> KVM + KSM 
> Configuration: sleep_millisecs = 0, pages_to_scan = 1000000
> Result: 73 VMs used 230GB memory. The KSM daemon used 100% of one CPU.
> 
> KVM + PPR + PageONE:
> Configuration: 
> work threads = 12, reclaim_period = 10 secs, merge_period = 600 secs
> (i.e., freed pages were recycled every 10 secs, and pages were merged if they had not been modified for 2 merge_periods)
> Result: 516 VMs ran together. The total used memory fluctuates between 139GB and 218GB, averaging 182GB.
> 
> PPR and PageONE share one daemon thread and the 12 work threads. We measured their CPU cost together because the two features share threads and offload work to each other.
> 
> The average CPU cost of the work threads is 6.1%; the daemon thread's cost is 22.1%.
> 
> Here we tried other combinations in the same test case:
> reclaim_period (sec)  merge_period (sec)  average memory (GB)  work threads' avg CPU  daemon's CPU
>                    5                 200                  164                   6.8%           34%
>                    5                 600                  176                   6.3%           35%
>                   10                 200                  171                   6.6%           22%
>                   10                 600                  182                   6.1%           22%
> NOTE: This is a high-pressure test; PPR was recycling 3.2TB of memory per hour. In normal cases PPR costs far less CPU than that.
> 

Your numbers look impressive!

Regarding the patches: You will have to break them up (and rebase over
the development heads) so that people can review them. The QEMU patch is
short, but it seems like the host-guest interface will need more work,
specifically proper mapping onto a PV mechanism.

But how do you map the PPR interface on Windows guests? Is there really
a hook in the Windows kernel that can be used, or do we need MS to add
that feature?

Jan

-- 
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux


