Re: [RFC][Patch V7 0/7] KVM: Guest Page Hinting

David Hildenbrand <david@xxxxxxxxxx> · Tue, 19 Jun 2018 10:05:39 +0200

On 19.06.2018 05:18, Michael S. Tsirkin wrote:
> On Mon, Jun 11, 2018 at 11:18:55AM -0400, nilal@xxxxxxxxxx wrote:
>> The following patch-set proposes an efficient mechanism for handing freed memory between the guest and the host. It enables the guests to rapidly free and reclaim memory to and from the host respectively.
>>
>> Changelog in V7:
>>
>>     * The patch-series is moved back to RFC for the following reasons:
>>         * A guest with page hinting enabled has occasionally been observed to crash, followed by a segmentation fault in QEMU.
>>     * The HYPERLIST_THRESHOLD is changed to 1 to incorporate scenarios where hinting is required for just one hyperlist entry. This will be replaced by a better approach in the upcoming patch-series.
>>
>> Virtio interface changes are picked up from Wei's patch-set for Virtio-balloon enhancement[2]. (Wei, how would you like me to credit you in the final patch?)
>>
>> Test results on a single core:
>>
>>     1. Swap test case results:
>>
>>         The intent of this test case is to show that with this patch series, as the host runs out of memory, it can dynamically reclaim the memory freed by the guest for its own use. I have been going through
>>         Wei's patch-series and it may not solve such use cases.
>>         The following two results show that, without page hinting, swap space is used as the host runs out of memory:
>>
>>         i)
>>         Host memory before running the guest
>>                  total    used    free  shared  buff/cache  available
>>         Mem:       11G    2.3G    8.0G    841M        1.1G       8.1G
>>         Swap:     3.0G      0B    3.0G
>>         Host memory after running the guest and exhaustion of memory
>>         Mem:       11G     10G    132M    274M        537M        82M
>>         Swap:     3.0G    1.0G    2.0G
>>
>>         ii)
>>         Host memory before running the guest
>>                  total    used    free  shared  buff/cache  available
>>         Mem:       11G    2.2G    8.0G    862M        1.2G       8.1G
>>         Swap:     3.0G      0B    3.0G
>>         Host memory after running the guest and exhaustion of memory
>>         Mem:       11G     10G    126M    719M        1.0G        99M
>>         Swap:     3.0G    939M    2.1G
>>
>>         The following two results show that, with page hinting, the memory freed by the guest is used instead of swap space as the host runs out of memory:
>>
>>         i)
>>         Host memory before running the guest
>>                  total    used    free  shared  buff/cache  available
>>         Mem:       11G    2.2G    8.1G    827M        1.1G       8.2G
>>         Swap:     3.0G      0B    3.0G
>>         Host memory after running the guest and exhaustion of memory
>>         Mem:       11G     10G    191M    851M        1.2G       2.6G
>>         Swap:     3.0G      0B    3.0G
>>
>>         ii)
>>         Host memory before running the guest
>>                  total    used    free  shared  buff/cache  available
>>         Mem:       11G    2.2G    8.0G    836M        1.2G       8.1G
>>         Swap:     3.0G      0B    3.0G
>>         Host memory after running the guest and exhaustion of memory
>>         Mem:       11G    9.8G    167M    853M        1.5G       2.5G
>>         Swap:     3.0G      0B    3.0G
>>
>>
>>     2. Netperf:
>>         Netperf and hackbench are used to analyze the impact of this series on guest throughput under these loads.
>>
>>                             Recv Socket     Send Socket     Send Message    Elapsed Time    Throughput
>>                             Size (bytes)    Size (bytes)    Size (bytes)    (secs)          (10^6 bits/sec)
>>         Without Hinting
>>                  i)         87380           16384           16384           100             23130.92
>>                  ii)        87380           16384           16384           100             26114.51
>>                  iii)       87380           16384           16384           100             22495.60
>>
>>         With Hinting
>>                  i)         87380           16384           16384           100             20228.11
>>                  ii)        87380           16384           16384           100             25689.46
>>                  iii)       87380           16384           16384           100             19967.03
>>
>>     3. Hackbench:
>>         Number of processes = 150
>>         Without Hinting time:
>>             i)   10.208
>>             ii)   9.879
>>             iii)  9.404
>>
>>         With Hinting time:
>>             i)   11.292
>>             ii)  11.057
>>             iii) 10.688
>>
>>
>> Explanation:
>>
>>     * To observe swap space usage with and without guest page hinting, a guest with 6 GB of memory is booted, after which 4 GB of memory is malloc'ed and freed in the guest. Without guest
>>       page hinting, this memory is never returned to the host, so as the host runs more processes or allocates more memory it eventually has to fall back on swap space. With guest page
>>       hinting enabled, the memory freed by the guest is reclaimed by the host, which can then use it instead of swap space when it runs out of memory.
>>
>>     * This patch series enables the guest to prepare a list of free pages, which is sent to the host via a hypercall. The patch-set leverages the existing arch_free_page() and arch_alloc_page() to add this
>>       functionality. It uses two lists, one cpu-local and the other cpu-global. Whenever a page is freed, it is added to the respective cpu-local list until that list is full. Once the list is full, a seqlock is
>>       taken to prevent any further page allocations, and the cpu-local list is traversed to check for any fragmentation due to reallocations. If present, those entries are defragmented and added to the
>>       cpu-global list until it is full. Once the cpu-global list is full, it is parsed and compressed.
>>       A hypercall is made only if the total number of entries is above the specified threshold value. Frequent hypercalls may hurt performance, so their number needs to be minimized. This is the
>>       primary reason for compression: it replaces multiple consecutive entries with a single one and removes all duplicate entries, which would otherwise cause frequent exhaustion of the cpu-global list.
>>       After compressing the hyperlist, there are three possibilities:
>>           * If the number of entries in the cpu-global list is greater than the hypercall threshold, a hypercall is issued.
>>           * If the parsing of the cpu-local list is complete but the number of cpu-global list entries is less than the threshold, they are copied to a cpu-local list.
>>           * If the parsing of the cpu-local list is not yet complete and the number of entries in the cpu-global list is less than the threshold, parsing of the cpu-local list continues and
>>             entries are added to the cpu-global list starting from the newly available index obtained after compression.
>>
>> [1] https://www.spinics.net/lists/kvm/msg159790.html
>> [2] https://www.spinics.net/lists/kvm/msg152734.html
>>
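
To make the compression step described above concrete, here is a minimal userspace sketch, not the code from the series. The entry layout (struct hint_entry) and the helper name compress_hints() are assumptions made up for illustration, and sorting the list first is a simplification that the actual patches may not use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical entry layout: a run of `pages` free pages starting at `pfn`. */
struct hint_entry {
    unsigned long pfn;
    unsigned long pages;
};

/* qsort() comparator: order entries by starting pfn. */
static int cmp_pfn(const void *a, const void *b)
{
    const struct hint_entry *x = a, *y = b;

    return (x->pfn > y->pfn) - (x->pfn < y->pfn);
}

/*
 * Sort by pfn, then fold duplicate, overlapping and directly adjacent
 * runs into a single entry. Returns the new number of entries; the
 * compacted list occupies list[0] .. list[ret - 1].
 */
static size_t compress_hints(struct hint_entry *list, size_t n)
{
    size_t out = 0;

    assert(list != NULL || n == 0);
    if (n == 0)
        return 0;

    qsort(list, n, sizeof(*list), cmp_pfn);
    for (size_t i = 1; i < n; i++) {
        unsigned long end = list[out].pfn + list[out].pages;

        if (list[i].pfn <= end) {
            /* Duplicate, overlap or neighbour: extend the current run. */
            unsigned long new_end = list[i].pfn + list[i].pages;

            if (new_end > end)
                list[out].pages = new_end - list[out].pfn;
        } else {
            /* Gap: start a new run. */
            list[++out] = list[i];
        }
    }
    return out + 1;
}
```

With the entries (10,1) (11,1) (10,1) (20,2) (12,1) as input, this leaves two runs, (10,3) and (20,2), so fewer entries compete for the cpu-global list before the threshold check.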
> 
> Actually I have a question.  What if instead of sending hints,
> arch_free_page will memset the page to zero?

I remember proposing the same thing, and the reply was that doing a memset
on every free and then scanning for zero pages in the host to detect the
free pages again is not the right thing to do performance-wise.

But I also have never seen performance numbers.
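
To illustrate where the cost would come from, here is a rough userspace sketch of the two halves of that idea, assuming a 4096-byte page. The names zero_on_free(), page_is_zero() and count_zero_pages() are hypothetical and this is not how KSM or the kernel implements it; it only shows that every free touches the whole page and every detection reads it again.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size for this sketch */

/* Guest side of the idea: zero the page at free time. */
static void zero_on_free(void *page)
{
    assert(page != NULL);
    memset(page, 0, SKETCH_PAGE_SIZE);
}

/* Host side: a page only counts as free if every byte in it is zero. */
static int page_is_zero(const void *page)
{
    const unsigned char *p = page;

    assert(page != NULL);
    for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++) {
        if (p[i] != 0)
            return 0;
    }
    return 1;
}

/* Scan `npages` contiguous pages and count the all-zero ones. */
static size_t count_zero_pages(const void *mem, size_t npages)
{
    const unsigned char *p = mem;
    size_t zero = 0;

    for (size_t i = 0; i < npages; i++)
        zero += (size_t)page_is_zero(p + i * SKETCH_PAGE_SIZE);
    return zero;
}
```

Both loops are linear in the amount of memory involved, which is the overhead the earlier reply was worried about; real numbers would still be needed to judge it.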

> 
> KSM will be able to then find and unify them all.

If we are low on memory, we then have to rely on KSM instead of just
picking a page that we know is free (MADV_FREE).

> 
> This might seem slow but
> 
> 1. if the page is utilized by userspace it's zeroed
>    later anyway, we can set some flag and skip zeroing
> 2. as kvm inits guest memory to 0, we can set this flag on boot too

Not true on reboots, right?

> 
> We can also unify with Wei's approach: have the balloon send an interrupt.
> 
> And I think that without optimizations 1 and 2 enabling this is actually
> just a question of enabling page poisoning and setting poison value to
> 0.
> 
> I'd very much like to see perf comparison at least with this
> configuration.
> 
> Thanks,
> 
> 

-- 

Thanks,

David / dhildenb