Re: [RFC][Patch V7 0/7] KVM: Guest Page Hinting

Nitesh Narayan Lal <nitesh@xxxxxxxxxx> · Fri, 29 Jun 2018 17:04:01 -0400



On 06/18/2018 11:18 PM, Michael S. Tsirkin wrote:
> On Mon, Jun 11, 2018 at 11:18:55AM -0400, nilal@xxxxxxxxxx wrote:
>> The following patch-set proposes an efficient mechanism for handing freed memory between the guest and the host. It enables the guests to rapidly free and reclaim memory to and from the host respectively.
>>
>> Changelog in V7:
>>
>>     * The patch-series is moved back to RFC for the following reason:
>>         * A guest with page hinting enabled occasionally crashes, followed by a segmentation fault in QEMU.
>>     * The HYPERLIST_THRESHOLD is changed to 1 to cover scenarios where hinting is required for just one hyperlist entry. This will be replaced by a better approach in the upcoming patch-series.
>>
>> Virtio interface changes are picked up from Wei's patch-set for Virtio-balloon enhancement[2]. (Wei, how would you like me to credit you in the final patch?)
>>
>> Test results on a single core:
>>
>>     1. Swap test case results:
>>
>>         The intent of this test case is to show that, with this patch series, the host can dynamically reclaim memory freed by the guest when the host runs out of memory. I have been going through
>>         Wei's patch-series, and it may not solve such use cases.
>>         The following two results show that, without page hinting, swap space is used as the host runs out of memory:
>>
>>         i)
>>         Host memory before running the guest
>>                       total        used        free      shared  buff/cache   available
>>         Mem:            11G        2.3G        8.0G        841M        1.1G        8.1G
>>         Swap:          3.0G          0B        3.0G
>>         Host memory after running the guest and exhaustion of memory
>>         Mem:            11G         10G        132M        274M        537M         82M
>>         Swap:          3.0G        1.0G        2.0G
>>
>>         ii)
>>         Host memory before running the guest
>>                       total        used        free      shared  buff/cache   available
>>         Mem:            11G        2.2G        8.0G        862M        1.2G        8.1G
>>         Swap:          3.0G          0B        3.0G
>>         Host memory after running the guest and exhaustion of memory
>>         Mem:            11G         10G        126M        719M        1.0G         99M
>>         Swap:          3.0G        939M        2.1G
>>
>>         The following two results show that, with page hinting, guest-freed memory is used instead of swap space as the host runs out of memory:
>>
>>         i)
>>         Host memory before running the guest
>>                       total        used        free      shared  buff/cache   available
>>         Mem:            11G        2.2G        8.1G        827M        1.1G        8.2G
>>         Swap:          3.0G          0B        3.0G
>>         Host memory after running the guest and exhaustion of memory
>>         Mem:            11G         10G        191M        851M        1.2G        2.6G
>>         Swap:          3.0G          0B        3.0G
>>
>>         ii)
>>         Host memory before running the guest
>>                       total        used        free      shared  buff/cache   available
>>         Mem:            11G        2.2G        8.0G        836M        1.2G        8.1G
>>         Swap:          3.0G          0B        3.0G
>>         Host memory after running the guest and exhaustion of memory
>>         Mem:            11G        9.8G        167M        853M        1.5G        2.5G
>>         Swap:          3.0G          0B        3.0G
>>
>>
>>     2. Netperf:
>>         Netperf and hackbench are used to analyze the impact of this series on guest throughput under these loads.
>>
>>                          Recv Socket   Send Socket   Send Message   Elapsed   Throughput
>>                          Size          Size          Size           Time
>>                          (bytes)       (bytes)       (bytes)        (secs)    (10^6 bits/sec)
>>         Without Hinting
>>                  i)      87380         16384         16384          100       23130.92
>>                  ii)     87380         16384         16384          100       26114.51
>>                  iii)    87380         16384         16384          100       22495.60
>>
>>         With Hinting
>>                  i)      87380         16384         16384          100       20228.11
>>                  ii)     87380         16384         16384          100       25689.46
>>                  iii)    87380         16384         16384          100       19967.03
>>
>>     3. Hackbench:
>>         Number of processes = 150
>>         Without Hinting time:
>>             i)   10.208
>>             ii)   9.879
>>             iii)  9.404
>>
>>         With Hinting time:
>>             i)   11.292
>>             ii)  11.057
>>             iii) 10.688
>>
>>
>> Explanation:
>>
>>     *To observe swap usage with and without guest page hinting, a guest with 6 GB of memory is booted, after which 4 GB of memory is malloc'ed and freed in the guest. Without guest
>>      page hinting, this memory is never returned to the host, so as the host runs more processes or allocates more memory, it exhausts its free memory and starts using swap space. However, on
>>      a guest with guest page hinting enabled, the memory freed by the guest is reclaimed by the host, so when the host runs out of memory it can use that memory instead of swap space.
>>
>>     *This patch series enables the guest to prepare a list of free pages, which is sent to the host via a hypercall. The patch-set leverages the existing arch_free_page() and arch_alloc_page() to add this
>>      functionality. It uses two lists, one cpu-local and the other cpu-global. Whenever a page is freed, it is added to the respective cpu-local list until that list is full. Once the list is full, a seqlock is taken to
>>      prevent any further page allocations, and the cpu-local list is traversed to check for any fragmentation due to reallocations. If present, those entries are defragmented and added to the
>>      cpu-global list until it is full. Once the cpu-global list is full, it is parsed and compressed.
>>      A hypercall is made only if the total number of entries is above the specified threshold value. Frequent hypercalls may hurt performance, so their number needs to be minimized. This is the
>>      primary reason for compression: it replaces multiple consecutive entries with a single one and removes all duplicate entries, which would otherwise cause frequent exhaustion of the cpu-global list. After
>>      compressing the hyperlist, there are three possibilities:
>>           *If the number of entries in the cpu-global list is greater than the hypercall threshold, a hypercall is issued.
>>           *If the parsing of the cpu-local list is complete but the number of cpu-global list entries is less than the threshold, they are copied to a cpu-local list.
>>           *If the parsing of the cpu-local list is not yet complete and the number of entries in the cpu-global list is less than the threshold, the parsing of the cpu-local list continues and
>>            entries are added to the cpu-global list from the newly available index obtained after compression.
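
For readers following along, the compression step described above can be sketched roughly in plain userspace C. This is only an illustration under stated assumptions: the entry layout (`struct hint_entry`), the names `compress_hints()`, `should_hint()` and `HINT_THRESHOLD`, and the use of `>=` for the threshold check (so that a threshold of 1 permits a single entry) are all hypothetical and are not taken from the patches.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical entry layout: a run of `pages` free pages starting at `pfn`. */
struct hint_entry {
	unsigned long pfn;
	unsigned long pages;
};

/* Order entries by starting pfn. */
static int cmp_pfn(const void *a, const void *b)
{
	const struct hint_entry *x = a, *y = b;

	if (x->pfn < y->pfn)
		return -1;
	if (x->pfn > y->pfn)
		return 1;
	return 0;
}

/*
 * Sort the list by pfn, then fold duplicate, overlapping and adjacent
 * runs into single entries, in place. Returns the new entry count.
 */
static size_t compress_hints(struct hint_entry *list, size_t n)
{
	size_t out = 0, i;

	assert(list != NULL);
	if (n == 0)
		return 0;

	qsort(list, n, sizeof(*list), cmp_pfn);

	for (i = 1; i < n; i++) {
		unsigned long end = list[out].pfn + list[out].pages;

		if (list[i].pfn <= end) {
			/* Duplicate or contiguous: extend the current run if needed. */
			unsigned long new_end = list[i].pfn + list[i].pages;

			if (new_end > end)
				list[out].pages = new_end - list[out].pfn;
		} else {
			/* Gap: start a new run in the next free slot. */
			list[++out] = list[i];
		}
	}
	return out + 1;
}

/* Hypothetical threshold, mirroring the value of 1 mentioned for V7. */
#define HINT_THRESHOLD 1

/* Assumption: report to the host once the count reaches the threshold. */
static int should_hint(size_t entries)
{
	return entries >= HINT_THRESHOLD;
}
```

With this sketch, six entries covering pfn runs 10-11 (twice), 12-14, 40, 100-103 and 102 collapse to three entries, which is the effect that keeps the cpu-global list from filling up with redundant entries.
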
>>
>> [1] https://www.spinics.net/lists/kvm/msg159790.html
>> [2] https://www.spinics.net/lists/kvm/msg152734.html
>>
> Actually I have a question.  What if instead of sending hints,
> arch_free_page will memset the page to zero?
>
> KSM will be able to then find and unify them all.
>
> This might seem slow but
>
> 1. if the page is utilized by userspace it's zeroed
>    later anyway, we can set some flag and skip zeroing
> 2. as kvm inits guest memory to 0, we can set this flag on boot too
>
> We can also unify with Wei's approach: have the balloon send interrupt.
>
> And I think that without optimizations 1 and 2 enabling this is actually
> just a question of enabling page poisoning and setting poison value to
> 0.
>
> I'd very much like to see perf comparison at least with this
> configuration.
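
To make sure the suggestion above is read the same way by everyone, here is a small userspace model of it in C. It is a sketch of one possible interpretation only: `struct model_page`, the `known_zero` flag and all three function names are made up for illustration and do not correspond to any kernel structure or to the page poisoning code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096 /* assumed page size */

/* Hypothetical page model; not a kernel structure. */
struct model_page {
	unsigned char data[MODEL_PAGE_SIZE];
	bool known_zero; /* the "flag" from point 1 */
};

/* Point 2: guest memory starts out zeroed, so the flag can be set at boot. */
static void model_page_boot(struct model_page *p)
{
	assert(p != NULL);
	memset(p->data, 0, sizeof(p->data));
	p->known_zero = true;
}

/* Free path: zero the page so that identical pages can be merged. */
static void model_page_free(struct model_page *p)
{
	assert(p != NULL);
	if (!p->known_zero) {
		memset(p->data, 0, sizeof(p->data));
		p->known_zero = true;
	}
}

/*
 * Allocation that must hand out a zeroed page (the userspace case in
 * point 1). Returns true if the zeroing could be skipped.
 */
static bool model_page_alloc_zeroed(struct model_page *p)
{
	bool skipped;

	assert(p != NULL);
	skipped = p->known_zero;
	if (!skipped)
		memset(p->data, 0, sizeof(p->data));
	p->known_zero = false; /* the new owner may write to the page */
	return skipped;
}
```

In this model the cost of zeroing moves from allocation time to free time, which is why a perf comparison would be needed before drawing conclusions.
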
>
> Thanks,
Hi Michael,

We have been studying KSM over the last week. Recently we also had a
discussion with Andrea about it. There are certain issues with the
current implementation of KSM, which are listed below:
    - It is prone to side-channel attacks.
    - As KSM needs time to scan and merge pages, it may not work well
      for immediate memory requirements or in a situation where there is
      a single guest running.
    - For the same reason, the CPU requirements for KSM are also
      higher.
Due to these reasons, it may not be ideal to replace the guest page
hinting approach with it. All of the above-mentioned issues could be
fixed, and that would certainly be a good enhancement, but it may be
more suitable to make these changes as a separate project. Without
those changes, it may not even be possible to compare the two
solutions.

In our discussion with Andrea, he also made a few suggestions about
ways in which I can remove the locking issue. I have started working
on it and will post an update soon.

-- 
Regards
Nitesh
