Re: [RFC][Patch V7 0/7] KVM: Guest Page Hinting

Nitesh Narayan Lal <nitesh@xxxxxxxxxx> · Wed, 20 Jun 2018 13:35:43 -0400



On 06/18/2018 11:18 PM, Michael S. Tsirkin wrote:
> On Mon, Jun 11, 2018 at 11:18:55AM -0400, nilal@xxxxxxxxxx wrote:
>> The following patch-set proposes an efficient mechanism for handing freed memory between the guest and the host. It enables guests to rapidly free memory to the host and reclaim it from the host.
>>
>> Changelog in V7:
>>
>>     * The patch-series is moved back to RFC for the following reasons:
>>         * Occasionally, a guest with page hinting enabled crashes, followed by a segmentation fault in QEMU.
>>     * The HYPERLIST_THRESHOLD is changed to 1 to incorporate scenarios where hinting is required for just one hyperlist entry. This will be replaced by a better approach in the upcoming patch-series.
>>
>> Virtio interface changes are picked up from Wei's patch-set for Virtio-balloon enhancement[2]. (Wei, how would you like me to credit you in the final patch?)
>>
>> Test results on a single core:
>>
>>     1. Swap test case results:
>>
>>         The intent of this test case is to show that with this patch series, as the host runs out of memory, it can dynamically reclaim guest-freed memory for its own use. I have been going through
>>         Wei's patch-series and it may not solve such use cases.
>>         The following two results show that, without page hinting, swap space is used as the host runs out of memory:
>>
>>         i)
>>         Host memory before running the guest
>>                       total        used        free      shared  buff/cache   available
>>         Mem:            11G        2.3G        8.0G        841M        1.1G        8.1G
>>         Swap:          3.0G          0B        3.0G
>>         Host memory after running the guest and exhaustion of memory
>>         Mem:            11G         10G        132M        274M        537M         82M
>>         Swap:          3.0G        1.0G        2.0G
>>
>>         ii)
>>         Host memory before running the guest
>>                       total        used        free      shared  buff/cache   available
>>         Mem:            11G        2.2G        8.0G        862M        1.2G        8.1G
>>         Swap:          3.0G          0B        3.0G
>>         Host memory after running the guest and exhaustion of memory
>>         Mem:            11G         10G        126M        719M        1.0G         99M
>>         Swap:          3.0G        939M        2.1G
>>
>>         The following two results show that, with page hinting, guest-freed memory is used instead of swap space as the host runs out of memory:
>>
>>         i)
>>         Host memory before running the guest
>>                       total        used        free      shared  buff/cache   available
>>         Mem:            11G        2.2G        8.1G        827M        1.1G        8.2G
>>         Swap:          3.0G          0B        3.0G
>>         Host memory after running the guest and exhaustion of memory
>>         Mem:            11G         10G        191M        851M        1.2G        2.6G
>>         Swap:          3.0G          0B        3.0G
>>
>>         ii)
>>         Host memory before running the guest
>>                       total        used        free      shared  buff/cache   available
>>         Mem:            11G        2.2G        8.0G        836M        1.2G        8.1G
>>         Swap:          3.0G          0B        3.0G
>>         Host memory after running the guest and exhaustion of memory
>>         Mem:            11G        9.8G        167M        853M        1.5G        2.5G
>>         Swap:          3.0G          0B        3.0G
>>
>>
>>     2. Netperf:
>>         Netperf and hackbench are used to analyze the impact of this series on guest throughput under these loads.
>>
>>                            Recv Socket   Send Socket   Send Message   Elapsed Time   Throughput
>>                            Size (bytes)  Size (bytes)  Size (bytes)   (secs)         (10^6 bits/sec)
>>         Without Hinting
>>                  i)        87380         16384         16384          100            23130.92
>>                  ii)       87380         16384         16384          100            26114.51
>>                  iii)      87380         16384         16384          100            22495.60
>>
>>         With Hinting
>>                  i)        87380         16384         16384          100            20228.11
>>                  ii)       87380         16384         16384          100            25689.46
>>                  iii)      87380         16384         16384          100            19967.03
>>
>>     3. Hackbench:
>>         Number of processes = 150
>>         Without Hinting time:
>>             i)   10.208
>>             ii)   9.879
>>             iii)  9.404
>>
>>         With Hinting time:
>>             i)   11.292
>>             ii)  11.057
>>             iii) 10.688
>>
>>
>> Explanation:
>>
>>     *To observe the swap space usage with and without guest page hinting, a guest with 6 GB of memory is booted, after which 4 GB of memory is malloc'ed and freed in the guest. Without guest
>>      page hinting, this memory is never returned to the host, so as the host runs more processes or mallocs more memory it exhausts host memory and starts using swap space. However, with
>>      guest page hinting enabled, the memory freed by the guest is reclaimed by the host, so when the host runs out of memory it can use that memory instead of the swap space.
>>
>>     *This patch series enables the guest to prepare a list of free pages, which is sent to the host via a hypercall. The patch-set leverages the existing arch_free_page() and arch_alloc_page() to add this
>>      functionality. It uses two lists, one cpu-local and the other cpu-global. Whenever a page is freed, it is added to the respective cpu-local list until that list is full. Once the list is full, a seqlock is taken to
>>      prevent any further page allocations, and the cpu-local list is traversed to check for any fragmentation due to reallocations. If present, those entries are defragmented and added to the
>>      cpu-global list until it is full. Once the cpu-global list is full, it is parsed and compressed.
>>      A hypercall is made only if the total number of entries is above the specified threshold value. Frequent hypercalls may hurt performance and hence need to be minimized. This is the
>>      primary reason for compression: it replaces multiple consecutive entries with a single one and removes all duplicate entries, which otherwise cause frequent exhaustion of the cpu-global list. After
>>      compressing the hyperlist, there are three possibilities:
>>           *If the number of entries in the cpu-global list is greater than the hypercall threshold, a hypercall is issued.
>>           *If the parsing of the cpu-local list is complete but the number of cpu-global list entries is less than the threshold, they are copied to a cpu-local list.
>>           *If the parsing of the cpu-local list is not yet complete and the number of entries in the cpu-global list is less than the threshold, the parsing of the cpu-local list continues and
>>            entries are added to the cpu-global list from the newly available index acquired after compression.
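
For readers following the compression step described above, here is a
minimal userspace sketch of the idea: consecutive entries are folded into
one and duplicates are dropped. It is only an illustration, not the code
from the series. The struct hint_entry layout, the compress_entries() name
and the use of qsort() to order the entries are all assumptions made for
this example.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical entry layout (not taken from the patch): a run of
 * `pages` contiguous free pages starting at page frame number `pfn`. */
struct hint_entry {
    unsigned long pfn;
    unsigned long pages;
};

/* Order entries by starting pfn. */
static int cmp_pfn(const void *a, const void *b)
{
    const struct hint_entry *x = a;
    const struct hint_entry *y = b;

    return (x->pfn > y->pfn) - (x->pfn < y->pfn);
}

/* Sort the list, then fold duplicate, overlapping and adjacent runs
 * into single entries, in place. Returns the compressed entry count. */
size_t compress_entries(struct hint_entry *list, size_t n)
{
    size_t out = 0;

    if (n == 0)
        return 0;

    qsort(list, n, sizeof(*list), cmp_pfn);

    for (size_t i = 1; i < n; i++) {
        unsigned long end = list[out].pfn + list[out].pages;

        if (list[i].pfn <= end) {
            /* Duplicate or consecutive: extend the current run if needed. */
            unsigned long new_end = list[i].pfn + list[i].pages;

            if (new_end > end)
                list[out].pages = new_end - list[out].pfn;
        } else {
            /* Gap: start a new run in the next output slot. */
            list[++out] = list[i];
        }
    }
    return out + 1;
}
```

The returned count is the value that would then be compared against the
hypercall threshold; slots past that count are the "newly available index"
from which further entries can be added.
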
>>
>> [1] https://www.spinics.net/lists/kvm/msg159790.html
>> [2] https://www.spinics.net/lists/kvm/msg152734.html
>>
> Actually I have a question.  What if instead of sending hints,
> arch_free_page will memset the page to zero?
>
> KSM will be able to then find and unify them all.
>
> This might seem slow but
>
> 1. if the page is utilized by userspace it's zeroed
>    later anyway, we can set some flag and skip zeroing
> 2. as kvm inits guest memory to 0, we can set this flag on boot too
>
> We can also unify with Wei's approach: have the balloon send interrupt.
>
> And I think that without optimizations 1 and 2 enabling this is actually
> just a question of enabling page poisoning and setting poison value to
> 0.
>
> I'd very much like to see perf comparison at least with this
> configuration.
We are planning to compare three setups: KSM enabled, both KSM and page
hinting disabled, and page hinting enabled. The comparison will be based on
the number of guests that can be run on each setup without using swap
space.
Is there a recommended KSM configuration for the scenario we're
testing? I'm thinking about tuning KSM to scan gigabytes' worth of pages; is
this recommended?
To ensure that the guest is using the memory assigned to it, we
are planning to use a simple program that mallocs a certain amount
of memory, memsets it to 1, and then sleeps for some time to hold it.
We are open to suggestions for any other use case that could help
us compare the three scenarios in the best possible way.
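
A minimal sketch of such a test program, to make the idea concrete. The
hold_memory() name, the 4096-byte read-back stride (an assumed page size)
and the return codes are illustrative choices, not a finished tool. The
read-back through a volatile pointer is there only so that the compiler
cannot drop the memset as a dead store.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Allocate `bytes` of memory, set every byte to 1 so the pages are
 * really backed, hold the memory for `seconds`, then free it.
 * Returns 0 on success, -1 if the allocation or the read-back fails. */
int hold_memory(size_t bytes, unsigned int seconds)
{
    unsigned char *buf = malloc(bytes);
    volatile unsigned char *check;
    int rc = 0;

    if (buf == NULL)
        return -1;

    memset(buf, 1, bytes);

    /* Read one byte per assumed 4096-byte page through a volatile
     * pointer, which forces the stores above to actually happen. */
    check = buf;
    for (size_t off = 0; off < bytes; off += 4096) {
        if (check[off] != 1)
            rc = -1;
    }

    sleep(seconds);
    free(buf);
    return rc;
}
```

Calling hold_memory() from a small main() with the size and hold time taken
from the command line would let the same binary be reused in each guest.
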

>
> Thanks,
>
>

-- 
Regards
Nitesh
