Re: [Patch v4 1/6] KVM: Support for guest page hinting

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Michael,


On 11/13/2017 12:59 PM, Michael S. Tsirkin wrote:
> On Fri, Nov 03, 2017 at 04:30:08PM -0400, nilal@xxxxxxxxxx wrote:
>> From: Nitesh Narayan Lal <nilal@xxxxxxxxxx>
>>
>> This patch includes the following:
>> 1. Basic skeleton for the support
>> 2. Enablement of x86 platform to use the same
>>
>> Signed-off-by: Nitesh Narayan Lal <nilal@xxxxxxxxxx>
>> ---
>>  arch/x86/Kbuild         |  2 +-
>>  arch/x86/kvm/Makefile   |  2 ++
>>  include/linux/gfp.h     |  7 +++++++
>>  virt/kvm/Kconfig        |  4 ++++
>>  virt/kvm/page_hinting.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
>>  5 files changed, 60 insertions(+), 1 deletion(-)
>>  create mode 100644 virt/kvm/page_hinting.c
>>
>> diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild
>> index 0038a2d..7d39d7d 100644
>> --- a/arch/x86/Kbuild
>> +++ b/arch/x86/Kbuild
>> @@ -2,7 +2,7 @@ obj-y += entry/
>>  
>>  obj-$(CONFIG_PERF_EVENTS) += events/
>>  
>> -obj-$(CONFIG_KVM) += kvm/
>> +obj-$(subst m,y,$(CONFIG_KVM)) += kvm/
>>  
>>  # Xen paravirtualization support
>>  obj-$(CONFIG_XEN) += xen/
>> diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
>> index 09d4b17..d8a3800 100644
>> --- a/arch/x86/kvm/Makefile
>> +++ b/arch/x86/kvm/Makefile
>> @@ -15,6 +15,8 @@ kvm-y			+= x86.o mmu.o emulate.o i8259.o irq.o lapic.o \
>>  			   i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \
>>  			   hyperv.o page_track.o debugfs.o
>>  
>> +obj-$(CONFIG_KVM_FREE_PAGE_HINTING)	+= $(KVM)/page_hinting.o
>> +
>>  kvm-intel-y		+= vmx.o pmu_intel.o
>>  kvm-amd-y		+= svm.o pmu_amd.o
>>  
>> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
>> index f780718..a74371f 100644
>> --- a/include/linux/gfp.h
>> +++ b/include/linux/gfp.h
>> @@ -452,6 +452,13 @@ static inline struct zonelist *node_zonelist(int nid, gfp_t flags)
>>  	return NODE_DATA(nid)->node_zonelists + gfp_zonelist(flags);
>>  }
>>  
>> +#ifdef	CONFIG_KVM_FREE_PAGE_HINTING
>> +#define HAVE_ARCH_ALLOC_PAGE
>> +#define HAVE_ARCH_FREE_PAGE
>> +void arch_free_page(struct page *page, int order);
>> +void arch_alloc_page(struct page *page, int order);
>> +#endif
>> +
>>  #ifndef HAVE_ARCH_FREE_PAGE
>>  static inline void arch_free_page(struct page *page, int order) { }
>>  #endif
>
> So the idea is to send pages to host in arch_free_page?
>
> However, I notice:
>
>         arch_free_page(page, order);
>         kernel_poison_pages(page, 1 << order, 0);
>
> Thus pages are immediately written to after they are freed.
>
>
>
>
>
>> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
>> index b0cc1a3..936c71d 100644
>> --- a/virt/kvm/Kconfig
>> +++ b/virt/kvm/Kconfig
>> @@ -50,3 +50,7 @@ config KVM_COMPAT
>>  
>>  config HAVE_KVM_IRQ_BYPASS
>>         bool
>> +
>> +config KVM_FREE_PAGE_HINTING
>> +       def_bool y
>> +       depends on KVM
>> diff --git a/virt/kvm/page_hinting.c b/virt/kvm/page_hinting.c
>> new file mode 100644
>> index 0000000..39d2b1d
>> --- /dev/null
>> +++ b/virt/kvm/page_hinting.c
>> @@ -0,0 +1,46 @@
>> +#include <linux/gfp.h>
>> +#include <linux/mm.h>
>> +#include <linux/page_ref.h>
>> +#include <linux/kvm_host.h>
>> +#include <linux/sort.h>
>> +
>> +#include <trace/events/kvm.h>
>> +
>> +#define MAX_FGPT_ENTRIES	1000
>> +#define HYPERLIST_THRESHOLD	500
>> +/*
>> + * struct kvm_free_pages - Tracks the pages which are freed by the guest.
>> + * @pfn	- page frame number for the page which is to be freed
>> + * @pages - number of pages which are supposed to be freed.
>> + * A global array object is used to hold the list of pfn and number of pages
>> + * which are freed by the guest. This list may also have fragmentated pages so
>> + * defragmentation is a must prior to the hypercall.
>> + */
>> +struct kvm_free_pages {
>> +	unsigned long pfn;
>> +	unsigned int pages;
>> +};
>> +
>> +/*
>> + * hypervisor_pages - It is a dummy structure passed with the hypercall.
>> + * @pfn - page frame number for the page which is to be freed.
>> + * @pages - number of pages which are supposed to be freed.
>> + * A global array object is used to to hold the list of pfn and pages and is
>> + * passed as part of the hypercall.
>> + */
>> +struct hypervisor_pages {
>> +	unsigned long pfn;
>> +	unsigned int pages;
>> +};
>> +
>> +DEFINE_PER_CPU(struct kvm_free_pages [MAX_FGPT_ENTRIES], kvm_pt);
>> +DEFINE_PER_CPU(int, kvm_pt_idx);
>> +struct hypervisor_pages hypervisor_pagelist[MAX_FGPT_ENTRIES];
>> +
>> +void arch_alloc_page(struct page *page, int order)
>> +{
>> +}
>> +
>> +void arch_free_page(struct page *page, int order)
>> +{
>> +}
>> -- 
>> 2.9.4
Thanks Michael for pointing out this bug.
The issue here is when both 'Guest Page Hinting' and 'Page Poisoning'
are enabled as it may result in guest generating memory corruption
errors when the hypervisor maps a different physical memory to the guest
after freeing the earlier mapped memory.

Page Poisoning is a feature in which the page is filled with a specific
pattern of (0x00 or 0xaa) after arch_free_page and the same is verified
before arch_alloc_page to prevent following issues:

    *information leak from the freed data
    *use after free bugs
    *memory corruption

-Selection of the pattern depends on the CONFIG_PAGE_POISONING_ZERO

Guest page hinting is a support for handing freed memory between the
guest and the host. It enables the guests with DAX (no page cache) to
rapidly free and reclaims memory to and from the host respectively. To
achieve this task it uses two lists one CPU-local and other CPU-global. 
Whenever a page is freed it is added to the respective CPU-local list
through arch_free_page() until it is full. Once the list is full a
seqlock is taken to prevent any further page allocations and the per
CPU-local list is traversed and only those pages which are not
reallocated by the guest are added to the CPU-global list. This global
list is then sent to the hypervisor/host.

After freeing the pages in the global list following things may happen:

    *Hypervisor reallocates the freed memory back to the guest if required
    *Hypervisor frees the memory and maps a different physical memory at
its place

In order to prevent any information leak hypervisor before allocating
memory to the guest fills it with zeroes.
The issue arises when the pattern used for Page Poisoning is 0xaa while
the newly allocated page received from the hypervisor by the guest is
filled with the pattern 0x00. This will result in memory corruption errors.
As per the current implementation in order to enable page poisoning a
boot time parameter (page_poison) is set. Hence  there could be two
following ways by which we could resolve the above-stated issue:

1. We can disable page poisoning if guest page hinting is enabled. The
only code change required in this case will be to check if
CONFIG_PAGE_POISONING is enabled, if so disable it so that poisoning is
not checked once we receive newly allocated memory from the hypervisor.

2. We can have a flag in the page structure using which we can control
poisoning on a per page basis. Using this we can disable the poisoning
for only those pages which are sent to the hypervisor so that posioning
is not checked on the newly allocated memory which is received by the
guest form the hypervisor. Though in order to incorporate this I may
have to make changes in page_poison.c, mm_types.h and other related files.

I am looking forward to the comments and suggestions.

[1] http://www.spinics.net/lists/kvm/msg153666.html
[2] https://www.spinics.net/lists/kvm/msg158054.html

-- 
Regards
Nitesh

Attachment: signature.asc
Description: OpenPGP digital signature


[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux