Re: [PATCH 3.4 05/11] xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM

On Mon, Jun 02, 2014 at 09:44:33PM +0200, Daniel Kiper wrote:
> From: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> 
> When the Xen hypervisor boots a PV kernel it hands it two pieces
> of information: nr_pages and a made up E820 entry.
> 
> The nr_pages value defines the range of PFNs, from zero to nr_pages,
> which have a valid Machine Frame Number (MFN) underneath them. The
> E820 mirrors that (with the VGA hole):
> BIOS-provided physical RAM map:
>  Xen: 0000000000000000 - 00000000000a0000 (usable)
>  Xen: 00000000000a0000 - 0000000000100000 (reserved)
>  Xen: 0000000000100000 - 0000000080800000 (usable)
> 
> The fun comes when a PV guest is run with a machine E820 - that can
> be either the initial domain or a PCI PV guest - where the E820
> looks like the normal thing:
> 
> BIOS-provided physical RAM map:
>  Xen: 0000000000000000 - 000000000009e000 (usable)
>  Xen: 000000000009ec00 - 0000000000100000 (reserved)
>  Xen: 0000000000100000 - 0000000020000000 (usable)
>  Xen: 0000000020000000 - 0000000020200000 (reserved)
>  Xen: 0000000020200000 - 0000000040000000 (usable)
>  Xen: 0000000040000000 - 0000000040200000 (reserved)
>  Xen: 0000000040200000 - 00000000bad80000 (usable)
>  Xen: 00000000bad80000 - 00000000badc9000 (ACPI NVS)
> ..
> With that, overlaying the nr_pages directly on the E820 does not
> work, as there are gaps and non-RAM regions that won't be used
> by the memory allocator. The 'xen_release_chunk' helps with that
> by punching holes in the P2M (PFN to MFN lookup tree) for those
> regions and tells us that:
> 
> Freeing  20000-20200 pfn range: 512 pages freed
> Freeing  40000-40200 pfn range: 512 pages freed
> Freeing  bad80-badf4 pfn range: 116 pages freed
> Freeing  badf6-bae7f pfn range: 137 pages freed
> Freeing  bb000-100000 pfn range: 282624 pages freed
> Released 283999 pages of unused memory
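
The page counts in the log above follow from treating each PFN range as half-open, so a range contributes end minus start pages. A minimal sketch of that arithmetic, assuming only the five ranges quoted here; the names pfn_range, quoted_ranges, range_pages and total_pages are made up for illustration and are not part of the patch:

```c
#include <stddef.h>

/* Hypothetical helper type: a half-open PFN range [start, end). */
struct pfn_range {
	unsigned long start;
	unsigned long end;
};

/* The five "Freeing" ranges exactly as they appear in the quoted log. */
static const struct pfn_range quoted_ranges[] = {
	{ 0x20000, 0x20200 },
	{ 0x40000, 0x40200 },
	{ 0xbad80, 0xbadf4 },
	{ 0xbadf6, 0xbae7f },
	{ 0xbb000, 0x100000 },
};

/* Number of pages covered by one half-open range. */
static unsigned long range_pages(struct pfn_range r)
{
	return r.end - r.start;
}

/* Sum of the pages covered by n ranges. */
static unsigned long total_pages(const struct pfn_range *r, size_t n)
{
	unsigned long sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += range_pages(r[i]);
	return sum;
}
```

The per-range results match the log (512, 512, 116, 137 and 282624 pages). The five quoted ranges add up to 283901 pages, which is 98 fewer than the 283999 reported on the "Released" line, so the quoted log presumably does not show every range that was freed.
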
> 
> Those 283999 pages are subtracted from the nr_pages and are returned
> to the hypervisor. The end result is that the initial domain
> boots with 1GB less memory, as the nr_pages has been reduced by
> the number of pages residing within the PCI hole. It can balloon up
> to that if desired using 'xl mem-set 0 8092', but the balloon driver
> is not always compiled in for the initial domain.
> 
> This patch implements the populate hypercall (XENMEM_populate_physmap),
> which increases the domain's memory by the same number of pages that
> were released.
> 
> The other solution (that did not work) was to transplant the MFNs in
> the P2M tree - the ones that were going to be freed were put in
> the E820_RAM regions past the nr_pages. But the modifications to the
> M2P array (the other side of creating PTEs) were not carried out.
> The hypervisor is the only one capable of modifying that, and the
> only two hypercalls that would do this are update_va_mapping
> (which won't work, as during initial bootup only PFNs up to nr_pages
> are mapped in the guest) and the populate hypercall.
> 
> The end result is that the kernel can now boot with the
> nr_pages without having to subtract the 283999 pages.
> 
> On an 8GB machine, with various dom0_mem= parameters, this is what we get:
> 
> no dom0_mem
> -Memory: 6485264k/9435136k available (5817k kernel code, 1136060k absent, 1813812k reserved, 2899k data, 696k init)
> +Memory: 7619036k/9435136k available (5817k kernel code, 1136060k absent, 680040k reserved, 2899k data, 696k init)
> 
> dom0_mem=3G
> -Memory: 2616536k/9435136k available (5817k kernel code, 1136060k absent, 5682540k reserved, 2899k data, 696k init)
> +Memory: 2703776k/9435136k available (5817k kernel code, 1136060k absent, 5595300k reserved, 2899k data, 696k init)
> 
> dom0_mem=max:3G
> -Memory: 2696732k/4281724k available (5817k kernel code, 1136060k absent, 448932k reserved, 2899k data, 696k init)
> +Memory: 2702204k/4281724k available (5817k kernel code, 1136060k absent, 443460k reserved, 2899k data, 696k init)
> 
> And the 'xm list' or 'xl list' now reflect what the dom0_mem=
> argument is.
> 
> Acked-by: David Vrabel <david.vrabel@xxxxxxxxxx>
> [v2: Use populate hypercall]
> [v3: Remove debug printks]
> [v4: Simplify code]
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> (cherry picked from commit 2e2fb75475c2fc74c98100f1468c8195fee49f3b)
> 
> Signed-off-by: Daniel Kiper <daniel.kiper@xxxxxxxxxx>
> Tested-by: Daniel Kiper <daniel.kiper@xxxxxxxxxx>
> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> ---
>  arch/x86/xen/setup.c |  116 ++++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 112 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index f8b0260..a2fae3e 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -27,7 +27,6 @@
>  #include <xen/interface/memory.h>
>  #include <xen/interface/physdev.h>
>  #include <xen/features.h>
> -
>  #include "xen-ops.h"
>  #include "vdso.h"
>  
> @@ -127,7 +126,105 @@ static unsigned long __init xen_release_chunk(unsigned long start,
>  
>  	return len;
>  }
> +static unsigned long __init xen_populate_physmap(unsigned long start,
> +						 unsigned long end)
> +{
> +	struct xen_memory_reservation reservation = {
> +		.address_bits = 0,
> +		.extent_order = 0,
> +		.domid        = DOMID_SELF
> +	};
> +	unsigned long len = 0;
> +	int ret;
> +
> +	for (pfn = start; pfn < end; pfn++) {

This line breaks the build, why?

odd...
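
One likely reason, judging only from the lines quoted above: the hunk declares reservation, len and ret, but never declares pfn before the for loop uses it, so the compiler would reject the loop as using an undeclared identifier. A minimal userspace sketch of the loop shape with the declaration added; populate_one_stub and populate_range_sketch are hypothetical names, and the stub merely stands in for the per-page work that the truncated hunk does not show:

```c
/*
 * Hypothetical stand-in for the per-page work; the real loop body is
 * cut off in the quoted hunk and is not reproduced here.
 */
static int populate_one_stub(unsigned long pfn)
{
	(void)pfn;
	return 1;	/* pretend exactly one page was populated */
}

/* Walks [start, end) one PFN at a time and counts populated pages. */
static unsigned long populate_range_sketch(unsigned long start,
					   unsigned long end)
{
	unsigned long len = 0;
	unsigned long pfn;	/* the declaration absent from the quoted hunk */

	for (pfn = start; pfn < end; pfn++) {
		if (populate_one_stub(pfn) != 1)
			break;
		len++;
	}

	return len;
}
```

Whether the original upstream commit carried this declaration elsewhere, or the backport dropped it, cannot be told from the quoted text alone.
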
