On Mon, Jun 02, 2014 at 09:44:33PM +0200, Daniel Kiper wrote:
> From: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
>
> When the Xen hypervisor boots a PV kernel it hands it two pieces
> of information: nr_pages and a made up E820 entry.
>
> The nr_pages value defines the range from zero to nr_pages of PFNs
> which have a valid Machine Frame Number (MFN) underneath it. The
> E820 mirrors that (with the VGA hole):
> BIOS-provided physical RAM map:
> Xen: 0000000000000000 - 00000000000a0000 (usable)
> Xen: 00000000000a0000 - 0000000000100000 (reserved)
> Xen: 0000000000100000 - 0000000080800000 (usable)
>
> The fun comes when a PV guest that is run with a machine E820 - that
> can either be the initial domain or a PCI PV guest, where the E820
> looks like the normal thing:
>
> BIOS-provided physical RAM map:
> Xen: 0000000000000000 - 000000000009e000 (usable)
> Xen: 000000000009ec00 - 0000000000100000 (reserved)
> Xen: 0000000000100000 - 0000000020000000 (usable)
> Xen: 0000000020000000 - 0000000020200000 (reserved)
> Xen: 0000000020200000 - 0000000040000000 (usable)
> Xen: 0000000040000000 - 0000000040200000 (reserved)
> Xen: 0000000040200000 - 00000000bad80000 (usable)
> Xen: 00000000bad80000 - 00000000badc9000 (ACPI NVS)
> ..
> With that overlaying the nr_pages directly on the E820 does not
> work as there are gaps and non-RAM regions that won't be used
> by the memory allocator. The 'xen_release_chunk' helps with that
> by punching holes in the P2M (PFN to MFN lookup tree) for those
> regions and tells us that:
>
> Freeing 20000-20200 pfn range: 512 pages freed
> Freeing 40000-40200 pfn range: 512 pages freed
> Freeing bad80-badf4 pfn range: 116 pages freed
> Freeing badf6-bae7f pfn range: 137 pages freed
> Freeing bb000-100000 pfn range: 282624 pages freed
> Released 283999 pages of unused memory
>
> Those 283999 pages are subtracted from the nr_pages and are returned
> to the hypervisor.
> The end result is that the initial domain
> boots with 1GB less memory as the nr_pages has been subtracted by
> the amount of pages residing within the PCI hole. It can balloon up
> to that if desired using 'xl mem-set 0 8092', but the balloon driver
> is not always compiled in for the initial domain.
>
> This patch implements the populate hypercall (XENMEM_populate_physmap)
> which increases the domain with the same amount of pages that
> were released.
>
> The other solution (that did not work) was to transplant the MFN in
> the P2M tree - the ones that were going to be freed were put in
> the E820_RAM regions past the nr_pages. But the modifications to the
> M2P array (the other side of creating PTEs) were not carried away.
> As the hypervisor is the only one capable of modifying that and the
> only two hypercalls that would do this are: the update_va_mapping
> (which won't work, as during initial bootup only PFNs up to nr_pages
> are mapped in the guest) or via the populate hypercall.
>
> The end result is that the kernel can now boot with the
> nr_pages without having to subtract the 283999 pages.
>
> On an 8GB machine, with various dom0_mem= parameters this is what we get:
>
> no dom0_mem
> -Memory: 6485264k/9435136k available (5817k kernel code, 1136060k absent, 1813812k reserved, 2899k data, 696k init)
> +Memory: 7619036k/9435136k available (5817k kernel code, 1136060k absent, 680040k reserved, 2899k data, 696k init)
>
> dom0_mem=3G
> -Memory: 2616536k/9435136k available (5817k kernel code, 1136060k absent, 5682540k reserved, 2899k data, 696k init)
> +Memory: 2703776k/9435136k available (5817k kernel code, 1136060k absent, 5595300k reserved, 2899k data, 696k init)
>
> dom0_mem=max:3G
> -Memory: 2696732k/4281724k available (5817k kernel code, 1136060k absent, 448932k reserved, 2899k data, 696k init)
> +Memory: 2702204k/4281724k available (5817k kernel code, 1136060k absent, 443460k reserved, 2899k data, 696k init)
>
> And the 'xm list' or 'xl list' now reflect what the dom0_mem=
> argument is.
>
> Acked-by: David Vrabel <david.vrabel@xxxxxxxxxx>
> [v2: Use populate hypercall]
> [v3: Remove debug printks]
> [v4: Simplify code]
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> (cherry picked from commit 2e2fb75475c2fc74c98100f1468c8195fee49f3b)
>
> Signed-off-by: Daniel Kiper <daniel.kiper@xxxxxxxxxx>
> Tested-by: Daniel Kiper <daniel.kiper@xxxxxxxxxx>
> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> ---
>  arch/x86/xen/setup.c | 116 ++++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 112 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index f8b0260..a2fae3e 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -27,7 +27,6 @@
>  #include <xen/interface/memory.h>
>  #include <xen/interface/physdev.h>
>  #include <xen/features.h>
> -
>  #include "xen-ops.h"
>  #include "vdso.h"
>
> @@ -127,7 +126,105 @@ static unsigned long __init xen_release_chunk(unsigned long start,
>
>  	return len;
>  }
> +static unsigned long __init xen_populate_physmap(unsigned long start,
> +						 unsigned long end)
> +{
> +	struct xen_memory_reservation reservation = {
> +		.address_bits = 0,
> +		.extent_order = 0,
> +		.domid = DOMID_SELF
> +	};
> +	unsigned long len = 0;
> +	int ret;
> +
> +	for (pfn = start; pfn < end; pfn++) {

This line breaks the build, why? odd...