On Tue, Jul 09, 2013 at 02:38:53PM +0100, David Vrabel wrote:
> From: David Vrabel <david.vrabel@xxxxxxxxxx>
>
> If there are UNUSABLE regions in the machine memory map, dom0 will
> attempt to map them 1:1, which is not permitted by Xen, and the kernel
> will crash.
>
> There isn't anything interesting in the UNUSABLE region that the dom0
> kernel needs access to, so we can avoid making the 1:1 mapping and
> leave the region as RAM.
>
> Since the memory map is now obtained differently for dom0 and domU,
> refactor each case into a separate function.
>
> This fixes a dom0 boot failure if tboot is used (because tboot was
> marking a memory region as UNUSABLE).

Please also include the serial log that shows the crash.

> Signed-off-by: David Vrabel <david.vrabel@xxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> ---
>  arch/x86/xen/setup.c |  114 ++++++++++++++++++++++++++++++++++++++++----------
>  1 files changed, 91 insertions(+), 23 deletions(-)
>
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index 94eac5c..3fdb6bd 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -55,6 +55,91 @@ unsigned long xen_released_pages;
>   */
>  #define EXTRA_MEM_RATIO (10)
>
> +static int __init xen_get_memory_map_dom0(struct e820entry *map,
> +                                          unsigned *nr_entries)
> +{
> +        struct xen_memory_map memmap;
> +        unsigned i;
> +        int ret;
> +
> +        /*
> +         * Dom0 requires access to machine addresses for BIOS data and
> +         * MMIO (e.g. PCI) devices.  The rest of the kernel expects
> +         * to be able to access these through a 1:1 p2m mapping.
> +         *
> +         * We need to take the pseudo physical memory map and set up
> +         * 1:1 mappings corresponding to the RESERVED regions and
> +         * holes in the /machine/ memory map, adding/expanding the RAM
> +         * region at the end of the map for the relocated RAM.
> +         *
> +         * This is more easily done if we just start with the machine
> +         * memory map.
> +         *
> +         * UNUSABLE regions are awkward; they are not interesting to
> +         * dom0 and Xen won't allow them to be mapped, so we want to
> +         * leave these as RAM in the pseudo physical map.

We just want them marked as such in the P2M, but without any PTE
creation for them? Why do we care? We are not creating any page tables
for E820_UNUSABLE regions.

> +         *
> +         * Again, this is easiest if we take the machine memory map
> +         * and change the UNUSABLE regions to RAM.

Won't Linux then try to map them? In 3.9 (or was it 3.8?) and later,
the page table creation ignores any E820 entries that are not RAM. See
init_range_memory_mapping and its comment:

/*
 * We need to iterate through the E820 memory map and create direct mappings
 * for only E820_RAM and E820_KERN_RESERVED regions. We cannot simply
 * create direct mappings for all pfns from [0 to max_low_pfn) and
 * [4GB to max_pfn) because of possible memory holes in high addresses
 * that cannot be marked as UC by fixed/variable range MTRRs.
 * Depending on the alignment of E820 ranges, this may possibly result
 * in using smaller size (i.e. 4K instead of 2M or 1G) page tables.
 */

So in effect you are now altering them.

> +         */
> +
> +        memmap.nr_entries = *nr_entries;
> +        set_xen_guest_handle(memmap.buffer, map);
> +
> +        ret = HYPERVISOR_memory_op(XENMEM_machine_memory_map, &memmap);
> +        if (ret < 0)
> +                return ret;
> +
> +        for (i = 0; i < memmap.nr_entries; i++) {
> +                if (map[i].type == E820_UNUSABLE)

What if the E820_UNUSABLE regions were manufactured by the BIOS? Or
somebody booted Xen with mem=3G (in which case we clip the memory) on a
16GB box.
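To make the rewrite under discussion concrete, below is a minimal
standalone sketch of it (plain userspace C; the region layout is made
up and the E820_* constants are redefined locally for the demo, so this
is an illustration rather than the actual setup.c code):

#include <stdio.h>

#define E820_RAM       1
#define E820_RESERVED  2
#define E820_UNUSABLE  5

struct demo_e820entry {
        unsigned long long addr;
        unsigned long long size;
        unsigned int type;
};

int main(void)
{
        /* A made-up machine map with a tboot-style UNUSABLE region. */
        struct demo_e820entry map[] = {
                { 0x0000000000000000ULL, 0x000000000009f000ULL, E820_RAM },
                { 0x000000000009f000ULL, 0x0000000000061000ULL, E820_RESERVED },
                { 0x0000000000100000ULL, 0x000000007f000000ULL, E820_RAM },
                { 0x000000007f100000ULL, 0x0000000000100000ULL, E820_UNUSABLE },
        };
        unsigned i, n = sizeof(map) / sizeof(map[0]);

        /* The same rewrite the patch performs: leave UNUSABLE regions
         * as RAM in the pseudo-physical map so dom0 never attempts a
         * 1:1 mapping of them. */
        for (i = 0; i < n; i++)
                if (map[i].type == E820_UNUSABLE)
                        map[i].type = E820_RAM;

        for (i = 0; i < n; i++)
                printf("%016llx-%016llx type %u\n", map[i].addr,
                       map[i].addr + map[i].size, map[i].type);
        return 0;
}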
> +                        map[i].type = E820_RAM;
> +        }
> +
> +        *nr_entries = memmap.nr_entries;
> +        return 0;
> +}
> +
> +static int __init xen_get_memory_map_domu(struct e820entry *map,
> +                                          unsigned *nr_entries)
> +{
> +        struct xen_memory_map memmap;
> +        int ret;
> +
> +        /*
> +         * For domUs, use the pseudo-physical memory map.
> +         *
> +         * If this is not available, fake up a memory map with a
> +         * single region containing the initial number of pages.
> +         */
> +
> +        memmap.nr_entries = *nr_entries;
> +        set_xen_guest_handle(memmap.buffer, map);
> +
> +        ret = HYPERVISOR_memory_op(XENMEM_memory_map, &memmap);
> +        if (ret == -ENOSYS) {
> +                unsigned long max_pfn = min(MAX_DOMAIN_PAGES,
> +                                            xen_start_info->nr_pages);
> +                memmap.nr_entries = 1;
> +                map[0].addr = 0ULL;
> +                map[0].size = PFN_PHYS(max_pfn);
> +                /* 8MB slack (to balance backend allocations). */
> +                map[0].size += 8ULL << 20;
> +                map[0].type = E820_RAM;
> +        } else if (ret < 0)
> +                return ret;
> +
> +        *nr_entries = memmap.nr_entries;
> +        return 0;
> +}
> +
> +static int __init xen_get_memory_map(struct e820entry *map, unsigned *nr_entries)
> +{
> +        if (xen_initial_domain())
> +                return xen_get_memory_map_dom0(map, nr_entries);
> +        else
> +                return xen_get_memory_map_domu(map, nr_entries);
> +}
> +
>  static void __init xen_add_extra_mem(u64 start, u64 size)
>  {
>          unsigned long pfn;
> @@ -319,42 +404,25 @@ static void xen_align_and_add_e820_region(u64 start, u64 size, int type)
>  char * __init xen_memory_setup(void)
>  {
>          static struct e820entry map[E820MAX] __initdata;
> +        unsigned map_nr_entries = E820MAX;
>
>          unsigned long max_pfn = xen_start_info->nr_pages;
>          unsigned long long mem_end;
>          int rc;
> -        struct xen_memory_map memmap;
>          unsigned long max_pages;
>          unsigned long last_pfn = 0;
>          unsigned long extra_pages = 0;
>          unsigned long populated;
>          int i;
> -        int op;
>
>          max_pfn = min(MAX_DOMAIN_PAGES, max_pfn);
>          mem_end = PFN_PHYS(max_pfn);
>
> -        memmap.nr_entries = E820MAX;
> -        set_xen_guest_handle(memmap.buffer, map);
> -
> -        op = xen_initial_domain() ?
> -                XENMEM_machine_memory_map :
> -                XENMEM_memory_map;
> -        rc = HYPERVISOR_memory_op(op, &memmap);
> -        if (rc == -ENOSYS) {
> -                BUG_ON(xen_initial_domain());
> -                memmap.nr_entries = 1;
> -                map[0].addr = 0ULL;
> -                map[0].size = mem_end;
> -                /* 8MB slack (to balance backend allocations). */
> -                map[0].size += 8ULL << 20;
> -                map[0].type = E820_RAM;
> -                rc = 0;
> -        }
> +        rc = xen_get_memory_map(map, &map_nr_entries);
>          BUG_ON(rc);
>
>          /* Make sure the Xen-supplied memory map is well-ordered. */
> -        sanitize_e820_map(map, memmap.nr_entries, &memmap.nr_entries);
> +        sanitize_e820_map(map, map_nr_entries, &map_nr_entries);
>
>          max_pages = xen_get_max_pages();
>          if (max_pages > max_pfn)
> @@ -366,12 +434,12 @@ char * __init xen_memory_setup(void)
>           * this are first released.
>           */
>          xen_released_pages = xen_set_identity_and_release(
> -                map, memmap.nr_entries, max_pfn);
> +                map, map_nr_entries, max_pfn);
>
>          /*
>           * Populate back the non-RAM pages and E820 gaps that had been
>           * released.
>           */
> -        populated = xen_populate_chunk(map, memmap.nr_entries,
> +        populated = xen_populate_chunk(map, map_nr_entries,
>                                         max_pfn, &last_pfn, xen_released_pages);
>
>          xen_released_pages -= populated;
> @@ -395,7 +463,7 @@ char * __init xen_memory_setup(void)
>          extra_pages = min(EXTRA_MEM_RATIO * min(max_pfn, PFN_DOWN(MAXMEM)),
>                            extra_pages);
>          i = 0;
> -        while (i < memmap.nr_entries) {
> +        while (i < map_nr_entries) {
>                  u64 addr = map[i].addr;
>                  u64 size = map[i].size;
>                  u32 type = map[i].type;
> --
> 1.7.2.5
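As an aside on the domU path: the -ENOSYS fallback in
xen_get_memory_map_domu() just synthesizes a one-entry map covering the
initial number of pages plus 8MB of slack. A minimal standalone sketch
of that arithmetic (the page size and domain size below are assumptions
for the demo, not values taken from the patch):

#include <stdio.h>

#define PAGE_SHIFT 12
#define PFN_PHYS(x) ((unsigned long long)(x) << PAGE_SHIFT)

int main(void)
{
        unsigned long nr_pages = 262144;                /* assume a 1GB domU */
        unsigned long long size = PFN_PHYS(nr_pages);   /* 0x40000000 */

        /* 8MB slack (to balance backend allocations). */
        size += 8ULL << 20;

        printf("map[0]: addr 0x0 size %#llx type E820_RAM\n", size);
        return 0;
}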