Note: I cc:'d stable in the email headers by mistake. NO CC: stable
tag, I don't want this to go into stable.

Thanks,

--> Steve

On Thu, Mar 28, 2024 at 11:06:14AM -0500, Steve Wahl wrote:
> When ident_pud_init() uses only gbpages to create identity maps, large
> ranges of addresses not actually requested can be included in the
> resulting table; a 4K request will map a full GB. On UV systems, this
> ends up including regions that will cause hardware to halt the system
> if accessed (these are marked "reserved" by BIOS). Even processor
> speculation into these regions is enough to trigger the system halt.
> And MTRRs cannot be used to restrict this speculation, there are not
> enough MTRRs to cover all the reserved regions.
>
> The fix for that would be to only use gbpages when map creation
> requests include the full GB page of space, and falling back to using
> smaller 2M pages when only portions of a GB page are included in the
> request.
>
> But on some other systems, possibly due to buggy bios, that solution
> leaves some areas out of the identity map that are needed for kexec to
> succeed. It is believed that these areas are not marked properly for
> map_acpi_tables() in arch/x86/kernel/machine_kexec_64.c to catch and
> map them. The nogbpages kernel command line option also causes these
> systems to fail even without these changes.
>
> So, create kexec identity maps using full GB pages on all platforms
> but UV; on UV, use narrower 2MB pages in the identity map where a full
> GB page would include areas outside the region requested.
>
> No attempt is made to coalesce mapping requests. If a request requires
> a map entry at the 2M (pmd) level, subsequent mapping requests within
> the same 1G region will also be at the pmd level, even if adjacent or
> overlapping such requests could have been combined to map a full
> gbpage. Existing usage starts with larger regions and then adds
> smaller regions, so this should not have any great consequence.
>
> Signed-off-by: Steve Wahl <steve.wahl@xxxxxxx>
> Fixes: d794734c9bbf ("x86/mm/ident_map: Use gbpages only where full GB page should be mapped.")
> Reported-by: Pavin Joseph <me@xxxxxxxxxxxxxxx>
> Closes: https://lore.kernel.org/all/3a1b9909-45ac-4f97-ad68-d16ef1ce99db@xxxxxxxxxxxxxxx/
> Link: https://lore.kernel.org/all/20240322162135.3984233-1-steve.wahl@xxxxxxx/
> Tested-by: Pavin Joseph <me@xxxxxxxxxxxxxxx>
> Tested-by: Eric Hagberg <ehagberg@xxxxxxxxx>
> Tested-by: Sarah Brofeldt <srhb@xxxxxx>
> ---
>
> v4: Incorporate fix for regression on systems relying on gbpages
>     mapping more than the ranges actually requested for successful
>     kexec, by limiting the effects of the change to UV systems.
>     This patch based on tip/x86/urgent.
>
> v3: per Dave Hansen review, re-arrange changelog info,
>     refactor code to use bool variable and split out conditions.
>
> v2: per Dave Hansen review: Additional changelog info,
>     moved pud_large() check earlier in the code, and
>     improved the comment describing the conditions
>     that restrict gbpage usage.
>
>
>  arch/x86/include/asm/init.h        |  1 +
>  arch/x86/kernel/machine_kexec_64.c | 10 ++++++++++
>  arch/x86/mm/ident_map.c            | 24 +++++++++++++++++++-----
>  3 files changed, 30 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
> index cc9ccf61b6bd..371d9faea8bc 100644
> --- a/arch/x86/include/asm/init.h
> +++ b/arch/x86/include/asm/init.h
> @@ -10,6 +10,7 @@ struct x86_mapping_info {
>  	unsigned long page_flag;	 /* page flag for PMD or PUD entry */
>  	unsigned long offset;		 /* ident mapping offset */
>  	bool direct_gbpages;		 /* PUD level 1GB page support */
> +	bool direct_gbpages_only;	 /* use 1GB pages exclusively */
>  	unsigned long kernpg_flag;	 /* kernel pagetable flag override */
>  };
>
> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> index b180d8e497c3..3a2f5d291a88 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -28,6 +28,7 @@
>  #include <asm/setup.h>
>  #include <asm/set_memory.h>
>  #include <asm/cpu.h>
> +#include <asm/uv/uv.h>
>
>  #ifdef CONFIG_ACPI
>  /*
> @@ -212,6 +213,15 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
>
>  	if (direct_gbpages)
>  		info.direct_gbpages = true;
> +	/*
> +	 * UV systems need restrained use of gbpages in the identity
> +	 * maps to avoid system halts. But some other systems rely on
> +	 * using gbpages to expand mappings outside the regions
> +	 * actually listed, to include areas required for kexec but
> +	 * not explicitly named by the bios.
> +	 */
> +	if (!is_uv_system())
> +		info.direct_gbpages_only = true;
>
>  	for (i = 0; i < nr_pfn_mapped; i++) {
>  		mstart = pfn_mapped[i].start << PAGE_SHIFT;
> diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
> index 968d7005f4a7..a538a54aba5d 100644
> --- a/arch/x86/mm/ident_map.c
> +++ b/arch/x86/mm/ident_map.c
> @@ -26,18 +26,32 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
>  	for (; addr < end; addr = next) {
>  		pud_t *pud = pud_page + pud_index(addr);
>  		pmd_t *pmd;
> +		bool use_gbpage;
>
>  		next = (addr & PUD_MASK) + PUD_SIZE;
>  		if (next > end)
>  			next = end;
>
> -		if (info->direct_gbpages) {
> -			pud_t pudval;
> +		/* if this is already a gbpage, this portion is already mapped */
> +		if (pud_leaf(*pud))
> +			continue;
> +
> +		/* Is using a gbpage allowed? */
> +		use_gbpage = info->direct_gbpages;
>
> -			if (pud_present(*pud))
> -				continue;
> +		if (!info->direct_gbpages_only) {
> +			/* Don't use gbpage if it maps more than the requested region. */
> +			/* at the beginning: */
> +			use_gbpage &= ((addr & ~PUD_MASK) == 0);
> +			/* ... or at the end: */
> +			use_gbpage &= ((next & ~PUD_MASK) == 0);
> +		}
> +		/* Never overwrite existing mappings */
> +		use_gbpage &= !pud_present(*pud);
> +
> +		if (use_gbpage) {
> +			pud_t pudval;
>
> -			addr &= PUD_MASK;
>  			pudval = __pud((addr - info->offset) | info->page_flag);
>  			set_pud(pud, pudval);
>  			continue;
>
> base-commit: b6540de9b5c867b4c8bc31225db181cc017d8cc7
> --
> 2.26.2
>

--
Steve Wahl, Hewlett Packard Enterprise
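
For readers who want to poke at the alignment test outside the kernel,
here is a rough user-space sketch of the use_gbpage decision from the
ident_pud_init() hunk above. It is not kernel code: would_use_gbpage()
and its pud_already_present parameter are hypothetical names made up
for this illustration, and PUD_SHIFT/PUD_SIZE/PUD_MASK are redefined
locally on the assumption of 1 GB PUD entries (x86-64). It models only
the first PUD-sized step of a request [addr, end).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* User-space stand-ins for the kernel constants; assumes 1 GB PUD entries. */
#define PUD_SHIFT 30
#define PUD_SIZE  (UINT64_C(1) << PUD_SHIFT)
#define PUD_MASK  (~(PUD_SIZE - 1))

/*
 * Hypothetical helper, not a kernel function: models the use_gbpage
 * decision for the first PUD-sized step of the request [addr, end).
 * pud_already_present stands in for pud_present(*pud).
 */
static bool would_use_gbpage(uint64_t addr, uint64_t end,
                             bool direct_gbpages, bool direct_gbpages_only,
                             bool pud_already_present)
{
    uint64_t next;
    bool use_gbpage;

    assert(addr < end);

    /* Same clamping as the loop body in the patch. */
    next = (addr & PUD_MASK) + PUD_SIZE;
    if (next > end)
        next = end;

    /* Is using a gbpage allowed at all? */
    use_gbpage = direct_gbpages;

    if (!direct_gbpages_only) {
        /* Restricted mode: both ends of this step must be GB-aligned. */
        use_gbpage &= ((addr & ~PUD_MASK) == 0);
        use_gbpage &= ((next & ~PUD_MASK) == 0);
    }

    /* Never overwrite existing mappings. */
    use_gbpage &= !pud_already_present;

    return use_gbpage;
}
```

With these definitions, a 4K request such as [0x1000, 0x2000) is
rejected for a gbpage in the restricted (UV) mode but accepted when
direct_gbpages_only is set, while a fully aligned request such as
[0x40000000, 0x80000000) qualifies in both modes.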