On Fri, 13 Sep 2019 at 17:22, Dan Williams <dan.j.williams@xxxxxxxxx> wrote: > > On Fri, Sep 13, 2019 at 6:00 AM Ard Biesheuvel > <ard.biesheuvel@xxxxxxxxxx> wrote: > > > > On Fri, 30 Aug 2019 at 03:06, Dan Williams <dan.j.williams@xxxxxxxxx> wrote: > > > > > > UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the > > > interpretation of the EFI Memory Types as "reserved for a specific > > > purpose". > > > > > > The proposed Linux behavior for specific purpose memory is that it is > > > reserved for direct-access (device-dax) by default and not available for > > > any kernel usage, not even as an OOM fallback. Later, through udev > > > scripts or another init mechanism, these device-dax claimed ranges can > > > be reconfigured and hot-added to the available System-RAM with a unique > > > node identifier. This device-dax management scheme implements "soft" in > > > the "soft reserved" designation by allowing some or all of the > > > reservation to be recovered as typical memory. This policy can be > > > disabled at compile-time with CONFIG_EFI_SOFT_RESERVE=n, or runtime with > > > efi=nosoftreserve. > > > > > > This patch introduces 2 new concepts at once given the entanglement > > > between early boot enumeration relative to memory that can optionally be > > > reserved from the kernel page allocator by default. The new concepts > > > are: > > > > > > - E820_TYPE_SOFT_RESERVED: Upon detecting the EFI_MEMORY_SP > > > attribute on EFI_CONVENTIONAL memory, update the E820 map with this > > > new type. Only perform this classification if the > > > CONFIG_EFI_SOFT_RESERVE=y policy is enabled, otherwise treat it as > > > typical ram. > > > > > > - IORES_DESC_SOFT_RESERVED: Add a new I/O resource descriptor for > > > a device driver to search iomem resources for application specific > > > memory. Teach the iomem code to identify such ranges as "Soft Reserved". > > > > > > A follow-on change integrates parsing of the ACPI HMAT to identify the > > > node and sub-range boundaries of EFI_MEMORY_SP designated memory. For > > > now, just identify and reserve memory of this type. > > > > > > The translation of EFI_CONVENTIONAL_MEMORY + EFI_MEMORY_SP to "soft > > > reserved" is x86/E820-only, but other archs could choose to publish > > > IORES_DESC_SOFT_RESERVED resources from their platform-firmware memory > > > map handlers. Other EFI-capable platforms would need to go audit their > > > local usages of EFI_CONVENTIONAL_MEMORY to consider the soft reserved > > > case. > > > > > > Cc: <x86@xxxxxxxxxx> > > > Cc: Borislav Petkov <bp@xxxxxxxxx> > > > Cc: Ingo Molnar <mingo@xxxxxxxxxx> > > > Cc: "H. Peter Anvin" <hpa@xxxxxxxxx> > > > Cc: Darren Hart <dvhart@xxxxxxxxxxxxx> > > > Cc: Andy Shevchenko <andy@xxxxxxxxxxxxx> > > > Cc: Andy Lutomirski <luto@xxxxxxxxxx> > > > Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx> > > > Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx> > > > Cc: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx> > > > Reported-by: kbuild test robot <lkp@xxxxxxxxx> > > > Reviewed-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx> > > > Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx> > > > > Hi Dan, > > > > I understand that non-x86 may be out of scope for you, but this patch > > makes changes to x86 and generic code at the same time without regard > > for other architectures. > > Yes, that did give me pause. > > > I'd prefer it if we could cover ARM cleanly as well right at the start. > > Let's do it. > > > > > The first step would be to split out the EFI stub changes (i.e., to > > avoid allocating memory from EFI_MEMORY_SP regions) and the EFI core > > changes from the other changes. Then, I would like to ask for your > > help to get the arm64 part implemented where EFI_MEMORY_SP memory gets > > registered/reserved in a way that allows the HMAT code (which should > > be arch agnostic) to operate in the same way as it does on x86. Would > > it be enough to simply memblock_reserve() it and insert the iomem > > resource with the soft_reserved attribute? > > > > Some more comments below. > > > > > --- > > > Documentation/admin-guide/kernel-parameters.txt | 19 +++++++-- > > > arch/x86/Kconfig | 21 +++++++++ > > > arch/x86/boot/compressed/eboot.c | 7 +++ > > > arch/x86/boot/compressed/kaslr.c | 4 ++ > > > arch/x86/include/asm/e820/types.h | 8 ++++ > > > arch/x86/include/asm/efi-stub.h | 11 +++++ > > > arch/x86/kernel/e820.c | 12 +++++ > > > arch/x86/platform/efi/efi.c | 51 +++++++++++++++++++++-- > > > drivers/firmware/efi/efi.c | 3 + > > > drivers/firmware/efi/libstub/efi-stub-helper.c | 12 +++++ > > > include/linux/efi.h | 1 > > > include/linux/ioport.h | 1 > > > 12 files changed, 139 insertions(+), 11 deletions(-) > > > create mode 100644 arch/x86/include/asm/efi-stub.h > > > > > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt > > > index 1c67acd1df65..dd28f0726309 100644 > > > --- a/Documentation/admin-guide/kernel-parameters.txt > > > +++ b/Documentation/admin-guide/kernel-parameters.txt > > > @@ -1152,7 +1152,8 @@ > > > Format: {"off" | "on" | "skip[mbr]"} > > > > > > efi= [EFI] > > > - Format: { "old_map", "nochunk", "noruntime", "debug" } > > > + Format: { "old_map", "nochunk", "noruntime", "debug", > > > + "nosoftreserve" } > > > old_map [X86-64]: switch to the old ioremap-based EFI > > > runtime services mapping. 32-bit still uses this one by > > > default. > > > @@ -1161,6 +1162,12 @@ > > > firmware implementations. > > > noruntime : disable EFI runtime services support > > > debug: enable misc debug output > > > + nosoftreserve: The EFI_MEMORY_SP (Specific Purpose) > > > + attribute may cause the kernel to reserve the > > > + memory range for a memory mapping driver to > > > + claim. Specify efi=nosoftreserve to disable this > > > + reservation and treat the memory by its base type > > > + (i.e. EFI_CONVENTIONAL_MEMORY / "System RAM"). > > > > > > efi_no_storage_paranoia [EFI; X86] > > > Using this parameter you can use more than 50% of > > > @@ -1173,15 +1180,21 @@ > > > updating original EFI memory map. > > > Region of memory which aa attribute is added to is > > > from ss to ss+nn. > > > + > > > If efi_fake_mem=2G@4G:0x10000,2G@0x10a0000000:0x10000 > > > is specified, EFI_MEMORY_MORE_RELIABLE(0x10000) > > > attribute is added to range 0x100000000-0x180000000 and > > > 0x10a0000000-0x1120000000. > > > > > > + If efi_fake_mem=8G@9G:0x40000 is specified, the > > > + EFI_MEMORY_SP(0x40000) attribute is added to > > > + range 0x240000000-0x43fffffff. > > > + > > > Using this parameter you can do debugging of EFI memmap > > > - related feature. For example, you can do debugging of > > > + related features. For example, you can do debugging of > > > Address Range Mirroring feature even if your box > > > - doesn't support it. > > > + doesn't support it, or mark specific memory as > > > + "soft reserved". > > > > > > efivar_ssdt= [EFI; X86] Name of an EFI variable that contains an SSDT > > > that is to be dynamically loaded by Linux. If there are > > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > > > index 4195f44c6a09..bced13503bb1 100644 > > > --- a/arch/x86/Kconfig > > > +++ b/arch/x86/Kconfig > > > @@ -1981,6 +1981,27 @@ config EFI_MIXED > > > > > > If unsure, say N. > > > > > > +config EFI_SOFT_RESERVE > > > + bool "Reserve EFI Specific Purpose Memory" > > > + depends on EFI && ACPI_HMAT > > > + default ACPI_HMAT > > > + ---help--- > > > + On systems that have mixed performance classes of memory EFI > > > + may indicate specific purpose memory with an attribute (See > > > + EFI_MEMORY_SP in UEFI 2.8). A memory range tagged with this > > > + attribute may have unique performance characteristics compared > > > + to the system's general purpose "System RAM" pool. On the > > > + expectation that such memory has application specific usage, > > > + and its base EFI memory type is "conventional" answer Y to > > > + arrange for the kernel to reserve it as a "Soft Reserved" > > > + resource, and set aside for direct-access (device-dax) by > > > + default. The memory range can later be optionally assigned to > > > + the page allocator by system administrator policy via the > > > + device-dax kmem facility. Say N to have the kernel treat this > > > + memory as "System RAM" by default. > > > + > > > + If unsure, say Y. > > > + > > > > This should be in generic code. > > Agree. > > > > > > config SECCOMP > > > def_bool y > > > prompt "Enable seccomp to safely compute untrusted bytecode" > > > diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c > > > index d6662fdef300..f2dc5896d770 100644 > > > --- a/arch/x86/boot/compressed/eboot.c > > > +++ b/arch/x86/boot/compressed/eboot.c > > > @@ -10,6 +10,7 @@ > > > #include <linux/pci.h> > > > > > > #include <asm/efi.h> > > > +#include <asm/efi-stub.h> > > > #include <asm/e820/types.h> > > > #include <asm/setup.h> > > > #include <asm/desc.h> > > > @@ -553,7 +554,11 @@ setup_e820(struct boot_params *params, struct setup_data *e820ext, u32 e820ext_s > > > case EFI_BOOT_SERVICES_CODE: > > > case EFI_BOOT_SERVICES_DATA: > > > case EFI_CONVENTIONAL_MEMORY: > > > - e820_type = E820_TYPE_RAM; > > > + if (!efi_nosoftreserve > > > + && (d->attribute & EFI_MEMORY_SP)) > > > + e820_type = E820_TYPE_SOFT_RESERVED; > > > + else > > > + e820_type = E820_TYPE_RAM; > > > break; > > > > > > case EFI_ACPI_MEMORY_NVS: > > > diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c > > > index 2e53c056ba20..093e84e28b7a 100644 > > > --- a/arch/x86/boot/compressed/kaslr.c > > > +++ b/arch/x86/boot/compressed/kaslr.c > > > @@ -38,6 +38,7 @@ > > > #include <linux/efi.h> > > > #include <generated/utsrelease.h> > > > #include <asm/efi.h> > > > +#include <asm/efi-stub.h> > > > > > > /* Macros used by the included decompressor code below. */ > > > #define STATIC > > > @@ -760,6 +761,9 @@ process_efi_entries(unsigned long minimum, unsigned long image_size) > > > if (md->type != EFI_CONVENTIONAL_MEMORY) > > > continue; > > > > > > + if (!efi_nosoftreserve && (md->attribute & EFI_MEMORY_SP)) > > > + continue; > > > + > > > if (efi_mirror_found && > > > !(md->attribute & EFI_MEMORY_MORE_RELIABLE)) > > > continue; > > > diff --git a/arch/x86/include/asm/e820/types.h b/arch/x86/include/asm/e820/types.h > > > index c3aa4b5e49e2..314f75d886d0 100644 > > > --- a/arch/x86/include/asm/e820/types.h > > > +++ b/arch/x86/include/asm/e820/types.h > > > @@ -28,6 +28,14 @@ enum e820_type { > > > */ > > > E820_TYPE_PRAM = 12, > > > > > > + /* > > > + * Special-purpose memory is indicated to the system via the > > > + * EFI_MEMORY_SP attribute. Define an e820 translation of this > > > + * memory type for the purpose of reserving this range and > > > + * marking it with the IORES_DESC_SOFT_RESERVED designation. > > > + */ > > > + E820_TYPE_SOFT_RESERVED = 0xefffffff, > > > + > > > /* > > > * Reserved RAM used by the kernel itself if > > > * CONFIG_INTEL_TXT=y is enabled, memory of this type > > > diff --git a/arch/x86/include/asm/efi-stub.h b/arch/x86/include/asm/efi-stub.h > > > new file mode 100644 > > > index 000000000000..16ebd036387b > > > --- /dev/null > > > +++ b/arch/x86/include/asm/efi-stub.h > > > @@ -0,0 +1,11 @@ > > > +// SPDX-License-Identifier: GPL-2.0 > > > +#ifndef _X86_EFI_STUB_H_ > > > +#define _X86_EFI_STUB_H_ > > > + > > > +#ifdef CONFIG_EFI_STUB > > > +extern bool efi_nosoftreserve; > > > +#else > > > +#define efi_nosoftreserve (1) > > > +#endif > > > + > > > +#endif /* _X86_EFI_STUB_H_ */ > > > > Please put this in generic code as well (but you need a function not a > > variable - see below) > > > > > diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c > > > index 7da2bcd2b8eb..9976106b57ec 100644 > > > --- a/arch/x86/kernel/e820.c > > > +++ b/arch/x86/kernel/e820.c > > > @@ -190,6 +190,7 @@ static void __init e820_print_type(enum e820_type type) > > > case E820_TYPE_RAM: /* Fall through: */ > > > case E820_TYPE_RESERVED_KERN: pr_cont("usable"); break; > > > case E820_TYPE_RESERVED: pr_cont("reserved"); break; > > > + case E820_TYPE_SOFT_RESERVED: pr_cont("soft reserved"); break; > > > case E820_TYPE_ACPI: pr_cont("ACPI data"); break; > > > case E820_TYPE_NVS: pr_cont("ACPI NVS"); break; > > > case E820_TYPE_UNUSABLE: pr_cont("unusable"); break; > > > @@ -1037,6 +1038,7 @@ static const char *__init e820_type_to_string(struct e820_entry *entry) > > > case E820_TYPE_PRAM: return "Persistent Memory (legacy)"; > > > case E820_TYPE_PMEM: return "Persistent Memory"; > > > case E820_TYPE_RESERVED: return "Reserved"; > > > + case E820_TYPE_SOFT_RESERVED: return "Soft Reserved"; > > > default: return "Unknown E820 type"; > > > } > > > } > > > @@ -1052,6 +1054,7 @@ static unsigned long __init e820_type_to_iomem_type(struct e820_entry *entry) > > > case E820_TYPE_PRAM: /* Fall-through: */ > > > case E820_TYPE_PMEM: /* Fall-through: */ > > > case E820_TYPE_RESERVED: /* Fall-through: */ > > > + case E820_TYPE_SOFT_RESERVED: /* Fall-through: */ > > > default: return IORESOURCE_MEM; > > > } > > > } > > > @@ -1064,6 +1067,7 @@ static unsigned long __init e820_type_to_iores_desc(struct e820_entry *entry) > > > case E820_TYPE_PMEM: return IORES_DESC_PERSISTENT_MEMORY; > > > case E820_TYPE_PRAM: return IORES_DESC_PERSISTENT_MEMORY_LEGACY; > > > case E820_TYPE_RESERVED: return IORES_DESC_RESERVED; > > > + case E820_TYPE_SOFT_RESERVED: return IORES_DESC_SOFT_RESERVED; > > > case E820_TYPE_RESERVED_KERN: /* Fall-through: */ > > > case E820_TYPE_RAM: /* Fall-through: */ > > > case E820_TYPE_UNUSABLE: /* Fall-through: */ > > > @@ -1078,11 +1082,12 @@ static bool __init do_mark_busy(enum e820_type type, struct resource *res) > > > return true; > > > > > > /* > > > - * Treat persistent memory like device memory, i.e. reserve it > > > - * for exclusive use of a driver > > > + * Treat persistent memory and other special memory ranges like > > > + * device memory, i.e. reserve it for exclusive use of a driver > > > */ > > > switch (type) { > > > case E820_TYPE_RESERVED: > > > + case E820_TYPE_SOFT_RESERVED: > > > case E820_TYPE_PRAM: > > > case E820_TYPE_PMEM: > > > return false; > > > @@ -1285,6 +1290,9 @@ void __init e820__memblock_setup(void) > > > if (end != (resource_size_t)end) > > > continue; > > > > > > + if (entry->type == E820_TYPE_SOFT_RESERVED) > > > + memblock_reserve(entry->addr, entry->size); > > > + > > > if (entry->type != E820_TYPE_RAM && entry->type != E820_TYPE_RESERVED_KERN) > > > continue; > > > > > > diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c > > > index 0bb58eb33ca0..9cfb7f1cf25d 100644 > > > --- a/arch/x86/platform/efi/efi.c > > > +++ b/arch/x86/platform/efi/efi.c > > > @@ -151,10 +151,18 @@ void __init efi_find_mirror(void) > > > * more than the max 128 entries that can fit in the e820 legacy > > > * (zeropage) memory map. > > > */ > > > +enum add_efi_mode { > > > + ADD_EFI_ALL, > > > + ADD_EFI_SOFT_RESERVED, > > > +}; > > > > > > -static void __init do_add_efi_memmap(void) > > > +static void __init do_add_efi_memmap(enum add_efi_mode mode) > > > { > > > efi_memory_desc_t *md; > > > + int add = 0; > > > + > > > + if (!efi_enabled(EFI_MEMMAP)) > > > + return; > > > > > > for_each_efi_memory_desc(md) { > > > unsigned long long start = md->phys_addr; > > > @@ -167,7 +175,10 @@ static void __init do_add_efi_memmap(void) > > > case EFI_BOOT_SERVICES_CODE: > > > case EFI_BOOT_SERVICES_DATA: > > > case EFI_CONVENTIONAL_MEMORY: > > > - if (md->attribute & EFI_MEMORY_WB) > > > + if (efi_enabled(EFI_MEM_SOFT_RESERVE) > > > + && (md->attribute & EFI_MEMORY_SP)) > > > + e820_type = E820_TYPE_SOFT_RESERVED; > > > + else if (md->attribute & EFI_MEMORY_WB) > > > e820_type = E820_TYPE_RAM; > > > else > > > e820_type = E820_TYPE_RESERVED; > > > @@ -193,9 +204,17 @@ static void __init do_add_efi_memmap(void) > > > e820_type = E820_TYPE_RESERVED; > > > break; > > > } > > > + > > > + if (e820_type == E820_TYPE_SOFT_RESERVED) > > > + /* always add E820_TYPE_SOFT_RESERVED */; > > > + else if (mode == ADD_EFI_SOFT_RESERVED) > > > + continue; > > > + > > > + add++; > > > e820__range_add(start, size, e820_type); > > > } > > > - e820__update_table(e820_table); > > > + if (add) > > > + e820__update_table(e820_table); > > > } > > > > > > int __init efi_memblock_x86_reserve_range(void) > > > @@ -227,8 +246,18 @@ int __init efi_memblock_x86_reserve_range(void) > > > if (rv) > > > return rv; > > > > > > - if (add_efi_memmap) > > > - do_add_efi_memmap(); > > > + if (add_efi_memmap) { > > > + do_add_efi_memmap(ADD_EFI_ALL); > > > + } else { > > > + /* > > > + * Given add_efi_memmap defaults to 0 and there there is no e820 > > > + * mechanism for soft-reserved memory. Explicitly scan for > > > + * soft-reserved memory. Otherwise, the mechanism to disable the > > > + * kernel's consideration of EFI_MEMORY_SP is the > > > + * efi=nosoftreserve option. > > > + */ > > > + do_add_efi_memmap(ADD_EFI_SOFT_RESERVED); > > > + } > > > > > > WARN(efi.memmap.desc_version != 1, > > > "Unexpected EFI_MEMORY_DESCRIPTOR version %ld", > > > @@ -781,6 +810,15 @@ static bool should_map_region(efi_memory_desc_t *md) > > > if (IS_ENABLED(CONFIG_X86_32)) > > > return false; > > > > > > + /* > > > + * EFI specific purpose memory may be reserved by default > > > + * depending on kernel config and boot options. > > > + */ > > > + if (md->type == EFI_CONVENTIONAL_MEMORY > > > + && efi_enabled(EFI_MEM_SOFT_RESERVE) > > > + && (md->attribute & EFI_MEMORY_SP)) > > > + return false; > > > + > > > /* > > > * Map all of RAM so that we can access arguments in the 1:1 > > > * mapping when making EFI runtime calls. > > > @@ -1072,6 +1110,9 @@ static int __init arch_parse_efi_cmdline(char *str) > > > if (parse_option_str(str, "old_map")) > > > set_bit(EFI_OLD_MEMMAP, &efi.flags); > > > > > > + if (parse_option_str(str, "nosoftreserve")) > > > + clear_bit(EFI_MEM_SOFT_RESERVE, &efi.flags); > > > + > > > > Can we move this to the generic efi= handling code? > > To parse_efi_cmdline() in drivers/fimrware/efi.c? Sure. > > > > > > return 0; > > > } > > > early_param("efi", arch_parse_efi_cmdline); > > > diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c > > > index 363bb9d00fa5..6d54d5c74347 100644 > > > --- a/drivers/firmware/efi/efi.c > > > +++ b/drivers/firmware/efi/efi.c > > > @@ -52,6 +52,9 @@ struct efi __read_mostly efi = { > > > .tpm_log = EFI_INVALID_TABLE_ADDR, > > > .tpm_final_log = EFI_INVALID_TABLE_ADDR, > > > .mem_reserve = EFI_INVALID_TABLE_ADDR, > > > +#ifdef CONFIG_EFI_SOFT_RESERVE > > > + .flags = 1UL << EFI_MEM_SOFT_RESERVE, > > > +#endif > > > }; > > > EXPORT_SYMBOL(efi); > > > > > > > I'd prefer it if we could call this EFI_MEM_NO_SOFT_RESERVE instead, > > and invert the meaning of the bit. > > ...but that would mean repeat occurrences of > "!efi_enabled(EFI_MEM_NO_SOFT_RESERVE)", doesn't the double negative > seem less readable to you? > One the one hand, yes. On the other hand, it is the only flag whose default is 'enabled' which is also less than ideal. > > > > > diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c > > > index 3caae7f2cf56..35ee98a2c00c 100644 > > > --- a/drivers/firmware/efi/libstub/efi-stub-helper.c > > > +++ b/drivers/firmware/efi/libstub/efi-stub-helper.c > > > @@ -28,6 +28,7 @@ > > > #define EFI_READ_CHUNK_SIZE (1024 * 1024) > > > > > > static unsigned long __chunk_size = EFI_READ_CHUNK_SIZE; > > > +bool efi_nosoftreserve; > > > > > > > This needs a getter function if you want to access it from other > > compilation units. This has to do with how the early relocation code > > handles data symbol references. Please refer to nokaslr() for an > > example. > > Ah, does that mean that the efi_nosoftreserve global variable > instances in different compilation units are effectively static the > way I currently have them defined? No, the problem had to do with relocation of GOT entries on some x86 builds. Then, things got more complicated when I added the 32-bit ARM port, which puts other constraints related to how symbols are placed in the binary. So please duplicate the pattern with the static variable and the __pure setter, which has proven to be the most robust way to expose variables to other compilation units in the stub.