On Tue, Nov 26, 2013 at 01:57:51PM +0800, Dave Young wrote: > kexec kernel will need exactly same mapping for > efi runtime memory ranges. Thus here export the > runtime ranges mapping to sysfs, kexec-tools > will assemble them and pass to 2nd kernel via > setup_data. > > Introducing a new directly /sys/firmware/efi/runtime-map > Just like /sys/firmware/memmap. Containing below attribute > in each file of that directory: > attribute num_pages phys_addr type virt_addr > > It will not work for efi 32bit. Only x86_64 currently. > > Matt: s/efi-runtime-map.c/runtime-map.c > change dir name to runtime-map > update to use desc_size in efi_runtime_map > cleaup the code, add function efi_save_runtime_map > improve err handling > > Signed-off-by: Dave Young <dyoung at redhat.com> > --- > .../ABI/testing/sysfs-firmware-efi-runtime-map | 45 +++++ > arch/x86/platform/efi/efi.c | 26 +++ > drivers/firmware/efi/Kconfig | 10 ++ > drivers/firmware/efi/Makefile | 1 + > drivers/firmware/efi/efi.c | 3 +- > drivers/firmware/efi/runtime-map.c | 199 +++++++++++++++++++++ > include/linux/efi.h | 6 + > 7 files changed, 289 insertions(+), 1 deletion(-) > create mode 100644 Documentation/ABI/testing/sysfs-firmware-efi-runtime-map > create mode 100644 drivers/firmware/efi/runtime-map.c > > diff --git a/Documentation/ABI/testing/sysfs-firmware-efi-runtime-map b/Documentation/ABI/testing/sysfs-firmware-efi-runtime-map > new file mode 100644 > index 0000000..dab8d41 > --- /dev/null > +++ b/Documentation/ABI/testing/sysfs-firmware-efi-runtime-map > @@ -0,0 +1,45 @@ > +What: /sys/firmware/efi/runtime-map/ > +Date: Oct 2013 > +Contact: Dave Young <dyoung at redhat.com> > +Description: > + Switching efi runtime services to virtual mode requires > + that all efi memory ranges which has the runtime attribute s/has/have/ > + bit set to be mapped to virtual addresses. > + > + In kexec kernel kernel can not entering virtual mode again > + because there's a limitation that SetVirtualAddressMap can > + only be called once for entering virtual mode. But kexec > + kernel still need maintain same physical address to virtual > + address mapping as the 1st kernel. Simply say "SetVirtualAddressMap has the limitation that it can be called only once per system lifetime so a second kernel, like kexec, needs to reuse the exact same virtual addresses like the first kernel." > The mappings are exported > + to sysfs so userspace tools can reassemble them and pass them > + into kexec kernel. > + > + /sys/firmware/efi/runtim-map/ is what kernel export for > + this purpose. The structure is as follows: "/sys/firmware/efi/runtime-map/ is the directory the kernel exports that information in." > + > + subdirectories are named with the number of the memory range: > + > + /sys/firmware/efi/runtime-map/0 > + /sys/firmware/efi/runtime-map/1 > + /sys/firmware/efi/runtime-map/2 > + /sys/firmware/efi/runtime-map/3 > + ... > + > + Each subdirectory contains five files: > + > + attribute : The attribute of the memory range. attributes > + num_pages : The size of the memory range in page number. in 4K pages. > + phys_addr : The start physical address of the memory range. The physical address of the memory range > + type : The type of the memory range. > + virt_addr : The start virtual address of the memory range. The virtual address... > + > + Above values are all hexadecimal number with the '0x' prefix. numbers > + > + So, for example: > + > + /sys/firmware/efi/runtime-map/0/attribute > + /sys/firmware/efi/runtime-map/0/num_pages > + /sys/firmware/efi/runtime-map/0/phys_addr > + /sys/firmware/efi/runtime-map/0/type > + /sys/firmware/efi/runtime-map/0/virt_addr > + ... What is this example supposed to show? The above is already detailed enough. > diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c > index 2a292aa..c3a2aaa 100644 > --- a/arch/x86/platform/efi/efi.c > +++ b/arch/x86/platform/efi/efi.c > @@ -76,6 +76,9 @@ static __initdata efi_config_table_type_t arch_tables[] = { > {NULL_GUID, NULL, NULL}, > }; > > +void *efi_runtime_map; > +int nr_efi_runtime_map; > + > /* > * Returns 1 if 'facility' is enabled, 0 otherwise. > */ > @@ -811,6 +814,21 @@ static void __init efi_merge_regions(void) > } > } > > +static int __init efi_save_runtime_map(efi_memory_desc_t *md) It is a static function so no need for the "efi_" prefix. Also, it should be called save_runtime_mem_desc or so because this is what it does. > +{ > + efi_runtime_map = krealloc(efi_runtime_map, > + (nr_efi_runtime_map + 1) * > + memmap.desc_size, GFP_KERNEL); WARNING: Reusing the krealloc arg is almost always a bug #21: FILE: arch/x86/platform/efi/efi.c:819: + efi_runtime_map = krealloc(efi_runtime_map, Please run your diffs through checkpatch.pl --strict before submitting. Btw, we use krealloc below again which should be changed too. > + if (!efi_runtime_map) > + return -ENOMEM; > + > + memcpy(efi_runtime_map + nr_efi_runtime_map * memmap.desc_size, > + md, memmap.desc_size); > + nr_efi_runtime_map++; > + > + return 0; > +} > + > /* > * Map efi memory ranges for runtime serivce and update new_memmap with virtual > * addresses. > @@ -821,6 +839,7 @@ static void * __init efi_map_regions(int *count) > void *p, *new_memmap = NULL; > unsigned long size; > u64 end, systab; > + int error = 0; > > for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) { > md = p; > @@ -852,9 +871,16 @@ static void * __init efi_map_regions(int *count) > > memcpy(new_memmap + (*count * memmap.desc_size), md, > memmap.desc_size); > + if (!error && md->type != EFI_BOOT_SERVICES_CODE && > + md->type != EFI_BOOT_SERVICES_DATA) > + error = efi_save_runtime_map(md); I'm not sure I like this entangled stuff with the error code - I can barely understand what's going on there - not necessarily because of your changes but because this code is not that trivial, especially after time has passed. And I don't see you kfreeing efi_runtime_map in the case where the first krealloc() fails at some iteration. What would be much cleaner IMO would be if you let efi_map_regions() finish, *and*, *only* if it was successful iterate again over the regions and do efi_save_runtime_map() on the runtime ones. This will make the code much more readable too. > (*count)++; > } > > + if (error) { > + nr_efi_runtime_map = 0; > + kfree(efi_runtime_map); > + } This error gets overwritten each time the loop runs and you possibly lose !0 values. What I'm suggesting above would take care of that case too. > ret: > return new_memmap; > } > diff --git a/drivers/firmware/efi/Kconfig b/drivers/firmware/efi/Kconfig > index b0fc7c7..b4d01ad 100644 > --- a/drivers/firmware/efi/Kconfig > +++ b/drivers/firmware/efi/Kconfig > @@ -36,4 +36,14 @@ config EFI_VARS_PSTORE_DEFAULT_DISABLE > backend for pstore by default. This setting can be overridden > using the efivars module's pstore_disable parameter. > > +config EFI_RUNTIME_MAP > + bool "Export efi runtime maps to sysfs" if EXPERT > + default X86 && EFI This should depend on KEXEC. > + help > + Export efi runtime memory maps to /sys/firmware/efi/runtime-map. > + That memory map is used for example by kexec to set up efi virtual > + mapping the 2nd kernel, but can also be used for debugging purposes. > + > + See also Documentation/ABI/testing/sysfs-firmware-efi-runtime-map. > + > endmenu > diff --git a/drivers/firmware/efi/Makefile b/drivers/firmware/efi/Makefile > index 99245ab..ecadcc1 100644 > --- a/drivers/firmware/efi/Makefile > +++ b/drivers/firmware/efi/Makefile > @@ -4,3 +4,4 @@ > obj-y += efi.o vars.o > obj-$(CONFIG_EFI_VARS) += efivars.o > obj-$(CONFIG_EFI_VARS_PSTORE) += efi-pstore.o > +obj-$(CONFIG_EFI_RUNTIME_MAP) += runtime-map.o This will conflict with upstream: checking file drivers/firmware/efi/Makefile Hunk #1 FAILED at 4. 1 out of 1 hunk FAILED It should be: diff --git a/drivers/firmware/efi/Makefile b/drivers/firmware/efi/Makefile index 9ba156d3c775..7f05c368b868 100644 --- a/drivers/firmware/efi/Makefile +++ b/drivers/firmware/efi/Makefile @@ -5,3 +5,4 @@ obj-y += efi.o vars.o obj-$(CONFIG_EFI_VARS) += efivars.o obj-$(CONFIG_EFI_VARS_PSTORE) += efi-pstore.o obj-$(CONFIG_UEFI_CPER) += cper.o +obj-$(CONFIG_EFI_RUNTIME_MAP) += runtime-map.o > diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c > index 5d85956..a480de8 100644 > --- a/drivers/firmware/efi/efi.c > +++ b/drivers/firmware/efi/efi.c > @@ -38,7 +38,8 @@ struct efi __read_mostly efi = { > }; > EXPORT_SYMBOL(efi); > > -static struct kobject *efi_kobj; > +struct kobject *efi_kobj; > +EXPORT_SYMBOL_GPL(efi_kobj); Can we move this private to linux/drivers/efi/runtime-map.c? If yes, then there's no need to export it. > static struct kobject *efivars_kobj; > > /* > diff --git a/drivers/firmware/efi/runtime-map.c b/drivers/firmware/efi/runtime-map.c > new file mode 100644 > index 0000000..036d099 > --- /dev/null > +++ b/drivers/firmware/efi/runtime-map.c > @@ -0,0 +1,199 @@ > +/* > + * linux/drivers/efi/runtime-map.c > + * Copyright (C) 2013 Red Hat, Inc., Dave Young <dyoung at redhat.com> > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License v2.0 as published by > + * the Free Software Foundation > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. Please drop this boilerplate text and do a single sentence saying that it is licensed under GPL v2 and refer to COPYING in the kernel repo. > + * > + */ > + > +#include <linux/string.h> > +#include <linux/kernel.h> > +#include <linux/module.h> > +#include <linux/types.h> > +#include <linux/efi.h> > +#include <linux/slab.h> > + > +#include <asm/setup.h> > + > +struct efi_runtime_map_entry { > + efi_memory_desc_t md; > + struct kobject kobj; /* kobject for each entry */ > +}; > + > +static struct efi_runtime_map_entry **map_entries; > + > +static ssize_t map_attr_show(struct kobject *kobj, > + struct attribute *attr, char *buf); > +static ssize_t type_show(struct efi_runtime_map_entry *entry, char *buf); > +static ssize_t phys_addr_show(struct efi_runtime_map_entry *entry, char *buf); > +static ssize_t virt_addr_show(struct efi_runtime_map_entry *entry, char *buf); > +static ssize_t num_pages_show(struct efi_runtime_map_entry *entry, char *buf); > +static ssize_t attribute_show(struct efi_runtime_map_entry *entry, char *buf); > + > +struct map_attribute { > + struct attribute attr; > + ssize_t (*show)(struct efi_runtime_map_entry *entry, char *buf); > +}; > + > +static struct map_attribute map_type_attr = __ATTR_RO(type); > +static struct map_attribute map_phys_addr_attr = __ATTR_RO(phys_addr); > +static struct map_attribute map_virt_addr_attr = __ATTR_RO(virt_addr); > +static struct map_attribute map_num_pages_attr = __ATTR_RO(num_pages); > +static struct map_attribute map_attribute_attr = __ATTR_RO(attribute); > + > +/* > + * These are default attributes that are added for every memmap entry. > + */ > +static struct attribute *def_attrs[] = { > + &map_type_attr.attr, > + &map_phys_addr_attr.attr, > + &map_virt_addr_attr.attr, > + &map_num_pages_attr.attr, > + &map_attribute_attr.attr, > + NULL > +}; > + > +static const struct sysfs_ops map_attr_ops = { > + .show = map_attr_show, > +}; > + > +static inline struct efi_runtime_map_entry *to_map_entry(struct kobject *kobj) > +{ > + return container_of(kobj, struct efi_runtime_map_entry, kobj); > +} > + > +static void map_release(struct kobject *kobj) > +{ > + struct efi_runtime_map_entry *entry; > + > + entry = to_map_entry(kobj); > + kfree(entry); > +} > + > +static struct kobj_type __refdata map_ktype = { > + .sysfs_ops = &map_attr_ops, > + .default_attrs = def_attrs, > + .release = map_release, > +}; > + > +static ssize_t type_show(struct efi_runtime_map_entry *entry, char *buf) > +{ > + return snprintf(buf, PAGE_SIZE, "0x%x\n", entry->md.type); > +} > + > +static ssize_t phys_addr_show(struct efi_runtime_map_entry *entry, char *buf) > +{ > + return snprintf(buf, PAGE_SIZE, "0x%llx\n", entry->md.phys_addr); > +} > + > +static ssize_t virt_addr_show(struct efi_runtime_map_entry *entry, char *buf) > +{ > + return snprintf(buf, PAGE_SIZE, "0x%llx\n", entry->md.virt_addr); > +} > + > +static ssize_t num_pages_show(struct efi_runtime_map_entry *entry, char *buf) > +{ > + return snprintf(buf, PAGE_SIZE, "0x%llx\n", entry->md.num_pages); > +} > + > +static ssize_t attribute_show(struct efi_runtime_map_entry *entry, char *buf) > +{ > + return snprintf(buf, PAGE_SIZE, "0x%llx\n", entry->md.attribute); > +} I can see the macro here writing itself too :) > + > +static inline struct map_attribute *to_map_attr(struct attribute *attr) > +{ > + return container_of(attr, struct map_attribute, attr); > +} > + > +static ssize_t map_attr_show(struct kobject *kobj, > + struct attribute *attr, char *buf) > +{ > + struct efi_runtime_map_entry *entry = to_map_entry(kobj); > + struct map_attribute *map_attr = to_map_attr(attr); > + > + return map_attr->show(entry, buf); > +} > + > +static struct kset *map_kset; > + > +/* > + * Add map entry on sysfs > + */ The function name should be enough - no need for the comment. > +static struct efi_runtime_map_entry *add_sysfs_runtime_map_entry(int nr) > +{ > + int ret; > + unsigned long desc_size; > + struct efi_runtime_map_entry *entry; > + struct efi_info *e = &boot_params.efi_info; > + > + if (!map_kset) { > + map_kset = kset_create_and_add("runtime-map", NULL, > + efi_kobj); > + if (!map_kset) > + return ERR_PTR(-ENOMEM); > + } > + > + entry = kzalloc(sizeof(*entry), GFP_KERNEL); > + if (!entry) { > + kset_unregister(map_kset); > + return entry; > + } > + > + desc_size = e->efi_memdesc_size; > + memcpy(&entry->md, efi_runtime_map + nr * desc_size, > + sizeof(efi_memory_desc_t)); If you're going to break this line anyway, you don't really need the local desc_size variable: memcpy(&entry->md, efi_runtime_map + nr * e->efi_memdesc_size, sizeof(efi_memory_desc_t)); > + > + kobject_init(&entry->kobj, &map_ktype); > + entry->kobj.kset = map_kset; > + ret = kobject_add(&entry->kobj, NULL, "%d", nr); > + if (ret) { > + kobject_put(&entry->kobj); > + kset_unregister(map_kset); > + return ERR_PTR(ret); > + } > + > + return entry; > +} > + > +static int __init efi_runtime_map_init(void) > +{ > + int i, j, ret = 0; > + struct efi_runtime_map_entry *entry; > + > + if (!efi_runtime_map) { > + pr_warn("no efi_runtime_map found\n"); > + return -EINVAL; > + } > + > + map_entries = kzalloc(nr_efi_runtime_map * > + sizeof(struct efi_runtime_map_entry *), GFP_KERNEL); Shorten: map_entries = kzalloc(nr_efi_runtime_map * sizeof(entry), GFP_KERNEL); > + if (!map_entries) > + return -ENOMEM; > + > + for (i = 0; i < nr_efi_runtime_map; i++) { > + entry = add_sysfs_runtime_map_entry(i); > + if (IS_ERR(entry)) { > + for (j = i - 1; j > 0; j--) { > + entry = *(map_entries + j); > + kobject_put(&entry->kobj); > + } > + if (map_kset) > + kset_unregister(map_kset); > + ret = PTR_ERR(entry); > + goto out; > + } You can move this error unwinding path away from the main loop: if (IS_ERR(entry)) goto out; and make it more readable. > + *(map_entries + i) = entry; > + } > + > +out: > + return ret; > +} > +late_initcall(efi_runtime_map_init); Why an initcall? Can't we run this from efisubsys_init()? Thanks. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. --