Re: [RFC PATCH 3/6] soc: qcom: Add Qualcomm minidump kernel driver

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 21/02/2023 11:25, Mukesh Ojha wrote:
Minidump is a best effort mechanism to collect useful and predefined
data for first level of debugging on end user devices running on
Qualcomm SoCs. It is built on the premise that System on Chip (SoC)
or subsystem part of SoC crashes, due to a range of hardware and
software bugs. Hence, the ability to collect accurate data is only
a best-effort. The data collected could be invalid or corrupted,
data collection itself could fail, and so on.

Qualcomm devices in engineering mode provides a mechanism for
generating full system ramdumps for post mortem debugging. But in some
cases it's however not feasible to capture the entire content of RAM.
The minidump mechanism provides the means for selecting region should
be included in the ramdump. The solution supports extracting the
ramdump/minidump produced either over USB or stored to an attached
storage device.

The core of minidump feature is part of Qualcomm's boot firmware code.
It initializes shared memory(SMEM), which is a part of DDR and
allocates a small section of it to minidump table i.e also called
global table of content (G-ToC). Each subsystem (APSS, ADSP, ...) has
their own table of segments to be included in the minidump, all
references from a descriptor in SMEM (G-ToC). Each segment/region has
some details like name, physical address and it's size etc. and it
could be anywhere scattered in the DDR.

Minidump kernel driver adds the capability to add linux region to be
dumped as part of ram dump collection. It provides appropriate symbol
to check its enablement and register client regions.

To simplify post mortem debugging, it creates and maintain an ELF
header as first region that gets updated with upon registration
of a new region.

Signed-off-by: Mukesh Ojha <quic_mojha@xxxxxxxxxxx>
---
  drivers/soc/qcom/Kconfig         |  14 ++
  drivers/soc/qcom/Makefile        |   1 +
  drivers/soc/qcom/qcom_minidump.c | 490 +++++++++++++++++++++++++++++++++++++++++
  include/soc/qcom/minidump.h      |  40 ++++
  include/soc/qcom/qcom_minidump.h |  24 +-
  5 files changed, 568 insertions(+), 1 deletion(-)
  create mode 100644 drivers/soc/qcom/qcom_minidump.c
  create mode 100644 include/soc/qcom/minidump.h

diff --git a/drivers/soc/qcom/Kconfig b/drivers/soc/qcom/Kconfig
index ae504c4..0fc7698 100644
--- a/drivers/soc/qcom/Kconfig
+++ b/drivers/soc/qcom/Kconfig
@@ -248,4 +248,18 @@ config QCOM_ICC_BWMON
  	  the fixed bandwidth votes from cpufreq (CPU nodes) thus achieve high
  	  memory throughput even with lower CPU frequencies.
+config QCOM_MINIDUMP
+	bool "QCOM Minidump Support"

Why can't this be a module?


+	depends on ARCH_QCOM || COMPILE_TEST
+	depends on QCOM_SMEM
+	help
+	  Enablement of core minidump feature is controlled from boot firmware
+	  side, and this config allow linux to query and manages APPS minidump
+	  table.
+
+	  Client drivers can register their internal data structures and debug
+	  messages as part of the minidump region and when the SoC is crashed,
+	  these selective regions will be dumped instead of the entire DDR.
+	  This saves significant amount of time and/or storage space.
+
  endmenu
diff --git a/drivers/soc/qcom/Makefile b/drivers/soc/qcom/Makefile
index d66604a..e1ff492 100644
--- a/drivers/soc/qcom/Makefile
+++ b/drivers/soc/qcom/Makefile
@@ -29,3 +29,4 @@ obj-$(CONFIG_QCOM_RPMHPD) += rpmhpd.o
  obj-$(CONFIG_QCOM_RPMPD) += rpmpd.o
  obj-$(CONFIG_QCOM_KRYO_L2_ACCESSORS) +=	kryo-l2-accessors.o
  obj-$(CONFIG_QCOM_ICC_BWMON)	+= icc-bwmon.o
+obj-$(CONFIG_QCOM_MINIDUMP) += qcom_minidump.o
diff --git a/drivers/soc/qcom/qcom_minidump.c b/drivers/soc/qcom/qcom_minidump.c
new file mode 100644
index 0000000..eb63b75
--- /dev/null
+++ b/drivers/soc/qcom/qcom_minidump.c
@@ -0,0 +1,490 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+
+#define pr_fmt(fmt) "Minidump: " fmt
+
+#include <linux/init.h>
+#include <linux/export.h>
+#include <linux/kernel.h>
+#include <linux/err.h>
+#include <linux/elf.h>
+#include <linux/errno.h>
+#include <linux/string.h>
+#include <linux/slab.h>
+#include <linux/of.h>
+#include <linux/printk.h>
+#include <linux/soc/qcom/smem.h>
+#include <soc/qcom/qcom_minidump.h>
+#include <soc/qcom/minidump.h>
+
+/**
+ * DOC: Overview
+ *
+ *   +-----------------------------------------------+
+ *   |   DDR                       +-------------+   |
+ *   |                             |      SS0-ToC|   |
+ *   | +----------------+     +----------------+ |   |
+ *   | |Shared memory   |     |         SS1-ToC| |   |
+ *   | |(SMEM)          |     |                | |   |
+ *   | |                | +-->|--------+       | |   |
+ *   | |G-ToC           | |   | SS-ToC  \      | |   |
+ *   | |+-------------+ | |   | +-----------+  | |   |
+ *   | ||-------------| | |   | |-----------|  | |   |
+ *   | || SS0-ToC     | | | +-|<|SS1 region1|  | |   |
+ *   | ||-------------| | | | | |-----------|  | |   |
+ *   | || SS1-ToC     |-|>+ | | |SS1 region2|  | |   |
+ *   | ||-------------| |   | | |-----------|  | |   |
+ *   | || SS2-ToC     | |   | | |  ...      |  | |   |
+ *   | ||-------------| |   | | |-----------|  | |   |
+ *   | ||  ...        | |   |-|<|SS1 regionN|  | |   |
+ *   | ||-------------| |   | | |-----------|  | |   |
+ *   | || SSn-ToC     | |   | | +-----------+  | |   |
+ *   | |+-------------+ |   | |                | |   |
+ *   | |                |   | |----------------| |   |
+ *   | |                |   +>|  regionN       | |   |
+ *   | |                |   | |----------------| |   |
+ *   | +----------------+   | |                | |   |
+ *   |                      | |----------------| |   |
+ *   |                      +>|  region1       | |   |
+ *   |                        |----------------| |   |
+ *   |                        |                | |   |
+ *   |                        |----------------|-+   |
+ *   |                        |  region5       |     |
+ *   |                        |----------------|     |
+ *   |                        |                |     |
+ *   |  Region information    +----------------+     |
+ *   | +---------------+                             |
+ *   | |region name    |                             |
+ *   | |---------------|                             |
+ *   | |region address |                             |
+ *   | |---------------|                             |
+ *   | |region size    |                             |
+ *   | +---------------+                             |
+ *   +-----------------------------------------------+
+ *       G-ToC: Global table of content
+ *       SS-ToC: Subsystem table of content
+ *       SS0-SSn: Subsystem numbered from 0 to n
+ *
+ * The core of minidump feature is part of Qualcomm's boot firmware code.
+ * It initializes shared memory(SMEM), which is a part of DDR and
+ * allocates a small section of it to minidump table i.e also called
+ * global table of content (G-ToC).
+ *
+ * Each subsystem (APSS, ADSP, ...) has their own table of segments to be
+ * included in the minidump, all references from a descriptor in SMEM
+ * (G-ToC). Each segment/region has some details like name, physical
+ * address and it's size etc. and it could be anywhere scattered
+ * in the DDR.
+ *
+ * Qualcomm SoCs supports extracting the ramdump/minidump produced
+ * either over USB or stored to an attached storage device.
+ *
Are there any userspace tools to parse these dumps?

How can someone collect minidump and use this in upstream setup, do you have any quick guide to do this?

+ */
+
+/**
+ * struct minidump_elfhdr - Minidump table elf header
+ * @ehdr: Elf main header
+ * @shdr: Section header
+ * @phdr: Program header
+ * @elf_offset: Section offset in elf
+ * @strtable_idx: String table current index position
+ */
+struct minidump_elfhdr {
+	struct elfhdr		*ehdr;
+	struct elf_shdr		*shdr;
+	struct elf_phdr		*phdr;
+	size_t			elf_offset;
+	size_t			strtable_idx;
+};
+
+/**
+ * struct minidump - Minidump driver private data
+ * @md_gbl_toc	 : Global TOC pointer
+ * @md_ss_toc	 : High level OS TOC pointer
+ * @md_regions	 : High level OS region base pointer
+ * @elf		 : Minidump elf header
+ */
+struct minidump {
+	struct minidump_global_toc	*md_gbl_toc;

Totally unused.

+	struct minidump_subsystem	*md_ss_toc;
+	struct minidump_region		*md_regions;
+	struct minidump_elfhdr		elf;
+};
+
+/*
+ * In some of the Old Qualcomm devices, boot firmware statically allocates 300
+ * as total number of supported region (including all co-processors) in
+ * minidump table out of which linux was using 200. In future, this limitation
+ * from boot firmware might get removed by allocating the region dynamically.
+ * So, keep it compatible with older devices, we can the current limit for Linux
+ * to 200.
+ */
+#define MAX_NUM_ENTRIES	  200
+
+static struct minidump minidump;
+static DEFINE_MUTEX(minidump_lock);
+
...

+/* Update ELF header */
+static void minidump_update_elf_header(const struct qcom_minidump_region *region)
+{
+	struct elfhdr *ehdr = minidump.elf.ehdr;
+	struct elf_shdr *shdr;
+	struct elf_phdr *phdr;
+
+	shdr = elf_shdr_entry_addr(ehdr, ehdr->e_shnum++);
+	phdr = elf_phdr_entry_addr(ehdr, ehdr->e_phnum++);
+
+	shdr->sh_type = SHT_PROGBITS;
+	shdr->sh_name = append_str_to_strtable(region->name);
+	shdr->sh_addr = (elf_addr_t)region->virt_addr;
+	shdr->sh_size = region->size;
+	shdr->sh_flags = SHF_WRITE;
+	shdr->sh_offset = minidump.elf.elf_offset;
+	shdr->sh_entsize = 0;
+
+	phdr->p_type = PT_LOAD;
+	phdr->p_offset = minidump.elf.elf_offset;
+	phdr->p_vaddr = (elf_addr_t)region->virt_addr;
+	phdr->p_paddr = region->phys_addr;
+	phdr->p_filesz = phdr->p_memsz = region->size;
+	phdr->p_flags = PF_R | PF_W;
+	minidump.elf.elf_offset += shdr->sh_size;
+}
+
+/* Add region in subsystem ToC */
+static void minidump_ss_add_region(const struct qcom_minidump_region *region)
+{
+	struct minidump_region *mdr;
+	unsigned int region_cnt = le32_to_cpu(minidump.md_ss_toc->region_count);
+
+	mdr = &minidump.md_regions[region_cnt];
+	strscpy(mdr->name, region->name, sizeof(mdr->name));
+	mdr->address = cpu_to_le64(region->phys_addr);
+	mdr->size = cpu_to_le64(region->size);
+	mdr->valid = cpu_to_le32(MINIDUMP_REGION_VALID);
+	region_cnt++;
+	minidump.md_ss_toc->region_count = cpu_to_le32(region_cnt);
+}
+
+static bool qcom_minidump_valid_region(const struct qcom_minidump_region *region)
+{
+	return region &&
+		strnlen(region->name, MAX_NAME_LENGTH) < MAX_NAME_LENGTH &&
+		region->virt_addr &&
+		region->size &&minidump

mindump is static variable, why are we checking for this?

+		IS_ALIGNED(region->size, 4);

This function looks very much unreadable, can we rearrage this.

+}
+
+#define MAX_STRTBL_SIZE	   (MAX_NUM_ENTRIES * MAX_REGION_NAME_LENGTH)

please move to the top.

+static int minidump_add_elf_header(void)
+{
+	struct qcom_minidump_region elfregion;
+	struct elfhdr *ehdr;
+	struct elf_shdr *shdr;
+	struct elf_phdr *phdr;
+	unsigned int  elfh_size;
+	unsigned int strtbl_off;
+	unsigned int phdr_off;
+	char *banner;
+	unsigned int banner_len;
+
+	banner_len = strlen(linux_banner);
+	/*
+	 * Header buffer contains:
+	 * ELF header, (MAX_NUM_ENTRIES + 4) of Section and Program ELF headers,
+	 * where, 4 additional entries, one for empty header, one for string table
+	 * one for minidump table and one for linux banner.
+	 *
+	 * Linux banner is stored in minidump to aid post mortem tools to determine
+	 * the kernel version.
+	 */
+	elfh_size = sizeof(*ehdr);
+	elfh_size += MAX_STRTBL_SIZE;
+	elfh_size += banner_len + 1;
+	elfh_size += ((sizeof(*shdr) + sizeof(*phdr)) * (MAX_NUM_ENTRIES + 4));
+	elfh_size = ALIGN(elfh_size, 4);
+
+	minidump.elf.ehdr = kzalloc(elfh_size, GFP_KERNEL);

when will this memory freed?

+	if (!minidump.elf.ehdr)
+		return -ENOMEM;
+
+	/* Register ELF header as first region */
+	strscpy(elfregion.name, "KELF_HEADER", sizeof(elfregion.name));
+	elfregion.virt_addr = minidump.elf.ehdr;
+	elfregion.phys_addr = virt_to_phys(minidump.elf.ehdr);
+	elfregion.size = elfh_size;
+	minidump_ss_add_region(&elfregion);
+
+	ehdr = minidump.elf.ehdr;
+	/* Assign Section/Program headers offset */
+	minidump.elf.shdr = shdr = (struct elf_shdr *)(ehdr + 1);
+	minidump.elf.phdr = phdr = (struct elf_phdr *)(shdr + MAX_NUM_ENTRIES);
+	phdr_off = sizeof(*ehdr) + (sizeof(*shdr) * MAX_NUM_ENTRIES);
+
+	memcpy(ehdr->e_ident, ELFMAG, SELFMAG);
+	ehdr->e_ident[EI_CLASS] = ELF_CLASS;
+	ehdr->e_ident[EI_DATA] = ELF_DATA;
+	ehdr->e_ident[EI_VERSION] = EV_CURRENT;
+	ehdr->e_ident[EI_OSABI] = ELF_OSABI;
+	ehdr->e_type = ET_CORE;
+	ehdr->e_machine  = ELF_ARCH;
+	ehdr->e_version = EV_CURRENT;
+	ehdr->e_ehsize = sizeof(*ehdr);
+	ehdr->e_phoff = phdr_off;
+	ehdr->e_phentsize = sizeof(*phdr);
+	ehdr->e_shoff = sizeof(*ehdr);
+	ehdr->e_shentsize = sizeof(*shdr);
+	ehdr->e_shstrndx = 1;
+
+	minidump.elf.elf_offset = elfh_size;
+
+	/*
+	 * The zeroth index of the section header is reserved and is rarely used.
+	 * Set the section header as null (SHN_UNDEF) and move to the next one.
+	 * 2nd Section is String table.
+	 */
+	minidump.elf.strtable_idx = 1;
+	strtbl_off = sizeof(*ehdr) + ((sizeof(*phdr) + sizeof(*shdr)) * MAX_NUM_ENTRIES);
+	shdr++;
+	shdr->sh_type = SHT_STRTAB;
+	shdr->sh_offset = (elf_addr_t)strtbl_off;
+	shdr->sh_size = MAX_STRTBL_SIZE;
+	shdr->sh_entsize = 0;
+	shdr->sh_flags = 0;
+	shdr->sh_name = append_str_to_strtable("STR_TBL");
+	shdr++;
+
+	/* 3rd Section is for Minidump_table VA, used by parsers */
+	shdr->sh_type = SHT_PROGBITS;
+	shdr->sh_entsize = 0;
+	shdr->sh_flags = 0;
+	shdr->sh_addr = (elf_addr_t)&minidump;
+	shdr->sh_name = append_str_to_strtable("minidump_table");
+	shdr++;
+
+	/* 4th Section is Linux banner */
+	banner = (char *)ehdr + strtbl_off + MAX_STRTBL_SIZE;
+	memcpy(banner, linux_banner, banner_len);
+
+	shdr->sh_type = SHT_PROGBITS;
+	shdr->sh_offset = (elf_addr_t)(strtbl_off + MAX_STRTBL_SIZE);
+	shdr->sh_size = banner_len + 1;
+	shdr->sh_addr = (elf_addr_t)linux_banner;
+	shdr->sh_entsize = 0;
+	shdr->sh_flags = SHF_WRITE;
+	shdr->sh_name = append_str_to_strtable("linux_banner");
+
+	phdr->p_type = PT_LOAD;
+	phdr->p_offset = (elf_addr_t)(strtbl_off + MAX_STRTBL_SIZE);
+	phdr->p_vaddr = (elf_addr_t)linux_banner;
+	phdr->p_paddr = virt_to_phys(linux_banner);
+	phdr->p_filesz = phdr->p_memsz = banner_len + 1;
+	phdr->p_flags = PF_R | PF_W;
+
+	/* Update headers count*/
+	ehdr->e_phnum = 1;
+	ehdr->e_shnum = 4;
Can we define these magic constants.

nit, a new line before return would be nice.

+	return 0;
+}
+
+static int qcom_minidump_init(void)
+{
+	struct minidump_subsystem *mdsstoc;
+
+	mdsstoc = qcom_minidump_subsystem_desc(MINIDUMP_APSS_DESC);
+	if (IS_ERR(mdsstoc))
+		return -EINVAL;
+
+	minidump.md_regions = kcalloc(MAX_NUM_ENTRIES,
+			      sizeof(struct minidump_region), GFP_KERNEL);
+	if (!minidump.md_regions)
+		return -ENOMEM;
+
+	minidump.md_ss_toc = mdsstoc;
+	/* Share memory table update */
+	mdsstoc->regions_baseptr = cpu_to_le64(virt_to_phys(minidump.md_regions));
+	/* Tell bootloader not to encrypt the regions of this subsystem */
+	mdsstoc->encryption_status = cpu_to_le32(MINIDUMP_SS_ENCR_DONE);
+	mdsstoc->encryption_required = cpu_to_le32(MINIDUMP_SS_ENCR_NOTREQ);
+
+	mdsstoc->enabled = cpu_to_le32(MINIDUMP_SS_ENABLED);
+	mdsstoc->status = cpu_to_le32(1);
+	mdsstoc->region_count = cpu_to_le32(0);
+
+	return 0;
+}
+
+/**
+ * qcom_minidump_ready - Check, if minidump is ready for client registration or not.
+ *
+ * Return: zero on success and negative on failure.
+ *
+ * Qualcomm minidump feature is dependent on Qualcomm's shared memory and it is
+ * possible for a arm64 target to not have it's device tree entry present, for
+ * such case, qcom_minidump_ready() returns -ENODEV and client should not further
+ * try to register their region with minidump driver.
+ */
+int qcom_minidump_ready(void)
+{
+	void *ptr;
+	struct device_node *np;
+	static bool is_smem_available = true;
+
+	if (!is_smem_available || !(np = of_find_compatible_node(NULL, NULL, "qcom,smem"))) {

just check for dt node here does not mean that smem device is available, you should probably check if the device is avaliable aswell using of_device_is_available()


We should proabably return -EPROBEDEFER incase the node is present and device is not present.


+		is_smem_available = false;
+		return -ENODEV;
+	}
+
+	of_node_put(np);

<--
+	ptr = qcom_smem_get(QCOM_SMEM_HOST_ANY, SBL_MINIDUMP_SMEM_ID, NULL);
+	if (IS_ERR(ptr))
+		return PTR_ERR(ptr);
-->

If we are already checking for global toc here, why not just set it in minidump minidump.md_gbl_toc = qcom_smem_get(QCOM_SMEM_HOST_ANY, SBL_MINIDUMP_SMEM_ID, &size);
...

and then stop calling qcom_smem_get to get global toc on every call to qcom_minidump_subsystem_desc()


+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(qcom_minidump_ready);
+
+/**
+ * qcom_minidump_subsystem_desc() - Get minidump subsystem descriptor.
+ * @minidump_index: minidump index for a subsystem in minidump table
+ *
+ * Return: minidump subsystem descriptor address on success and error
+ * on failure
+ */
+struct minidump_subsystem *qcom_minidump_subsystem_desc(unsigned int minidump_index)
+{
+	struct minidump_global_toc *mdgtoc;
+	size_t size;
+
+	mdgtoc = qcom_smem_get(QCOM_SMEM_HOST_ANY, SBL_MINIDUMP_SMEM_ID, &size);
+	if (IS_ERR(mdgtoc)) {
+		pr_err("Unable to find minidump smem item\n");
+		return ERR_CAST(mdgtoc);
+	}
+
+	if (size < sizeof(*mdgtoc) || !mdgtoc->status) {
+		pr_err("Minidump table is not initialized\n");
+		return ERR_PTR(-EINVAL);
+	}
+
+	return &mdgtoc->subsystems[minidump_index];

once we fix qcom_minidump_ready() with the suggestion then this call will be

struct minidump_subsystem *qcom_minidump_subsystem_desc(..)
{
	return &minidump.md_gbl_toc->subsystems[minidump_index];
}

+}
+EXPORT_SYMBOL_GPL(qcom_minidump_subsystem_desc);
+
+/**
+ * qcom_minidump_region_register() - Register a region in Minidump table.
+ * @region: minidump region.
+ *
+ * Client should not call this directly instead first call qcom_minidump_ready()
+ * to check if minidump is ready to do registration if yes, then call this API.
+ *
+ * Return: On success, it returns region index in minidump table and negative
+ * error value on failure.
+ */
+int qcom_minidump_region_register(const struct qcom_minidump_region *region)
+{
+	static bool minidump_init_done;
why not move this type of static variables to struct minidump.

+	unsigned int num_region;
+	int ret;

we should check qcom_minidump_ready() has been called either by setting a flag in struct minidump and return early on in case its not ready.


+
+	/* Initialize minidump context on first call */
+	mutex_lock(&minidump_lock);
+	if (!minidump_init_done) {
+		ret = qcom_minidump_init();
+		if (ret)
+			goto unlock;
+
+		minidump_init_done = true;
+		/* First entry would be ELF header */
+		ret = minidump_add_elf_header();
+		if (ret) {
+			kfree(minidump.md_regions);

should we not reset minidump_init_done = false;
or move
minidump_init_done = true;
to end of this loop.

+			goto unlock;
+		}
+	}
+
+	if (!qcom_minidump_valid_region(region)) {
+		ret = -EINVAL;
+		goto unlock;
+	}
+
+	ret = get_minidump_region_index(region);
+	if (ret >= 0) {
+		pr_info("%s region is already registered\n", region->name);
+		ret = -EEXIST;
+		goto unlock;
+	}
+
+	/* Check if there is a room for a new entry */
+	ret = num_region = le32_to_cpu(minidump.md_ss_toc->region_count);
+	if (num_region >= MAX_NUM_ENTRIES) {
+		pr_err("maximum region limit %u reached\n", num_region);
+		ret = -ENOSPC;
+		goto unlock;
+	}
+
+	minidump_ss_add_region(region);
+	minidump_update_elf_header(region);
+unlock:
+	mutex_unlock(&minidump_lock);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(qcom_minidump_region_register);




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [Linux for Sparc]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux