On 6/29/2023 8:03 AM, Pavan Kondeti wrote:
On Wed, Jun 28, 2023 at 06:04:31PM +0530, Mukesh Ojha wrote:
Minidump is a best effort mechanism to collect useful and predefined
data for first level of debugging on end user devices running on
Qualcomm SoCs. It is built on the premise that System on Chip (SoC)
or subsystem part of SoC crashes, due to a range of hardware and
software bugs. Hence, the ability to collect accurate data is only
a best-effort. The data collected could be invalid or corrupted,
data collection itself could fail, and so on.
Qualcomm devices in engineering mode provides a mechanism for
generating full system ramdumps for post mortem debugging. But in some
cases it's however not feasible to capture the entire content of RAM.
The minidump mechanism provides the means for selecting region should
be included in the ramdump. The solution supports extracting the
ramdump/minidump produced either over USB or stored to an attached
storage device.
Minidump kernel driver implementation is divided into two parts for
simplicity, one is minidump core which can also be called minidump
frontend(As API gets exported from this driver for registration with
backend) and the other part is minidump backend i.e, where the underlying
implementation of minidump will be there. There could be different way
how the backend is implemented like Shared memory, Memory mapped IO
or Resource manager based where the guest region information is passed
to hypervisor via hypercalls.
Minidump Client-1 Client-2 Client-5 Client-n
| | | |
| | ... | ... |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| +---+--------------+----+ |
+-----------+ qcom_minidump(core) +--------+
| |
+------+-----+------+---+
| | |
| | |
+---------------+ | +--------------------+
| | |
| | |
| | |
v v v
+-------------------+ +-------------------+ +------------------+
|qcom_minidump_smem | |qcom_minidump_mmio | | qcom_minidump_rm |
| | | | | |
+-------------------+ +-------------------+ +------------------+
Shared memory Memory mapped IO Resource manager
(backend) (backend) (backend)
Here, we will be giving all analogy of backend with SMEM as it is the
only implemented backend at present but general idea remains the same.
The core of minidump feature is part of Qualcomm's boot firmware code.
It initializes shared memory (SMEM), which is a part of DDR and
allocates a small section of it to minidump table i.e also called
global table of content (G-ToC). Each subsystem (APSS, ADSP, ...) has
their own table of segments to be included in the minidump, all
references from a descriptor in SMEM (G-ToC). Each segment/region has
some details like name, physical address and it's size etc. and it
could be anywhere scattered in the DDR.
qcom_minidump(core or frontend) driver adds the capability to add APSS
region to be dumped as part of ram dump collection. It provides
appropriate symbol register/unregister client regions.
To simplify post mortem debugging, it creates and maintain an ELF
header as first region that gets updated upon registration
of a new region.
Signed-off-by: Mukesh Ojha <quic_mojha@xxxxxxxxxxx>
---
drivers/soc/qcom/Kconfig | 15 +
drivers/soc/qcom/Makefile | 2 +
drivers/soc/qcom/qcom_minidump.c | 456 ++++++++++++++++++++++++++++++
drivers/soc/qcom/qcom_minidump_internal.h | 75 +++++
include/soc/qcom/qcom_minidump.h | 35 +++
5 files changed, 583 insertions(+)
create mode 100644 drivers/soc/qcom/qcom_minidump.c
create mode 100644 drivers/soc/qcom/qcom_minidump_internal.h
diff --git a/drivers/soc/qcom/Kconfig b/drivers/soc/qcom/Kconfig
index 982310b5a1cb..874ee8c3efe0 100644
--- a/drivers/soc/qcom/Kconfig
+++ b/drivers/soc/qcom/Kconfig
@@ -279,6 +279,21 @@ config QCOM_INLINE_CRYPTO_ENGINE
tristate
select QCOM_SCM
+config QCOM_MINIDUMP
+ tristate "QCOM Minidump APSS Core Infrastructure"
+ depends on ARCH_QCOM
+ help
+ This config allow linux core infrastructure for APSS minidump for
+ underlying backend(smem etc.) which can hook themselves to this and
+ work as one unit. So, this config should be selected in combination
+ with its backend.
+
+ After this Linux clients driver can register their internal data
+ structures and debug messages as part of the apss minidump region
+ and when the SoC is crashed, and these selective regions will be
+ dumped instead of the entire DDR. This saves significant amount
+ of time and/or storage space.
+
config QCOM_MINIDUMP_SMEM
tristate "QCOM Minidump SMEM (as backend) Support"
depends on ARCH_QCOM
diff --git a/drivers/soc/qcom/Makefile b/drivers/soc/qcom/Makefile
index 89b775512bef..737d868757ac 100644
--- a/drivers/soc/qcom/Makefile
+++ b/drivers/soc/qcom/Makefile
@@ -34,3 +34,5 @@ obj-$(CONFIG_QCOM_KRYO_L2_ACCESSORS) += kryo-l2-accessors.o
obj-$(CONFIG_QCOM_ICC_BWMON) += icc-bwmon.o
qcom_ice-objs += ice.o
obj-$(CONFIG_QCOM_INLINE_CRYPTO_ENGINE) += qcom_ice.o
+obj-$(CONFIG_QCOM_MINIDUMP) += qcom_minidump.o
+obj-$(CONFIG_QCOM_MINIDUMP_SMEM) += qcom_minidump_smem.o
diff --git a/drivers/soc/qcom/qcom_minidump.c b/drivers/soc/qcom/qcom_minidump.c
new file mode 100644
index 000000000000..7744e57843ab
--- /dev/null
+++ b/drivers/soc/qcom/qcom_minidump.c
@@ -0,0 +1,456 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/device.h>
+#include <linux/export.h>
+#include <linux/kallsyms.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/printk.h>
+#include <linux/string.h>
+
+#include "qcom_minidump_internal.h"
+
+/*
+ * In some of the Old Qualcomm devices, boot firmware statically allocates 300
+ * as total number of supported region (including all co-processors) in
+ * minidump table out of which linux was using 201. In future, this limitation
+ * from boot firmware might get removed by allocating the region dynamically.
+ * So, keep it compatible with older devices, we can keep the current limit for
+ * Linux to 201.
+ */
+#define MAX_NUM_ENTRIES 201
+#define MAX_STRTBL_SIZE (MAX_NUM_ENTRIES * MAX_REGION_NAME_LENGTH)
+
Should not we receive these constraints from backend?
+/*
+ * md_lock protects "md" during calls to qcom_minidump_backend_register(),
+ * qcom_minidump_backend_unregister().
+ */
+static DEFINE_MUTEX(md_lock);
+
+/* Only one front end will be attached to one back-end */
+static struct minidump *md;
+static char *md_backend;
+
Can you explain this a bit more? There is just one fronend, correct?
Multiple possibilites of backend.
Sorry, for misleading with the comment.
Was trying to be explicit here that only one backend can be attached to
the front end ..
Will fix the confusion.
Is it a limitation at the moment that we support only one backend or
plan to support more backends later for the same frontend. Pls clarify.
+static struct elf_shdr *elf_shdr_entry_addr(struct elfhdr *ehdr, int idx)
+{
+ struct elf_shdr *eshdr = (struct elf_shdr *)((size_t)ehdr + ehdr->e_shoff);
+
+ return &eshdr[idx];
+}
+
[...]
+/**
+ * qcom_minidump_region_register() - Register region in APSS Minidump table.
+ * @region: minidump region.
+ *
+ * Return: On success, it returns 0 and negative error value on failure.
+ */
Are we not going to return any cookie upon success which can be passed
to us during unregistration?
e.g, Let's just say if we return the region number as an cookies, the
problem i see that, we multiple unregistration is happening we will
be shifting the array upwards and that results in the index/cookies does
not retain the same for the shifted regions.
So, at the moment, client need to pass the same region for
un-registration as they have passed for registration..
+int qcom_minidump_region_register(const struct qcom_minidump_region *region)
+{
+ int ret;
+
+ if (!qcom_minidump_valid_region(region))
+ return -EINVAL;
+
+ mutex_lock(&md_lock);
+ if (!md) {
+ mutex_unlock(&md_lock);
+ pr_err("No backend registered yet, try again..");
+ return -EPROBE_DEFER;
+ }
+
+ ret = md->ops->md_region_register(md, region);
+ if (ret)
+ goto unlock;
+
The md_lock description in the beginning does not seems to be correct.
It is not limited to backend registration. It has much wider usage.
You are holding the lock while calling into backend. Basically the
synchronization is done in the front end.
Initially, the thought was to have the backend their own lock but that
is irrelevant as all the contention is there in the front end.
So, used the same lock for synchronization as i moved in developement
and the later that comment became obsolete.
Thanks for catching this.
will fix the comment.
+ qcom_md_update_elf_header(region);
+unlock:
+ mutex_unlock(&md_lock);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(qcom_minidump_region_register);
+
+/**
+ * qcom_minidump_region_unregister() - Unregister region from APSS Minidump table.
+ * @region: minidump region.
+ *
+ * Return: On success, it returns 0 and negative error value on failure.
+ */
+int qcom_minidump_region_unregister(const struct qcom_minidump_region *region)
+{
+ int ret;
+
+ if (!qcom_minidump_valid_region(region))
+ return -EINVAL;
+
+ mutex_lock(&md_lock);
+ if (!md) {
+ mutex_unlock(&md_lock);
+ pr_err("No backend registered yet, try again..");
+ return -EPROBE_DEFER;
+ }
+
+ ret = md->ops->md_region_unregister(md, region);
+ if (ret)
+ goto unlock;
+
The frontend is not validing that it is actually a registered client, it
is left to backend. Seems that is more duplication in the backend(s).
For that front-end need to cache each registered entry and that is exact
copy at the backend, will need extra memory for copied data, so, that's
the reason of this behavior.
+ ret = qcom_minidump_clear_header(region);
+unlock:
+ mutex_unlock(&md_lock);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(qcom_minidump_region_unregister);
can we create a namespace for exporting these symbols?
Any specific reason, you are suggesting this ?
+
+static int qcom_minidump_add_elf_header(struct minidump *md_data)
+{
[...]
+
+/**
+ * qcom_minidump_backend_register() - Register backend minidump device.
+ * @md_data: minidump backend driver data
+ *
+ * Return: On success, it returns 0 and negative error value on failure.
+ */
+int qcom_minidump_backend_register(struct minidump *md_data)
+{
+ int ret;
+
+ if (!md_data->name || !md_data->dev ||
+ !md_data->ops ||
+ !md_data->ops->md_table_init ||
+ !md_data->ops->md_region_register ||
+ !md_data->ops->md_region_unregister ||
+ !md_data->ops->md_table_exit) {
+ pr_warn("backend '%s' must fill/implement necessary fields\n", md->name);
+ return -EINVAL;
+ }
+
+ if (md_backend && strcmp(md_backend, md_data->name)) {
+ pr_warn("backend '%s' already in use: ignoring '%s'\n",
+ md_backend, md_data->name);
+ return -EBUSY;
+ }
+
+ mutex_lock(&md_lock);
+ if (md) {
+ dev_warn(md->dev, "backend '%s' already loaded: ignoring '%s'\n",
+ md->name, md_data->name);
+ ret = -EBUSY;
+ goto unlock;
+ }
+
+ if (!md_data->max_region_limit || md_data->max_region_limit > MAX_NUM_ENTRIES)
+ md_data->max_region_limit = MAX_NUM_ENTRIES;
+
+ ret = md_data->ops->md_table_init(md_data);
+ if (ret) {
+ dev_err(md_data->dev, "minidump backend initialization failed: %d\n", ret);
+ goto unlock;
+ }
+
+ /* First entry would be ELF header */
+ ret = qcom_minidump_add_elf_header(md_data);
+ if (ret) {
+ dev_err(md_data->dev, "Failed to add elf header: %d\n", ret);
+ md_data->ops->md_table_exit(md_data);
+ goto unlock;
+ }
+
+ md = md_data;
+ md_backend = kstrdup(md->name, GFP_KERNEL);
+ dev_info(md->dev, "Registered minidump backend : %s\n", md->name);
+
What is the need for keeping md_backend separately? md::name is already
present.
It looks redundant, this is one of cleanup, missed, thanks..
Also, how do we prevent backend module unloading while it is inuse? or
we don't need that constraint?
md_lock is there for that..
In next patch series, i am going to make all of this as static driver,
so that constraints will not be there ?
-Mukesh
+unlock:
+ mutex_unlock(&md_lock);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(qcom_minidump_backend_register);
Thanks,
Pavan