Add the qualcomm minidump guide for the users which tries to cover the dependency, API use and the way to test and collect minidump on Qualcomm supported SoCs. Signed-off-by: Mukesh Ojha <quic_mojha@xxxxxxxxxxx> Reviewed-by: Bagas Sanjaya <bagasdotme@xxxxxxxxx> --- Documentation/admin-guide/index.rst | 1 + Documentation/admin-guide/qcom_minidump.rst | 318 ++++++++++++++++++++ 2 files changed, 319 insertions(+) create mode 100644 Documentation/admin-guide/qcom_minidump.rst diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst index fb40a1f6f79e..edab05fc4653 100644 --- a/Documentation/admin-guide/index.rst +++ b/Documentation/admin-guide/index.rst @@ -121,6 +121,7 @@ configure specific aspects of kernel behavior to your liking. pm/index pmf pnp + qcom_minidump rapidio ras rtc diff --git a/Documentation/admin-guide/qcom_minidump.rst b/Documentation/admin-guide/qcom_minidump.rst new file mode 100644 index 000000000000..2a057612422b --- /dev/null +++ b/Documentation/admin-guide/qcom_minidump.rst @@ -0,0 +1,318 @@ +Qualcomm Minidump Feature +========================= + +Introduction +------------ + +Minidump is a best effort mechanism to collect useful and predefined +data for post-mortem debugging on a Qualcomm System on chip(SoCs). + +Minidump is built on the premise that a hardware or software component +on the SoC has encountered an unexpected fault. This means that data +collected by Minidump can not be assumed to be correct or Minidump +collection itself could fail. + +Qualcomm SoCs in engineering mode provides mechanism for generating +complete RAM dump for both kernel/non-kernel crashes for postmortem +debugging however, on a end user product taking complete RAM dump at +the time of failure has substantial storage requirement as well as it +is time consuming to transfer them electronically. To encounter this +problem, Minidump was introduced in Qualcomm boot firmware that provides +a way to collect selected region in the final RAM dump which is less +in size compared to complete RAM dump. + +Qualcomm SoCs contains Application Processor SubSystem(APSS) and its +co-processor like Audio Digital Signal Process(ADSP), Compute DSP(CDSP), +MODEM running their operating system or firmware can register their +selected region in their respective table called SubSystem table of +content (SS-ToC) and the addresses of these tables is further maintained +in a separate table called Global Table of Content (G-ToC) inside separate +region maintaied inside RAM called Shared memory(SMEM). More about shared +memory can be found inside ``driver/soc/qcom/smem.c`` under doc section +and it is briefly discussed in later section. + +It is to note that SubSystems, Remote processors and co-processors have +same meaning in this document and been used interchangeably. + +High level design +----------------- +:: + + +-----------------------------------------------+ + | RAM +-------------+ | + | | SS0-ToC| | + | +----------------+ +----------------+ | | + | |Shared memory | | SS1-ToC| | | + | |(SMEM) | | | | | + | | | +-->|--------+ | | | + | |G-ToC | | | SS-ToC \ | | | + | |+-------------+ | | | +-----------+ | | | + | ||-------------| | | | |-----------| | | | + | || SS0-ToC | | | +-|<|SS1 region1| | | | + | ||-------------| | | | | |-----------| | | | + | || SS1-ToC |-|>+ | | |SS1 region2| | | | + | ||-------------| | | | |-----------| | | | + | || SS2-ToC | | | | | ... | | | | + | ||-------------| | | | |-----------| | | | + | || ... | | |-|<|SS1 regionN| | | | + | ||-------------| | | | |-----------| | | | + | || SSn-ToC | | | | +-----------+ | | | + | |+-------------+ | | | | | | + | | | | |----------------| | | + | | | +>| regionN | | | + | | | | |----------------| | | + | +----------------+ | | | | | + | | |----------------| | | + | +>| region1 | | | + | |----------------| | | + | | | | | + | |----------------|-+ | + | | region5 | | + | |----------------| | + | | | | + | Region information +----------------+ | + | +---------------+ | + | |region name | | + | |---------------| | + | |region address | | + | |---------------| | + | |region size | | + | +---------------+ | + +-----------------------------------------------+ + +G-ToC: Global table of contents +SSX-ToC: SubSystem X table of contents + X is an integer in the range of 0 to 10 + Older boot firmware has kept this limit to 10 + however, in newer firmware this number is expected to change + +SSX-MSn: SubSystem memory segments numbered from 0 to n + For APSS, n is limited to 200 from older boot firmware + + Older boot firmware statically allocates 300 as total number of + supported region across all SubSystem in Minidump table out of + which, APSS limit is kept to 201. In future, this limitation + from boot firmware might get removed by allocating the region + dynamically. APSS Minidump kernel driver keeping this limit to + 201 to be compatible with older boot firmware. + +SMEM is a section of RAM reserved by boot firmware and is the backbone of +Minidump functionality to work. It is also a medium of inter processor +communication and a way where boot firmware can prepare something for +upcoming operating system usage. + +Qualcomm SoCs boot firmware must reserve an area of RAM as SMEM prior to +handling over control to the run-time operating system. It creates SMEM +partition for Minidump with ``SBL_MINIDUMP_SMEM_ID`` and creates an array +of pointers called Global table of content (G-ToC) at the start of this +partition. Each index of this array is uniquely assigned to each SubSystem +like for APSS it is 0 while for ADSP, CDSP, MODEM it is 5, 7 and 3 respectively. +points to their table of segments called SS-ToC to be included in the Minidump. + +From the diagram above, Global Table of Contents (G-ToC) enumerates a fixed +size number of SubSystem Table of Contents (SS-ToC) structures. Each +SS-ToC contains a list of SubSystem Memory Segments which are named +according to the containing SS-ToC hence, SSX-MSn where "X" denotes the +SubSystem index of the containing SSX-ToC and "n" denotes an individual +Memory segment within the SubSystem. Hence, SS0-MS0 belongs to SS0-ToC +whereas SS1-MS0 belongs to SS1-ToC. Segment structure contains name, +base address, size of a Segment to be dumped. + +The Application Processor SubSystem (APSS) runs the Linux kernel and is +therefore not responsible for assembling Minidump data. One of the other +system agents in the SoC will be responsible for capturing the Minidump +data during system reset. Typically one of the SoC Digital Signal +Processors (DSP) will be used for this purpose. During reset, the DSP will +walk the G-ToC, SSX-ToCs and SSX-MSns either., dump the regions as binary +blob into storage or pushed outside to the attached host machine via USB +(more described in Dump collection section below). + +Qualcomm Remote Processor Minidump support +------------------------------------------ + +Linux Kernel support recovery and coredump collection on remote processor +failure through remoteproc framework and in this document, remote processors +meant for ADSP, CDSP, MODEM etc. Qualcomm remoteproc driver has support for +collecting Minidump for remote processors as well where each remote processor +has their unique statically assigned descriptor in the G-ToC which is +represented via ``minidump_id`` in ``driver/remoteproc/qcom_q6v5_pas.c`` +and it helps getting further information about valid registered region from +firmware and later collecting via remoteproc coredump framework. + +Qualcomm APSS Minidump kernel driver concept +-------------------------------------------- + +Qualcomm APSS Minidump kernel driver adds the capability to add Linux +region to be dumped as part of Minidump collection. Shared memory +driver creates platform device for Minidump driver and on Minidump +driver probe it gets the G-ToC address (``struct minidump_global_toc``) +by querying Minidump SMEM ID ``SBL_MINIDUMP_SMEM_ID`` as one of parameter +to ``qcom_smem_get`` function. Further, driver uses APSS Minidump unique +descriptor or index i.e., 0 to get APSS SubSystem ToC and fills up the +fields of ``struct minidump_subsystem`` and allocates memory for Segment +array of structure ``struct minidump_region`` of size compatible with +boot firmware (default size is 201). This really means that total 201 +APSS regions can be registered for APSS alone and the Minidump kernel +driver provides ``qcom_minidump_region_register`` and +``qcom_minidump_region_unregister`` function to register and unregister +APSS minidump region. Example usage explained in later section. + +To simplify post-mortem debugging, APSS driver registers the first region +as an ELF header that gets updated each time a new region gets registered. +and rest 200 region can be used by other APSS Minidump driver client. + +The solution supports extracting the Minidump produced either over USB +or stored to an attached storage device, if not configured default mode +is USB more described in Dump collection section. + +How a kernel client driver can register region with minidump +------------------------------------------------------------ + +A client driver can use ``qcom_minidump_region_register`` API's to register +and ``qcom_minidump_region_unregister`` to unregister their region from +minidump driver. + +A client needs to fill their region by filling ``qcom_minidump_region`` +structure object which consists of the region name, region's virtual +and physical address and its size. + + .. code-block:: c + + #include <soc/qcom/qcom_minidump.h> + [...] + + + [... inside a function ...] + struct qcom_minidump_region region; + + [...] + + client_mem_region = kzalloc(region_size, GFP_KERNEL); + if (!client_mem_region) + return -ENOMEM; + + [... Just write a pattern ...] + memset(client_mem_region, 0xAB, region_size); + + [... Fill up the region object ...] + strlcpy(region.name, "REGION_A", sizeof(region.name)); + region.virt_addr = client_mem_region; + region.phys_addr = virt_to_phys(client_mem_region); + region.size = region_size; + + ret = qcom_minidump_region_register(®ion); + if (ret < 0) { + pr_err("failed to add region in minidump: err: %d\n", ret); + return ret; + } + + [...] + + +Testing +------- + +Existing Qualcomm SoCs already supports collecting complete RAM dump (also +called full dump) can be configured by writing appropriate value to Qualcomm's +top control and status register (tcsr) in ``driver/firmware/qcom_scm.c``. +Complete RAM dump on system failure is where entire RAM snapshot is pushed out +to Host computer attached to SoC via USB similar to one of the way will be +used for Minidump described later in Dump collection section. Complete RAM +dump entirely get controlled from Qualcomm boot firmware and is not related +to Minidump or SMEM except the fact that same register is used to configure +one of the mode. + +SCM device Tree bindings required to support download mode +For example (sm8450) :: + + / { + + [...] + + firmware { + scm: scm { + compatible = "qcom,scm-sm8450", "qcom,scm"; + [... tcsr register ... ] + qcom,dload-mode = <&tcsr 0x13000>; + + [...] + }; + }; + + [...] + + soc: soc@0 { + + [...] + + tcsr: syscon@1fc0000 { + compatible = "qcom,sm8450-tcsr", "syscon"; + reg = <0x0 0x1fc0000 0x0 0x30000>; + }; + + [...] + }; + [...] + + }; + +A kernel command line parameter is provided to facilitate selection of +dump mode also called download mode. Boot firmware configures download +mode to be full dump even before Linux boots up however, one need to pass +``qcom_scm.download_mode="mini"`` to switch the default download mode +to Minidump. Similarly ``"full"`` need to be passed to set the download +mode to full dump and passing ``"full,mini"`` will set the download mode +where both Minidump along with fulldump will be collected on system failure +however, this mode will only work if dump need to collected via USB more +about this described in Dump collection section. + +Writing to sysfs node can also be used to set the mode to minidump:: + + echo "mini" > /sys/module/qcom_scm/parameter/download_mode + +Once the download mode is set, any kind of crash will make the device collect +respective dump as per the set download mode. + +Dump collection +--------------- +:: + + +-----------+ + | | + | | +------+ + | | | | + | | +--+---+ Product(Qualcomm SoC) + +-----------+ | + |+++++++++++|<------------+ + |+++++++++++| usb cable + +-----------+ + x86_64 PC + +The solution supports a product running with Qualcomm SoC (where minidump) +is supported from the firmware) connected to x86_64 host PC running PCAT +tool. It supports downloading the minidump produced from product to the +host PC over USB or to save the minidump to the product attached storage +device(UFS/eMMC/SD Card) into minidump dedicated partition. + +By default, dumps are downloaded via USB to the attached x86_64 PC running +PCAT (Qualcomm tool) software. Upon download, we will see a set of binary +blobs starting with name ``md_*`` in PCAT configured directory in x86_64 +machine, so for above example from the client it will be ``md_REGION_A.BIN``. +This binary blob depends on region content to determine whether it needs +external parser support to get the content of the region, so for simple +plain ASCII text we don't need any parsing and the content can be seen +just opening the binary file. + +To collect the dump to attached storage type, one needs to write appropriate +value to IMEM register, in that case dumps are collected in rawdump +partition on the product device itself. + +One needs to read the entire rawdump partition and pull out content to +save it onto the attached x86_64 machine over USB. Later, this rawdump +can be passed to another tool (``dexter.exe`` [Qualcomm tool]) which +converts this into the similar binary blobs which we have got it when +download type was set to USB, i.e. a set of registered regions as blobs +and their name starts with ``md_*``. + +Replacing the ``dexter.exe`` with some open source tool can be added as future +scope of this document. -- 2.43.0.254.ga26002b62827