On 11/13/24 3:17 AM, Sebastian Fricke wrote: > This idea was formed after noticing that new developers experience > certain difficulty to navigate within the multitude of different > debugging options in the Kernel and while there often is good > documentation for the tools, the developer has to know first that they > exist and where to find them. > Add a general debugging section to the Kernel documentation, as an > easily locatable entry point to other documentation and as a general > guideline for the topic. > > Signed-off-by: Sebastian Fricke <sebastian.fricke@xxxxxxxxxxxxx> > --- > .../driver_development_debugging_guide.rst | 214 ++++++++++++++++ > Documentation/process/debugging/index.rst | 65 +++++ > .../debugging/userspace_debugging_guide.rst | 278 +++++++++++++++++++++ > Documentation/process/index.rst | 8 +- > 4 files changed, 562 insertions(+), 3 deletions(-) > > diff --git a/Documentation/process/debugging/driver_development_debugging_guide.rst b/Documentation/process/debugging/driver_development_debugging_guide.rst > new file mode 100644 > index 000000000000..bfeefb242b03 > --- /dev/null > +++ b/Documentation/process/debugging/driver_development_debugging_guide.rst > @@ -0,0 +1,214 @@ > +.. SPDX-License-Identifier: GPL-2.0 > + > +======================================== > +Debugging advice for driver development > +======================================== > + > +This document serves as a general starting point and lookup for debugging device > +drivers. > +While this guide focuses on debugging that requires re-compiling the > +module/kernel, the :doc:`userspace debugging guide > +</process/debugging/userspace_debugging_guide>` will guide > +you through tools like dynamic debug, ftrace and other tools useful for > +debugging issues and behavior. > +For general debugging advice, see the :doc:`general advice document > +</process/debugging/index>`. > + > +.. contents:: > + :depth: 3 > + > +The following sections show you the available tools. > + > +printk() & friends > +------------------ > + > +These are derivatives of printf() with varying destinations and support for > +being dynamically turned on or off, or lack thereof. > + > +Simple printk() > +~~~~~~~~~~~~~~~ > + > +The classic, can be used to great effect for quick and dirty development > +of new modules or to extract arbitrary necessary data for troubleshooting. > + > +Prerequisite: ``CONFIG_PRINTK`` (usually enabled by default) > + > +**Pros**: > + > +- No need to learn anything, simple to use > +- Easy to modify exactly to your needs (formatting of the data (See: > + :doc:`/core-api/printk-formats`), visibility in the log) > +- Can cause delays in the execution of the code (beneficial to confirm whether > + timing is a factor) > + > +**Cons**: > + > +- Requires rebuilding the kernel/module > +- Can cause delays in the execution of the code (which can cause issues to be > + not reproducible) > + > +For the full documentation see :doc:`/core-api/printk-basics` > + > +Trace_printk > +~~~~~~~~~~~~ > + > +Prerequisite: ``CONFIG_DYNAMIC_FTRACE`` & ``#include <linux/ftrace.h>`` > + > +It is a tiny bit less comfortable to use than printk(), because you will have > +to read the messages from the trace file (See: :ref:`read_ftrace_log` > +instead of from the kernel log, but very useful when printk() adds unwanted > +delays into the code execution, causing issues to be flaky or hidden.) > + > +If the processing of this still causes timing issues then you can try > +trace_puts(). > + > +For the full Documentation see trace_printk() > + > +dev_dbg > +~~~~~~~ > + > +Print statement, which can be targeted by > +:ref:`process/debugging/userspace_debugging_guide:dynamic debug` that contains > +additional information about the device used within the context. > + > +**When is it appropriate to leave a debug print in the code?** > + > +Permanent debug statements have to be useful for a developer to troubleshoot > +driver misbehavior. Judging that is a bit more of an art than a science, but > +some guidelines are in the :ref:`Coding style guidelines > +<process/coding-style:13) printing kernel messages>`. In almost all cases the > +debug statements shouldn't be upstreamed, as a working driver is supposed to be > +silent. > + > +Custom printk > +~~~~~~~~~~~~~ > + > +Example:: > + > + #define core_dbg(fmt, arg...) do { \ > + if (core_debug) \ > + printk(KERN_DEBUG pr_fmt("core: " fmt), ## arg); \ > + } while (0) > + > +**When should you do this?** > + > +It is better to just use a pr_debug(), which can later be turned on/off with > +dynamic debug. Additionally, a lot of drivers activate these prints via a > +variable like ``core_debug`` set by a module parameter. However, Module > +parameters `are not recommended anymore > +<https://lore.kernel.org/all/2024032757-surcharge-grime-d3dd@gregkh>`_. > + > +Ftrace > +------ > + > +Creating a custom Ftrace tracepoint > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +A tracepoint adds a hook into your code that will be called and logged when the > +tracepoint is enabled. This can be used, for example, to trace hitting a > +conditional branch or to dump the internal state at specific points of the code > +flow during a debugging session. > + > +Here is a basic description of :ref:`how to implement new tracepoints > +<trace/tracepoints:usage>`. > + > +For the full event tracing documentation see :doc:`/trace/events` > + > +For the full Ftrace documentation see :doc:`/trace/ftrace` > + > +DebugFS > +------- > + > +Prerequisite: ``CONFIG_DEBUG_FS` & `#include <linux/debugfs.h>`` > + > +DebugFS differs from the other approaches of debugging, as it doesn't write > +messages to the kernel log nor add traces to the code. Instead it allows the > +developer to handle a set of files. > +With these files you can either store values of variables or make > +register/memory dumps or you can make these files writable and modify > +values/settings in the driver. > + > +Possible use-cases among others: > + > +- Store register values > +- Keep track of variables > +- Store errors > +- Store settings > +- Toggle a setting like debug on/off > +- Error injection > + > +This is especially useful, when the size of a data dump would be hard to digest > +as part of the general kernel log (for example when dumping raw bitstream data) > +or when you are not interested in all the values all the time, but with the > +possibility to inspect them. > + > +The general idea is: > + > +- Create a directory during probe (``struct dentry *parent = > + debugfs_create_dir("my_driver", NULL);``) > +- Create a file (``debugfs_create_u32("my_value", 444, parent, &my_variable);``) > + > + - In this example the file is found in ``/sys/kernel/debug/my_driver/my_value`` > + (with read permissions for user/group/all) > + - any read of the file will return the current contents of the variable > + ``my_variable`` > + > +- Clean up the folder when removing the device > + (``debugfs_remove_recursive(parent);``) > + > +For the full documentation see :doc:`/filesystems/debugfs`. > + > +KASAN, UBSAN, lockdep and other error checkers > +---------------------------------------------- > + > +KASAN (Kernel Address Sanitizer) > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +Prerequisite: ``CONFIG_KASAN`` > + > +KASAN is a dynamic memory error detector that helps to find use-after-free and > +out-of-bounds bugs. It uses compile-time instrumentation to check every memory > +access. > + > +For the full documentation see :doc:`/dev-tools/kasan`. > + > +UBSAN (Undefined Behavior Sanitizer) > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +Prerequisite: ``CONFIG_UBSAN`` > + > +UBSAN relies on compiler instrumentation and runtime checks to detect undefined > +behavior. It is designed to find a variety of issues, including signed integer > +overflow, array index out of bounds, and more. > + > +For the full documentation see :doc:`/dev-tools/ubsan` > + > +lockdep (Lock Dependency Validator) > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +Prerequisite: ``CONFIG_DEBUG_LOCKDEP`` > + > +lockdep is a runtime lock dependency validator that detects potential deadlocks > +and other locking-related issues in the kernel. > +It tracks lock acquisitions and releases, building a dependency graph that is > +analyzed for potential deadlocks. > +lockdep is especially useful for validating the correctness of lock ordering in > +the kernel. > + > +device coredump > +--------------- > + > +Prerequisite: ``#include <linux/devcoredump.h>`` > + > +Provides the infrastructure for a driver to provide arbitrary data to userland. > +It is most often used in conjunction with udev or similar userland application > +to listen for kernel uevents, which indicate that the dump is ready. Udev has > +rules to copy that file somewhere for long-term storage and analysis, as by > +default, the data for the dump is automatically cleaned up after 5 minutes. > +That data is analyzed with driver-specific tools or GDB. > + > +You can find an example implementation at: > +`drivers/media/platform/qcom/venus/core.c > +<https://elixir.bootlin.com/linux/v6.11.6/source/drivers/media/platform/qcom/venus/core.c#L30>`__ > + > +**Copyright** ©2024 : Collabora > diff --git a/Documentation/process/debugging/index.rst b/Documentation/process/debugging/index.rst > new file mode 100644 > index 000000000000..77ec17d1c119 > --- /dev/null > +++ b/Documentation/process/debugging/index.rst > @@ -0,0 +1,65 @@ > +.. SPDX-License-Identifier: GPL-2.0 > + > +============================================ > +Debugging advice for Linux Kernel developers > +============================================ > + > +.. toctree:: > + :maxdepth: 1 > + > + driver_development_debugging_guide > + userspace_debugging_guide > + > +.. only:: subproject and html > + > + Indices > + ======= > + > + * :ref:`genindex` > + > +General debugging advice > +======================== > + > +Depending on the issue, a different set of tools is available to track down the > +problem or even to realize whether there is one in the first place. > + > +As a first step you have to figure out what kind of issue you want to debug. > +Depending on the answer, your methodology and choice of tools may vary. > + > +Do I need to debug with limited access? > +--------------------------------------- > + > +Do you have limited access to the machine or are you unable to stop the running > +execution? > + > +In this case your debugging capability depends on built-in debugging support of > +provided distro kernel. preferable s/distro/distribution/ > +The :doc:`/process/debugging/userspace_debugging_guide` provides a brief > +overview over range of possible debugging tools in that situation. You can over {a | the} range > +check the capability of your kernel, in most cases, by looking into config file > +within the /boot folder. > + > +Do I have root access to the system? > +------------------------------------ > + > +Are you easily able to replace the module in question or to install a new > +kernel? > + > +In that case your range of available tools is a lot bigger, you can find the > +tools in the :doc:`/process/debugging/driver_development_debugging_guide`. > + > +Is timing a factor? > +------------------- > + > +It is important to understand if the problem you want to debug manifests itself > +consistently (i.e. given a set of inputs you always get the same, incorrect > +output), or inconsistently. If it manifests itself inconsistently, some timing > +factor might be at play. If inserting delays into the code does change the > +behavior, then quite likely timing is a factor. > + > +When timing does alter the outcome of the code execution using a simple > +printk() for debugging purposes may not work, a similar alternative is to use work. A > +trace_printk() , which logs the debug messages to the trace file instead of the > +kernel log. > + > +**Copyright** ©2024 : Collabora [userspace guide reviewed separately] I see 3 uses of "folder" here. Directory or sub-directory is much preferred IMO. Thanks. -- ~Randy