Re: [PATCH 2/4] lib: add error_report_notify to collect debugging tools' reports

On Wed, 13 Jan 2021 10:16:55 +0100 Alexander Potapenko <glider@xxxxxxxxxx> wrote:

> With the introduction of various production error-detection tools, such as
> MTE-based KASAN and KFENCE, the need arises to efficiently notify the
> userspace OS components about kernel errors. Currently, no facility exists
> to notify userspace about a kernel error from such bug-detection tools.
> The problem is obviously not restricted to the above bug detection tools,
> and applies to any error reporting mechanism that does not panic the
> kernel; this series, however, will only add support for KASAN and KFENCE
> reporting.
> 
> All such error reports appear in the kernel log. But, when such errors
> occur, userspace would normally need to read the entire kernel log and
> parse the relevant errors. This is error prone and inefficient, as
> userspace needs to continuously monitor the kernel log for error messages.
> On certain devices, this is unfortunately not acceptable. Therefore, we
> need to revisit how reports are propagated to userspace.
> 
> The library added, error_report_notify (CONFIG_ERROR_REPORT_NOTIFY),
> solves the above by using the error_report_start/error_report_end tracing
> events and exposing the last report and the total report count to the
> userspace via /sys/kernel/error_report/last_report and
> /sys/kernel/error_report/report_count.
> 
> Userspace apps can call poll(POLLPRI) on those files to get notified about
> the new reports without having to watch dmesg in a loop.

It would be nice to see some user-facing documentation for this, under
Documentation/.  How to use it, what the shortcomings are, etc.

For instance, what happens when userspace is slow to read
/sys/kernel/error_report/last_report?  Does that file buffer multiple
reports, or does the previous one get overwritten?  The documentation
should say how this obvious issue is handled.
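
To make the question concrete, here is a rough sketch of what I'd expect
a userspace consumer to look like.  It is only an illustration: it
assumes the file follows the usual sysfs_notify() convention (read once
to arm, poll for POLLPRI|POLLERR, then seek to 0 and re-read), which the
changelog does not spell out, and the helper names read_attr() and
wait_for_report() are made up for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Path taken from the patch description. */
#define LAST_REPORT_PATH "/sys/kernel/error_report/last_report"

/*
 * Hypothetical helper: re-read an attribute from offset 0.
 * Returns the number of bytes read (buf is NUL-terminated), or -1 on error.
 * At most len - 1 bytes are read, so longer content is truncated.
 */
static ssize_t read_attr(int fd, char *buf, size_t len)
{
	ssize_t n;

	assert(buf && len > 0);
	if (lseek(fd, 0, SEEK_SET) < 0)
		return -1;
	n = read(fd, buf, len - 1);
	if (n < 0)
		return -1;
	buf[n] = '\0';
	return n;
}

/*
 * Hypothetical helper: block until the attribute signals a change, then
 * read it.  Returns bytes read, 0 on timeout, -1 on error.
 *
 * ASSUMPTION: the file behaves like other sysfs_notify() users, i.e. a
 * prior read arms the notification and poll() then reports
 * POLLPRI | POLLERR.  The patch may or may not work this way.
 */
static ssize_t wait_for_report(const char *path, char *buf, size_t len,
			       int timeout_ms)
{
	struct pollfd pfd = { .events = POLLPRI | POLLERR };
	ssize_t n = -1;
	int ret;

	pfd.fd = open(path, O_RDONLY);
	if (pfd.fd < 0)
		return -1;

	/* Initial read: arms the notification and discards the old report. */
	if (read_attr(pfd.fd, buf, len) >= 0) {
		ret = poll(&pfd, 1, timeout_ms);
		if (ret == 0)
			n = 0;		/* timed out, no new report */
		else if (ret > 0)
			n = read_attr(pfd.fd, buf, len);
	}

	close(pfd.fd);
	return n;
}
```

A caller would loop on wait_for_report(LAST_REPORT_PATH, buf,
sizeof(buf), -1).  Note the window between poll() returning and the
second read: if another report lands there, this consumer can only see
whichever one the kernel kept, which is exactly the behaviour that needs
documenting.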

> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -209,6 +209,20 @@ config DEBUG_BUGVERBOSE
>  	  of the BUG call as well as the EIP and oops trace.  This aids
>  	  debugging but costs about 70-100K of memory.
>  
> +config ERROR_REPORT_NOTIFY
> +	bool "Expose memory error reports to the userspace"

There's really nothing "memory" specific about this?  Any kernel
subsystem could use it?

> +	depends on TRACING
> +	help
> +	  When enabled, captures error reports from debugging tools (such as
> +	  KFENCE or KASAN) using console tracing, and exposes reports in
> +	  /sys/kernel/error_report/: the file last_report contains the last
> +	  report (with maximum report length of PAGE_SIZE), and report_count,
> +	  the total report count.
> +
> +	  Userspace programs can call poll(POLLPRI) on those files to get
> +	  notified about the new reports without having to watch dmesg in a
> +	  loop.

So we have a whole new way of getting debug info out of the kernel.  I
fear this will become a monster.  And anticipating that, we should make
darn sure that the interface is right, and is extensible.
