Re: [PATCH v4] drm/trace: Buffer DRM logs in a ringbuffer accessible via debugfs

Sean Paul <sean@xxxxxxxxxx> · Wed, 15 Jan 2020 08:31:36 -0500

On Wed, Jan 15, 2020 at 12:14:50PM +0200, Pekka Paalanen wrote:
> On Tue, 14 Jan 2020 12:21:43 -0500
> Sean Paul <sean@xxxxxxxxxx> wrote:
> 
> > From: Sean Paul <seanpaul@xxxxxxxxxxxx>
> > 
> > This patch uses a ring_buffer to keep a "flight recorder" (name credit Weston)
> > of DRM logs for a specified set of debug categories. The user writes a
> > bitmask of debug categories to the "trace_mask" node and can read log
> > messages from the "trace" node.
> > 
> > These nodes currently exist in debugfs under the dri directory. I
> > intended on exposing all of this through tracefs originally, but the
> > tracefs entry points are not exposed, so there's no way to create
> > tracefs files from drivers at the moment. I think it would be a
> > worthwhile endeavour, but one requiring more time and conversation to
> > ensure the drm traces fit somewhere sensible.
> > 
> > Cc: Daniel Vetter <daniel.vetter@xxxxxxxx>
> > Cc: David Airlie <airlied@xxxxxxxxx>
> > Cc: Jani Nikula <jani.nikula@xxxxxxxxxxxxxxx>
> > Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
> > Cc: Pekka Paalanen <ppaalanen@xxxxxxxxx>
> > Cc: Rob Clark <robdclark@xxxxxxxxx>
> > Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
> > Cc: Thomas Zimmermann <tzimmermann@xxxxxxx>
> > Cc: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>
> > Signed-off-by: Sean Paul <seanpaul@xxxxxxxxxxxx>
> > Link: https://patchwork.freedesktop.org/patch/msgid/20191010204823.195540-1-sean@xxxxxxxxxx #v1
> > Link: https://lists.freedesktop.org/archives/dri-devel/2019-November/243230.html #v2
> > Link: https://patchwork.freedesktop.org/patch/msgid/20191212203301.142437-1-sean@xxxxxxxxxx #v3
> > 
> > Changes in v2:
> > - Went with a completely different approach:
> > https://lists.freedesktop.org/archives/dri-devel/2019-November/243230.html
> > 
> > Changes in v3:
> > - Changed commit message to be a bit less RFC-y
> > - Make class_drm_category_log an actual trace class
> > 
> > Changes in v4:
> > - Instead of [ab]using trace events and the system trace buffer, use our
> >   own ringbuffer
> > ---
> > ---
> >  Documentation/gpu/drm-uapi.rst |   9 +
> >  drivers/gpu/drm/Kconfig        |   1 +
> >  drivers/gpu/drm/Makefile       |   2 +-
> >  drivers/gpu/drm/drm_drv.c      |   3 +
> >  drivers/gpu/drm/drm_print.c    |  80 +++++--
> >  drivers/gpu/drm/drm_trace.c    | 376 +++++++++++++++++++++++++++++++++
> >  include/drm/drm_print.h        |  39 ++++
> >  7 files changed, 487 insertions(+), 23 deletions(-)
> >  create mode 100644 drivers/gpu/drm/drm_trace.c
> 
> ...
> 
> > +/**
> > + * DOC: DRM Tracing
> > + *
> > + * *tl;dr* DRM tracing is a lightweight alternative to traditional DRM debug
> > + * logging.
> > + *
> > + * While DRM logging is quite convenient when reproducing a specific issue, it
> > + * doesn't help when something goes wrong unexpectedly. There are a couple
> > + * reasons why one does not want to enable DRM logging at all times:
> > + *
> > + * 1. We don't want to overwhelm syslog with drm spam, others have to use it too
> > + * 2. Console logging is slow
> > + *
> > + * DRM tracing aims to solve both these problems.
> > + *
> > + * To use DRM tracing, write a DRM debug category mask (this is a bitmask of
> > + * &drm_debug_category values) to the trace_mask file:
> > + * ::
> > + *
> > + *    eg: echo 0x106 > /sys/kernel/debug/dri/trace_mask
> > + *
> > + * Once active, all log messages in the specified categories will be written to
> > + * the DRM trace. Once at capacity, the trace will overwrite old messages with
> > + * new ones. At any point, one can read the trace file to extract the previous N
> > + * DRM messages:
> > + * ::
> > + *
> > + *    eg: cat /sys/kernel/debug/dri/trace
> > + *
> > + * Considerations
> > + * **************
> > + * The contents of the DRM Trace are **not** considered UABI. **DO NOT depend on
> > + * the values of these traces in your userspace.** These traces are intended for
> > + * entertainment purposes only. The contents of these logs carry no warranty,
> > + * expressed or implied.
> > + *
> > + * New traces can not be added to the trace buffer while it is being read. If
> > + * this proves to be a problem, it can be mitigated by making a copy of the
> > + * buffer on start of read. Since DRM trace is not meant to be continuously
> > + * read, this loss is acceptable.
> > + *
> > + * The timestamps on logs are CPU-local. As such, log messages from different
> > + * CPUs may have slightly different ideas about time.
> > + *
> > + * Since each CPU has its own buffer, they won't all overflow at the same rate.
> > + * This means that messages from a particularly active CPU could be dropped
> > + * while an inactive CPU might have much older log messages. So don't be fooled
> > + * if you seem to be missing log messages when you see a switch between CPUs in
> > + * the logs.
> > + *
> > + * Internals
> > + * *********
> > + * The DRM Tracing functions are intentionally unexported, they are not meant to
> > + * be used by drivers directly. The reasons are twofold:
> > + *
> > + * 1. All messages going to traces should also go to the console logs. This
> > + *    ensures users can choose their logging medium without fear they're losing
> > + *    messages.
> > + * 2. Writing directly to the trace skips category filtering, resulting in trace
> > + *    spam.
> > + */
> 
> Hi,
> 
> sounds like a good first step to me!
> 
> I still have concerns about depending on debugfs in production and in
> desktop distributions when this feature is wanted to be on by default,
> but I suppose that cannot be solved right now.
> 
> ...
> 
> > +/**
> > + * drm_trace_init - initializes tracing for drm core
> > + * @debugfs_root: the dentry for drm core's debugfs root
> > + *
> > + * This function is called on drm core init. It is responsible for initializing
> > + * drm tracing. This function must be matched by a call to drm_trace_cleanup().
> > + *
> > + * Returns: 0 on success, -errno on failure
> > + */
> > +int drm_trace_init(struct dentry *debugfs_root)
> > +{
> > +	struct drm_trace_info *info = &drm_trace;
> > +	int ret;
> > +
> > +	info->buffer = ring_buffer_alloc(PAGE_SIZE * 2, RB_FL_OVERWRITE);
> 
> Does this mean the ring buffer size is hardcoded to two pages of log
> data (not event pointers)?
> 
> That is tiny! Does that even fit one frame's worth? And given that it
> may take userspace a bit to react and open the log, other DRM
> actions may have flushed everything interesting out already. I'm afraid
> there won't be a single number to fit all use cases, either, I guess.

2 pages per cpu, yes. ring_buffers can be resized on the fly, so it won't be too
hard to add a trace_size (Joonas asked for this too) node. I didn't add that now
since I wanted to prove out everything else before spending too much time on any
one solution (since this is the 3rd rewrite of drm tracing).

If we're all happy with the current direction, I'll add trace_size.

Thanks again for all the feedback,

Sean

> 
> 
> Thanks,
> pq

-- 
Sean Paul, Software Engineer, Google / Chromium OS