Re: [PATCH v2 1/3] drm/atomic: Add a generic infrastructure to track underrun errors

Daniel Vetter <daniel@xxxxxxxx> · Fri, 26 Oct 2018 15:36:15 +0200

On Fri, Oct 26, 2018 at 02:36:56PM +0200, Boris Brezillon wrote:
> Hi Daniel,
> 
> On Fri, 26 Oct 2018 12:33:37 +0200
> Daniel Vetter <daniel@xxxxxxxx> wrote:
> 
> > On Thu, Oct 25, 2018 at 02:45:44PM +0200, Boris Brezillon wrote:
> > > Display controllers usually provide a lot features like overlay planes,
> > > hardware scalers, hardware rotations, ...
> > > Most of the time those features work just fine when enabled
> > > independently, but activating all of them at the same time tend to
> > > show other limitations like the limited memory bus or scaler
> > > bandwidth.
> > > 
> > > This sort of bandwidth estimation is hard to get right, and we
> > > sometimes fail to predict that a specific workload will not pass,
> > > which will most likely result in underrun errors somewhere in the
> > > display pipeline (most of the time at the CRTC level).
> > > 
> > > This path aims at making underrun error tracking generic and exposing
> > > underrun errors to userspace through a debugfs file.
> > > 
> > > This way, CI tools like IGT, can try to enable a bunch of features and
> > > if such underrun errors are reported after the settings have been
> > > accepted and applied by the KMS infrastructure. When that happens it
> > > means the load tracking algorithm needs some tweaking/fixing.
> > > 
> > > Signed-off-by: Boris Brezillon <boris.brezillon@xxxxxxxxxxx>
> > > ---
> > > Changes in v2:
> > > - New patch  
> > 
> > Hm, not what I thought off. This is definitely not going to be reusable
> > for i915, the only other driver with underrun reporting, since we have
> > tons of underrun sources (not just one). And we need to mask/unmask them
> > individually.
> 
> I agree it's not perfect, but I did it this way so that hopefully we'll
> be able to use drm_atomic_helper_commit_tail() at some point instead of
> having our own implementation.

I expect every non-trivial driver to need to overwrite commit_tail. That's
the entire point for having created it. Exceptions are just the very, very
simple ones without anything fancy.

> If there's a way to add pre/post_commit() hooks (maybe they already
> exist?) at various levels of the display pipeline that get called even
> when the associated state has *not* changed, then I might be able to
> place my mask/unmask/clear operations there. I really need that to
> properly unmask the underrun interrupt, otherwise underruns might not be
> detected after the first modeset if nothing changed in sub-parts of the
> pipeline.

Just overwrite commit_tail. That's why it exists.

> > It's also midlayer mistake, since you're stuff your callbacks into the
> > drm_mode_config_funcs core structure (which should only be used for uapi
> > interfaces, not helper stuff).
> 
> Okay.
> 
> > 
> > What I had in mind is just a simple function which spews into dmesg (so
> > that i915 could use it) and increments the underrun counter in debugfs.
> > Everything else needs to be handled in drivers I think. E.g.
> > 
> > drm_mode_config_report_underrun(struct drm_device *dev,
> > 				const char *source);
> > 
> > and then rolling it out for i915 too (otherwise there's really no point in
> > the common infrastructure).
> 
> Modifying the i915 driver was a bit premature since I didn't have your
> feedback on the general approach. And given your review, I'm glad I
> didn't start this conversion :-P.
> 
> Anyway, I was not convinced having a generic infrastructure to report
> underrun errors was needed in the first place. The fact that these
> errors are very HW-specific, and that data attached to such events
> might be useful to understand what actually happened makes me think we
> don't really need this generic underrun reporting helper.
> 
> Also note that IGT tests are likely to be HW specific too, because the
> workload needed to trigger underrun errors is likely to depend on the
> platform you're running on. And if you already have to write your own
> tests, I don't see clear benefit in exposing underrun errors
> generically.

See my other reply in the earlier thread: That's why we're just dumping
errors into dmesg, and catching any such thing in our runners. We do not
have any specific underrun tests, we treat the entire igt test suite of
kms tests as our underrun test suite.

Extending that with more nasty corner cases is obviously great, and will
benefit everyone. But I don't think you want an igt testcase that looks
specifically for underruns. We do use crc to hunt for specific issues, but
again that's portable between drivers.

> Anyway, this is just my opinion, and if you think we actually need the
> drm_mode_config_report_underrun() function, I'll implement it.

I do think having some standard in how these errors are reported would be
useful, even if it's just a special DRM_UNDERRUN_ERROR() macro.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel