Re: [PATCH v2 1/3] drm/atomic: Add a generic infrastructure to track underrun errors

Daniel Vetter <daniel@xxxxxxxx> · Fri, 26 Oct 2018 16:23:58 +0200

On Fri, Oct 26, 2018 at 4:13 PM Boris Brezillon
<boris.brezillon@xxxxxxxxxxx> wrote:
>
> On Fri, 26 Oct 2018 15:36:15 +0200
> Daniel Vetter <daniel@xxxxxxxx> wrote:
>
> > On Fri, Oct 26, 2018 at 02:36:56PM +0200, Boris Brezillon wrote:
> > > Hi Daniel,
> > >
> > > On Fri, 26 Oct 2018 12:33:37 +0200
> > > Daniel Vetter <daniel@xxxxxxxx> wrote:
> > >
> > > > On Thu, Oct 25, 2018 at 02:45:44PM +0200, Boris Brezillon wrote:
> > > > > Display controllers usually provide a lot features like overlay planes,
> > > > > hardware scalers, hardware rotations, ...
> > > > > Most of the time those features work just fine when enabled
> > > > > independently, but activating all of them at the same time tend to
> > > > > show other limitations like the limited memory bus or scaler
> > > > > bandwidth.
> > > > >
> > > > > This sort of bandwidth estimation is hard to get right, and we
> > > > > sometimes fail to predict that a specific workload will not pass,
> > > > > which will most likely result in underrun errors somewhere in the
> > > > > display pipeline (most of the time at the CRTC level).
> > > > >
> > > > > This path aims at making underrun error tracking generic and exposing
> > > > > underrun errors to userspace through a debugfs file.
> > > > >
> > > > > This way, CI tools like IGT, can try to enable a bunch of features and
> > > > > if such underrun errors are reported after the settings have been
> > > > > accepted and applied by the KMS infrastructure. When that happens it
> > > > > means the load tracking algorithm needs some tweaking/fixing.
> > > > >
> > > > > Signed-off-by: Boris Brezillon <boris.brezillon@xxxxxxxxxxx>
> > > > > ---
> > > > > Changes in v2:
> > > > > - New patch
> > > >
> > > > Hm, not what I thought off. This is definitely not going to be reusable
> > > > for i915, the only other driver with underrun reporting, since we have
> > > > tons of underrun sources (not just one). And we need to mask/unmask them
> > > > individually.
> > >
> > > I agree it's not perfect, but I did it this way so that hopefully we'll
> > > be able to use drm_atomic_helper_commit_tail() at some point instead of
> > > having our own implementation.
> >
> > I expect every non-trivial driver to need to overwrite commit_tail. That's
> > the entire point for having created it. Exceptions are just the very, very
> > simple ones without anything fancy.
>
> Again, just my opinion, but doing that means some drivers are left
> behind when new stuff are added to the generic helpers, and sometimes
> those new features might be of interest to the drivers that are not
> directly using those generic helpers, except they usually don't even
> know about it until one comes at looking at the generic helper again
> and notice the additions.
> Maybe I'm not following the dri-devel ML close enough, and I'm probably
> the one to blame here, but I see a real benefit in having the maximum
> number of drivers using the generic helpers. To be honest, with a few
> modifications, I think the VC4 driver could entirely rely on those
> generic helpers.

commit_tail commits the sw state to hw. Your hw will not magically
gain any new features, so there's really not much point in
automatically adding stuff to the overall hw commit flow. Some
exceptoins apply, and those get reviewed when the core change goes in.

Yes I spend ridiculous amounts of my time reading everyone else's
drivers. And I really don't want to make commit_tail 100% applicapable
to everyone, that just makes reviewing drivers harder (since the flow
isn't obvious by simply looking at commit_tail, instead you have to
hunt down feature flags and other stuff that control things).

This holds in general btw, and it's part of the no-midlayer design
paradigm. No huge blobs that work for everyone, instead make things
modular.

But if you have a concrete example where we added something, and a
driver should have used it but didn't, we'll need to look into that.
-Daniel

> > > If there's a way to add pre/post_commit() hooks (maybe they already
> > > exist?) at various levels of the display pipeline that get called even
> > > when the associated state has *not* changed, then I might be able to
> > > place my mask/unmask/clear operations there. I really need that to
> > > properly unmask the underrun interrupt, otherwise underruns might not be
> > > detected after the first modeset if nothing changed in sub-parts of the
> > > pipeline.
> >
> > Just overwrite commit_tail. That's why it exists.
>
> And that's what we do already, but I was hoping that someday we would
> be able to switch to the generic helper...
>
> >
> > > > It's also midlayer mistake, since you're stuff your callbacks into the
> > > > drm_mode_config_funcs core structure (which should only be used for uapi
> > > > interfaces, not helper stuff).
> > >
> > > Okay.
> > >
> > > >
> > > > What I had in mind is just a simple function which spews into dmesg (so
> > > > that i915 could use it) and increments the underrun counter in debugfs.
> > > > Everything else needs to be handled in drivers I think. E.g.
> > > >
> > > > drm_mode_config_report_underrun(struct drm_device *dev,
> > > >                           const char *source);
> > > >
> > > > and then rolling it out for i915 too (otherwise there's really no point in
> > > > the common infrastructure).
> > >
> > > Modifying the i915 driver was a bit premature since I didn't have your
> > > feedback on the general approach. And given your review, I'm glad I
> > > didn't start this conversion :-P.
> > >
> > > Anyway, I was not convinced having a generic infrastructure to report
> > > underrun errors was needed in the first place. The fact that these
> > > errors are very HW-specific, and that data attached to such events
> > > might be useful to understand what actually happened makes me think we
> > > don't really need this generic underrun reporting helper.
> > >
> > > Also note that IGT tests are likely to be HW specific too, because the
> > > workload needed to trigger underrun errors is likely to depend on the
> > > platform you're running on. And if you already have to write your own
> > > tests, I don't see clear benefit in exposing underrun errors
> > > generically.
> >
> > See my other reply in the earlier thread: That's why we're just dumping
> > errors into dmesg, and catching any such thing in our runners. We do not
> > have any specific underrun tests, we treat the entire igt test suite of
> > kms tests as our underrun test suite.
> >
> > Extending that with more nasty corner cases is obviously great, and will
> > benefit everyone. But I don't think you want an igt testcase that looks
> > specifically for underruns.
>
> Actually, that's exactly what I wanted to do: use config that were
> known to generate underrun errors and make sure they don't after adding
> the HVS load tracker. That's also particularly useful to track
> regressions in this portion of the code.
>
> > We do use crc to hunt for specific issues, but
> > again that's portable between drivers.
>
> We don't use CRC yet, but there was plans to add tests using writeback
> to compare the output image to what we expect, and Maxime also worked
> on tests relying on the Chamelium board.
>
> Still, knowing why the image is incorrect is useful, and I guess you
> could have a wrong output that is not necessarily caused by an
> underrun error. So I think having tests that explicly check for
> underrun errors is a good thing.
>
> >
> > > Anyway, this is just my opinion, and if you think we actually need the
> > > drm_mode_config_report_underrun() function, I'll implement it.
> >
> > I do think having some standard in how these errors are reported would be
> > useful, even if it's just a special DRM_UNDERRUN_ERROR() macro.
>
> I'll define such a macro.

Function, if it does anything more than call DRM_ERROR. Just to avoid confusion.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel