Re: [RFC PATCH] drm/vc4: Add a load tracker to prevent HVS underflow errors

Boris Brezillon <boris.brezillon@xxxxxxxxxxx> · Thu, 25 Oct 2018 10:09:31 +0200

On Tue, 23 Oct 2018 15:44:43 +0200
Daniel Vetter <daniel@xxxxxxxx> wrote:

> On Tue, Oct 23, 2018 at 09:55:08AM +0200, Boris Brezillon wrote:
> > Hi Daniel,
> > 
> > On Tue, 16 Oct 2018 14:57:43 +0200
> > Daniel Vetter <daniel@xxxxxxxx> wrote:
> >   
> > > On Tue, Oct 16, 2018 at 11:40:45AM +0200, Boris Brezillon wrote:  
> > > > The HVS block is supposed to fill the pixelvalve FIFOs fast enough to
> > > > meet the requested framerate. The problem is, the HVS and memory bus
> > > > bandwidths are limited, and if we don't take these limitations into
> > > > account we might end up with HVS underflow errors.
> > > > 
> > > > This patch is trying to model the per-plane HVS and memory bus bandwidth
> > > > consumption and take a decision at atomic_check() time whether the
> > > > estimated load will fit in the HVS and membus budget.
> > > > 
> > > > Note that we take an extra margin on the memory bus consumption to let
> > > > the system run smoothly when other blocks are doing heavy use of the
> > > > memory bus. Same goes for the HVS limit, except the margin is smaller in
> > > > this case, since the HVS is not used by external components.
> > > > 
> > > > Signed-off-by: Boris Brezillon <boris.brezillon@xxxxxxxxxxx>
> > > > ---
> > > > This logic has been validated using a simple shell script and
> > > > some instrumentation in the VC4 driver:
> > > > 
> > > > - capture underflow errors at the HVS level and expose a debugfs file
> > > >   reporting those errors
> > > > - add debugfs files to expose when atomic_check fails because of the
> > > >   HVS or membus load limitation or when it fails for other reasons
> > > > 
> > > > The branch containing those modification is available here [1], and the
> > > > script (which is internally using modetest) is here [2] (please note
> > > > that I'm bad at writing shell scripts :-)).
> > > > 
> > > > Note that those modification tend to over-estimate the load, and thus
> > > > reject setups that might have previously worked, so we might want to
> > > > adjust the limits to avoid that.
> > > > 
> > > > [1]https://github.com/bbrezillon/linux/tree/vc4/hvs-bandwidth-eval
> > > > [2]https://github.com/bbrezillon/vc4-hvs-bandwidth-test    
> > > 
> > > Any interest in using igt to test this stuff? We have at least a bunch of
> > > tests already in there that try all kinds of plane setups. And we use
> > > those to hunt for underruns on i915 hw.
> > > 
> > > Wrt underrun reporting: On i915 we just dump them into dmesg at the error
> > > level, using DRM_ERROR,  
> > 
> > Are you masking the underrun interrupt after it's been reported? If we
> > don't do that on VC4 we just end up flooding the kernel-log buffer until
> > someone comes and update the config.  
> 
> Yeah we do that too. Rule is that a full modeset will clear any underrun
> masking (so tests need to make sure they start with a modeset, or it'll be
> for nothing).
> 
> >   
> > > plus a tracepoint. See e.g.
> > > intel_pch_fifo_underrun_irq_handler(). If there's interest we could
> > > perhaps extract this into something common, similar to what was done with
> > > crc support already.
> > >   
> > 
> > I'm not a big fan of hardcoded trace points in general (because of the
> > whole "it's part of the stable ABI" thing), and in this case, making the
> > tracepoint generic sounds even more risky to me. Indeed, how can we know
> > about all the HW specific bits one might want to expose. For instance,
> > I see the intel underrun tracepoint exposes a struct with a frame and
> > scanline field, and AFAICT, we don't have such information in the VC4
> > case.
> > 
> > Any opinion on that?  
> 
> It's only abi if you're unlucky. If it's just for debugging and
> validation, you can change it again. Tbh, no idea why we even have these
> tracepoints, they're fairly useless imo. CI only relies upon the dmesg
> output. Maybe run git blame and ask the original author, we can probably
> update them to suit our needs.

Okay, I think I'll go for a generic debugfs entry that returns true
when an underrun error happened since the last modeset, false otherwise.

Next question is: should I attach the underrun status to the drm_device
or have one per CRTC? In my case, I only care about the "has an
underrun error occurred on any of the active CRTC" case, so I'd vote for
a per-device underrun status.

_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel