On Fri, Oct 26, 2018 at 3:57 PM Boris Brezillon <boris.brezillon@xxxxxxxxxxx> wrote:
>
> On Fri, 26 Oct 2018 15:30:26 +0200
> Daniel Vetter <daniel@xxxxxxxx> wrote:
>
> > On Thu, Oct 25, 2018 at 11:41:14AM +0200, Boris Brezillon wrote:
> > > On Thu, 25 Oct 2018 11:33:14 +0200
> > > Daniel Vetter <daniel@xxxxxxxx> wrote:
> > >
> > > > On Thu, Oct 25, 2018 at 10:09:31AM +0200, Boris Brezillon wrote:
> > > > > On Tue, 23 Oct 2018 15:44:43 +0200
> > > > > Daniel Vetter <daniel@xxxxxxxx> wrote:
> > > > >
> > > > > > On Tue, Oct 23, 2018 at 09:55:08AM +0200, Boris Brezillon wrote:
> > > > > > > Hi Daniel,
> > > > > > >
> > > > > > > On Tue, 16 Oct 2018 14:57:43 +0200
> > > > > > > Daniel Vetter <daniel@xxxxxxxx> wrote:
> > > > > > >
> > > > > > > > On Tue, Oct 16, 2018 at 11:40:45AM +0200, Boris Brezillon wrote:
> > > > > > > > > The HVS block is supposed to fill the pixelvalve FIFOs fast
> > > > > > > > > enough to meet the requested framerate. The problem is that
> > > > > > > > > the HVS and memory bus bandwidths are limited, and if we
> > > > > > > > > don't take these limitations into account we might end up
> > > > > > > > > with HVS underflow errors.
> > > > > > > > >
> > > > > > > > > This patch tries to model the per-plane HVS and memory bus
> > > > > > > > > bandwidth consumption and decide at atomic_check() time
> > > > > > > > > whether the estimated load will fit in the HVS and membus
> > > > > > > > > budget.
> > > > > > > > >
> > > > > > > > > Note that we take an extra margin on the memory bus
> > > > > > > > > consumption to let the system run smoothly when other
> > > > > > > > > blocks are making heavy use of the memory bus. Same goes
> > > > > > > > > for the HVS limit, except the margin is smaller in this
> > > > > > > > > case, since the HVS is not used by external components.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Boris Brezillon <boris.brezillon@xxxxxxxxxxx>
> > > > > > > > > ---
> > > > > > > > > This logic has been validated using a simple shell script
> > > > > > > > > and some instrumentation in the VC4 driver:
> > > > > > > > >
> > > > > > > > > - capture underflow errors at the HVS level and expose a
> > > > > > > > >   debugfs file reporting those errors
> > > > > > > > > - add debugfs files to expose when atomic_check fails
> > > > > > > > >   because of the HVS or membus load limitation or when it
> > > > > > > > >   fails for other reasons
> > > > > > > > >
> > > > > > > > > The branch containing those modifications is available
> > > > > > > > > here [1], and the script (which is internally using
> > > > > > > > > modetest) is here [2] (please note that I'm bad at writing
> > > > > > > > > shell scripts :-)).
> > > > > > > > >
> > > > > > > > > Note that those modifications tend to over-estimate the
> > > > > > > > > load, and thus reject setups that might have previously
> > > > > > > > > worked, so we might want to adjust the limits to avoid
> > > > > > > > > that.
> > > > > > > > >
> > > > > > > > > [1] https://github.com/bbrezillon/linux/tree/vc4/hvs-bandwidth-eval
> > > > > > > > > [2] https://github.com/bbrezillon/vc4-hvs-bandwidth-test
> > > > > > > >
> > > > > > > > Any interest in using igt to test this stuff? We have at
> > > > > > > > least a bunch of tests already in there that try all kinds
> > > > > > > > of plane setups. And we use those to hunt for underruns on
> > > > > > > > i915 hw.
> > > > > > > >
> > > > > > > > Wrt underrun reporting: On i915 we just dump them into dmesg
> > > > > > > > at the error level, using DRM_ERROR,
> > > > > > >
> > > > > > > Are you masking the underrun interrupt after it's been
> > > > > > > reported? If we don't do that on VC4 we just end up flooding
> > > > > > > the kernel-log buffer until someone comes and updates the
> > > > > > > config.
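Side note, mostly for anyone skimming the archive later: the "report once,
then mask until the next modeset" scheme being discussed here, plus the
debugfs flag mentioned in the patch notes above, boils down to roughly the
sketch below. This is illustrative only; example_dev and the
hvs_*_underrun_irq() helpers are placeholders, not actual vc4 or i915 code.

/*
 * Sketch of the "report once, then mask until the next modeset" scheme
 * plus a per-device debugfs flag. Everything here is illustrative:
 * example_dev and the hvs_*_underrun_irq() helpers are placeholders,
 * not real vc4 (or i915) functions.
 */
#include <linux/debugfs.h>
#include <drm/drm_print.h>

struct example_dev {
	/* true if an underrun happened since the last full modeset */
	bool underrun;
};

/* Placeholders for the HW-specific interrupt mask register poking. */
static void hvs_mask_underrun_irq(struct example_dev *edev) { }
static void hvs_unmask_underrun_irq(struct example_dev *edev) { }

/* Called from the underrun interrupt handler. */
static void example_underrun_irq(struct example_dev *edev)
{
	edev->underrun = true;

	/* One line at error level so CI spots it in dmesg... */
	DRM_ERROR("HVS underrun\n");

	/* ...then mask the interrupt so we don't flood the log buffer. */
	hvs_mask_underrun_irq(edev);
}

/* Called at the end of a full modeset: clean slate for the next test. */
static void example_underrun_reset(struct example_dev *edev)
{
	edev->underrun = false;
	hvs_unmask_underrun_irq(edev);
}

static void example_debugfs_init(struct example_dev *edev, struct dentry *root)
{
	/* Lets a test poll the status directly instead of grepping dmesg. */
	debugfs_create_bool("underrun", 0444, root, &edev->underrun);
}

The important part is that a full modeset clears both the flag and the
interrupt mask, so a test that starts with a modeset gets a clean slate.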
> > > > > > Yeah we do that too. Rule is that a full modeset will clear any
> > > > > > underrun masking (so tests need to make sure they start with a
> > > > > > modeset, or it'll be for nothing).
> > > > > >
> > > > > > > > plus a tracepoint. See e.g.
> > > > > > > > intel_pch_fifo_underrun_irq_handler(). If there's interest
> > > > > > > > we could perhaps extract this into something common, similar
> > > > > > > > to what was done with crc support already.
> > > > > > >
> > > > > > > I'm not a big fan of hardcoded trace points in general
> > > > > > > (because of the whole "it's part of the stable ABI" thing),
> > > > > > > and in this case, making the tracepoint generic sounds even
> > > > > > > more risky to me. Indeed, how can we know about all the
> > > > > > > HW-specific bits one might want to expose? For instance, I see
> > > > > > > the intel underrun tracepoint exposes a struct with a frame
> > > > > > > and scanline field, and AFAICT, we don't have such information
> > > > > > > in the VC4 case.
> > > > > > >
> > > > > > > Any opinion on that?
> > > > > >
> > > > > > It's only ABI if you're unlucky. If it's just for debugging and
> > > > > > validation, you can change it again. Tbh, no idea why we even
> > > > > > have these tracepoints, they're fairly useless imo. CI only
> > > > > > relies upon the dmesg output. Maybe run git blame and ask the
> > > > > > original author, we can probably update them to suit our needs.
> > > > >
> > > > > Okay, I think I'll go for a generic debugfs entry that returns
> > > > > true when an underrun error happened since the last modeset, false
> > > > > otherwise.
> > > > >
> > > > > Next question is: should I attach the underrun status to the
> > > > > drm_device or have one per CRTC? In my case, I only care about the
> > > > > "has an underrun error occurred on any of the active CRTCs" case,
> > > > > so I'd vote for a per-device underrun status.
> > > >
> > > > Yeah, probably good enough. For our CI all we care about is the
> > > > warn/error level dmesg output. Anything at that level is considered
> > > > a CI failure.
> > >
> > > So igt is grepping dmesg to detect when an underrun happens?
> >
> > No, but the CI runner is also observing dmesg. Anything in there at
> > warning or higher level is considered a failure.
>
> Eric, do you do the same when you launch the IGT testsuite?
>
> > > > What do you need the debugfs file for?
> > >
> > > I just thought having a debugfs file to expose the underrun
> > > information would be cleaner than having to grep in dmesg to detect
> > > such failures.
> >
> > The issue is that you want to detect underruns everywhere, not just in
> > the specific tests you're checking for it. Anything that does a modeset
> > could cause an underrun (at least we've managed to do so pretty much
> > everywhere on i915 hw, if you misprogram it sufficiently).
>
> In my specific case, I want to have the IGT test check the underrun
> value while the test is being executed so that I know which exact
> configuration triggers the underrun situation. At least that's what I
> did to adjust/debug the HVS load tracking code. Maybe it's not really a
> problem when all we do is track regressions.

Ok, that makes sense, and explains why you want the overall underrun
counter exposed in debugfs.
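For readers who haven't looked at the patch itself: the load-tracking check
that gets tuned with that counter is conceptually just something like the
sketch below. The cost formula and EXAMPLE_MEMBUS_BUDGET are made up for
illustration; the actual patch is more careful about scaling, margins and
the separate HVS vs. memory bus budgets.

/*
 * Very rough sketch of the kind of check the patch adds: sum up an
 * estimated per-plane cost and reject the new state in atomic_check()
 * if it exceeds an arbitrary budget. The cost formula and the budget
 * value are made up for illustration only.
 */
#include <drm/drm_atomic.h>
#include <drm/drm_atomic_helper.h>
#include <drm/drm_framebuffer.h>
#include <drm/drm_plane.h>

#define EXAMPLE_MEMBUS_BUDGET	(500 * 1024 * 1024)	/* bytes per frame, made up */

static int example_atomic_check(struct drm_device *dev,
				struct drm_atomic_state *state)
{
	struct drm_plane_state *new_plane_state;
	struct drm_plane *plane;
	u64 load = 0;
	int i, ret;

	ret = drm_atomic_helper_check(dev, state);
	if (ret)
		return ret;

	for_each_new_plane_in_state(state, plane, new_plane_state, i) {
		const struct drm_framebuffer *fb = new_plane_state->fb;

		if (!fb || !new_plane_state->crtc)
			continue;

		/*
		 * Crude per-plane cost: source pixels fetched per frame
		 * times bytes per pixel (src_{w,h} are 16.16 fixed point).
		 */
		load += (u64)(new_plane_state->src_w >> 16) *
			(new_plane_state->src_h >> 16) *
			fb->format->cpp[0];
	}

	/* Reject the configuration instead of risking an underrun later. */
	if (load > EXAMPLE_MEMBUS_BUDGET)
		return -ENOSPC;

	return 0;
}

Rejecting the state up front in atomic_check() like this is what lets
userspace get a clean error instead of hitting an underrun at runtime.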
Our experience with underruns has been that it's very dynamic (on i915
hw), and most often it's not a given configuration, but a given sequence
of changes while some other workload is going on ...
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch