On Tue, Jun 19, 2018 at 02:28:46PM +0200, Lucas Stach wrote:
> Am Dienstag, den 19.06.2018, 12:42 +0100 schrieb Russell King - ARM Linux:
> > On Tue, Jun 19, 2018 at 01:11:29PM +0200, Lucas Stach wrote:
> > > Am Dienstag, den 19.06.2018, 12:00 +0100 schrieb Russell King - ARM Linux:
> > > > No, it's not "a really big job" - it's just that the Dove GC600 is
> > > > not fast enough to complete _two_ 1080p sized GPU operations within
> > > > 500ms.  The preceding job contained two blits - one of them a
> > > > non-alphablend copy of:
> > > >
> > > >   00180000 04200780 0,24,1920,1056 -> 0,24,1920,1056
> > > >
> > > > and one an alpha blended copy of:
> > > >
> > > >   00000000 04380780 0,0,1920,1080 -> 0,0,1920,1080
> > > >
> > > > This is (iirc) something I already fixed with the addition of the
> > > > progress detection back before etnaviv was merged into the mainline
> > > > kernel.
> > >
> > > I hadn't expected it to be this slow. I see that we might need to
> > > bring back the progress detection to fix the userspace regression,
> > > but I'm not fond of this, as it might lead to really bad QoS.
> >
> > Well, the choices are that, or worse overall performance through having
> > to ignore the GPU entirely.
> >
> > > I would prefer userspace tracking the size of the blits and flushing
> > > the cmdstream at an appropriate time, so we don't end up with really
> > > long running jobs, but I'm not sure if this would be acceptable to
> > > you...
> >
> > The question becomes how to split up two operations.  Yes, we could
> > submit them individually, but if together they take in excess of 500ms,
> > then it's likely that individually each operation will take in excess
> > of 250ms, which is still a long time.
> >
> > In any case, I think we need to fix this for 4.17-stable and then work
> > out (a) which operations are taking a long time, and (b) how to solve
> > this issue.
>
> Agreed. I'll look into bringing back the progress detection for 4.17
> stable.
>
> I'm still curious why the GC600 on the Dove is that slow. With
> performance like this, moving a big(ish) window on the screen must be a
> horrible user experience.

I _think_ it's down to the blend being slow on GC600 - one of the
problems of running modern "desktops" on the Dove is that with Xorg and
a compositing window manager (eg, modern metacity), dragging windows
around is very slow because of the multiple GPU operations required -
even dragging a small window results in almost the entire screen being
re-blended.

I don't think it's fair to blame that on the Dove though - it's just
total inefficiency on the Xorg/compositing side, which basically redraws
the _entire_ screen for small changes.

The compositing window manager brings other issues with it as well, in
particular with colour-keyed overlay and detecting whether anything
obscures the overlay.  For example, if, as a memory bandwidth
optimisation, you detect in the Xvideo extension that the overlay window
is unobscured and disable the primary plane and colourkeying, this works
fine with non-compositing managers.  With a compositing manager,
however, you can end up with graphics blended _on top_ of the Xvideo
window that the Xvideo backend knows nothing about... which results in
that graphics not being displayed.

The blending also has a detrimental effect on the colourkeying when the
graphics is displayed - because of the blending, the colourkey is no
longer the expected RGB value around objects, so the colourkey bleeds
through around (eg) a menu.

I've now disabled compositing in metacity, which makes things a whole
lot nicer (I've actually been meaning to track down the "slow window
dragging" problem for a good few months now) and solves the overlay
issue too.

> > Do we have any way to track how long each submitted job has actually
> > taken on the GPU?  (Eg, by recording the times that we receive the
> > events?)  It wouldn't be very accurate for small jobs, but given this
> > operation is taking so long, it would give an indication of how long
> > this operation is actually taking.  etnaviv doesn't appear to have
> > any tracepoints, which would've been ideal for that.  Maybe this is
> > a reason to add some? ;)
>
> See attached patch (which I apparently forgot to send out). The DRM GPU
> scheduler has some tracepoints, which might be helpful. The attached
> patch adds a drm_sched_job_run tracepoint when a job is queued in the
> hardware ring. Together with the existing drm_sched_process_job, this
> should give you an idea of how long a job takes to process. Note that
> at any time up to 4 jobs are allowed in the hardware queue, so you need
> to match up the end times.
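For what it's worth, a rough (untested) sketch of pairing those two
events up from the ftrace output to get per-job GPU times could look
like the following - it assumes the new event really is named
drm_sched_job_run, that it ends up alongside drm_sched_process_job in
the gpu_scheduler trace system, and that both events print a matching
"fence=<ptr>" field:

#!/usr/bin/env python3
# Rough sketch: pair drm_sched_job_run with drm_sched_process_job from
# an ftrace capture and report how long each job spent on the GPU.
import re
import sys

# "... TIMESTAMP: EVENT: payload" in the standard ftrace line format.
LINE_RE = re.compile(r'\s(\d+\.\d+): (drm_sched_job_run|drm_sched_process_job): (.*)')
FENCE_RE = re.compile(r'fence=([0-9a-fx]+)')

start = {}  # fence pointer -> time the job was put on the hardware ring

path = sys.argv[1] if len(sys.argv) > 1 else '/sys/kernel/debug/tracing/trace'
for line in open(path):
    m = LINE_RE.search(line)
    if not m:
        continue
    stamp, event, payload = float(m.group(1)), m.group(2), m.group(3)
    f = FENCE_RE.search(payload)
    if not f:
        continue
    fence = f.group(1)
    if event == 'drm_sched_job_run':
        start[fence] = stamp
    elif fence in start:
        # Up to 4 jobs can be in flight, so match completions by fence
        # rather than assuming the two events pair up strictly in order.
        print('job %s: %.3f ms on the GPU' % (fence, (stamp - start.pop(fence)) * 1000))

Run against a capture taken with those events enabled (presumably
echo 1 > /sys/kernel/debug/tracing/events/gpu_scheduler/enable) while
reproducing the slow blits.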
Thanks, I'll try to get some data in the next week or so.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia:  sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up