On 20/02/2023 15:45, Rob Clark wrote:
On Mon, Feb 20, 2023 at 4:22 AM Tvrtko Ursulin
<tvrtko.ursulin@xxxxxxxxxxxxxxx> wrote:
On 17/02/2023 17:00, Rob Clark wrote:
On Fri, Feb 17, 2023 at 8:03 AM Tvrtko Ursulin
<tvrtko.ursulin@xxxxxxxxxxxxxxx> wrote:
[snip]
adapted from your patches.. I think the basic idea of deadlines
(which includes "I want it NOW" ;-)) isn't controversial, but the
original idea got caught up in some bikeshed (what about compositors
that wait on fences in userspace to decide which surfaces to update in
the next frame), plus me getting busy and generally not having a good
plan for how to leverage this from VM guests (which is becoming
increasingly important for CrOS). I think I can build on some ongoing
virtgpu fencing improvement work to solve the latter. But now that we
have a 2nd use-case for this, it makes sense to respin.
Sure, I was looking at the old version already. It is interesting. But
also IMO needs quite a bit more work to approach achieving what is
implied by the name of the feature. It would need proper deadline
based sched job picking, and even then drm sched is mostly just a
frontend. So once past runnable status and jobs handed over to backend,
without further driver work it probably wouldn't be very effective past
very lightly loaded systems.
Yes, but all of that is not part of dma_fence ;-)
:) Okay.
Having said that, do we need to step back and think about whether adding
deadlines to dma-fences makes them too different from what they were?
They would go from a pure synchronisation primitive towards a scheduling
paradigm. Just to brainstorm whether there will be any unintended
consequences. I should mention this in your RFC thread actually.
Perhaps "deadline" isn't quite the right name, but I haven't thought
of anything better. It is really a hint to the fence signaller about
how soon it is interested in a result so the driver can factor that
into freq scaling decisions. Maybe "goal" or some other term would be
better?
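
To make the "hint to the fence signaller" idea concrete, here is a minimal userspace sketch in C. It is not the kernel's dma_fence API: `struct toy_fence`, `toy_fence_set_deadline()`, `toy_pick_freq_mhz()` and the frequency table are all hypothetical names invented for illustration. It assumes the fence keeps only the earliest requested deadline and that the driver picks the lowest frequency that still finishes the remaining work in time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a fence; NOT the kernel's struct dma_fence. */
struct toy_fence {
    bool     has_deadline;  /* has any waiter supplied a hint yet? */
    uint64_t deadline_ns;   /* earliest requested completion time */
};

/*
 * Record a deadline hint. Only the earliest hint is kept.
 * Returns true if the stored deadline was tightened.
 */
static bool toy_fence_set_deadline(struct toy_fence *f, uint64_t deadline_ns)
{
    assert(f != NULL);
    if (f->has_deadline && f->deadline_ns <= deadline_ns)
        return false;
    f->has_deadline = true;
    f->deadline_ns = deadline_ns;
    return true;
}

/*
 * Toy frequency decision (assumed policy): choose the lowest frequency
 * from an ascending table that completes the remaining cycles before the
 * deadline. With no hint, stay at the lowest frequency; if the deadline
 * has passed or cannot be met, use the highest.
 */
static uint32_t toy_pick_freq_mhz(const struct toy_fence *f, uint64_t now_ns,
                                  uint64_t remaining_cycles,
                                  const uint32_t *freqs_mhz, size_t n)
{
    assert(f != NULL && freqs_mhz != NULL && n > 0);

    if (!f->has_deadline)
        return freqs_mhz[0];
    if (f->deadline_ns <= now_ns)
        return freqs_mhz[n - 1];

    uint64_t budget_ns = f->deadline_ns - now_ns;
    for (size_t i = 0; i < n; i++) {
        /* At f MHz one cycle takes 1000/f ns; round the estimate up. */
        uint64_t need_ns =
            (remaining_cycles * 1000 + freqs_mhz[i] - 1) / freqs_mhz[i];
        if (need_ns <= budget_ns)
            return freqs_mhz[i];
    }
    return freqs_mhz[n - 1];
}
```

In this sketch the hint never affects which job runs next, only how fast the device is clocked, which matches the freq mgmt use described above.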
Don't know, no strong opinion on the name at the moment. For me it was
more about the change of what type of side channel data is getting
attached to dma-fence and whether it changes what the primitive is for.
I guess that can factor into scheduling decisions as well.. but we
already have priority for that. My main interest is freq mgmt.
(Thankfully we don't have performance and efficiency cores to worry
about, like CPUs ;-))
A pretty common challenging usecase is still the single fullscreen
game, where scheduling isn't the problem, but landing at an
appropriate GPU freq absolutely is. (UI workloads are perhaps more
interesting from a scheduler standpoint, but they generally aren't
challenging from a load/freq standpoint.)
Challenging as in picking the right operating point? Latency (and so
user-perceived UI smoothness) might be impacted due to the missing
waitboost for anything syncobj related. I don't know whether anything
exists to measure that currently, though. Assuming it is measurable, the
question would then be whether it is perceivable.
Fwiw, the original motivation of the series was to implement something
akin to i915 pageflip boosting without having to abandon the atomic
helpers. (And, I guess it would also let i915 preserve that feature
if it switched to atomic helpers.. I'm unsure if there are still other
things blocking i915's migration.)
Question for display folks I guess.
Then if we fast forward to a world where schedulers perhaps become fully
deadline aware (we even had this for i915 a few years back) then the
question will be whether equating waits with immediate deadlines still
works. Maybe not too well because we wouldn't have the ability to
distinguish the "someone is waiting" signal from the otherwise
propagated deadlines.
Is there any other way to handle a wait boost than expressing it as an
ASAP deadline?
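
For illustration only, a wait boost expressed as an ASAP deadline could look like the C sketch below. Everything here is hypothetical (`struct hint_fence`, `hint_fence_wait_boost()`, `hint_fence_wants_boost()`); it is not the dma_fence interface. It assumes the fence stores a single earliest deadline, so a blocking wait simply sets that deadline to "now".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fence state; NOT the kernel's struct dma_fence. */
struct hint_fence {
    bool     has_deadline;
    uint64_t deadline_ns;   /* earliest deadline seen so far */
};

/* Keep the earliest deadline hint that any consumer has supplied. */
static void hint_fence_set_deadline(struct hint_fence *f, uint64_t deadline_ns)
{
    assert(f != NULL);
    if (!f->has_deadline || deadline_ns < f->deadline_ns) {
        f->has_deadline = true;
        f->deadline_ns = deadline_ns;
    }
}

/* A blocking wait expressed as "the deadline is right now". */
static void hint_fence_wait_boost(struct hint_fence *f, uint64_t now_ns)
{
    hint_fence_set_deadline(f, now_ns);
}

/* Signaller side: boost whenever the stored deadline is due or overdue. */
static bool hint_fence_wants_boost(const struct hint_fence *f, uint64_t now_ns)
{
    assert(f != NULL);
    return f->has_deadline && f->deadline_ns <= now_ns;
}
```

Under these assumptions, a fence boosted by a waiter and a fence whose propagated deadline happens to be due end up in the same state, so the signaller cannot tell the two cases apart.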
A leading question or just a question? Nothing springs to my mind at the
moment.
Just a question. The immediate deadline is the only thing that makes
sense to me, but that could be because I'm looking at it from the
perspective of also trying to handle the case where missing vblank
reduces utilization and provides the wrong signal to gpufreq.. i915
already has a way to handle this internally, but it involves bypassing
the atomic helpers, which isn't a thing I want to encourage other
drivers to do. And completely doesn't work for situations where the
gpu and display are separate devices.
Right, there is yet another angle to discuss with Daniel here who AFAIR
was a bit against i915 priority inheritance going past a single device
instance. In which case DRI_PRIME=1 would lose the ability to boost
frame buffer dependency chains. Opens up the question of deadline
inheritance across different drivers too. Or perhaps Daniel would be
okay with this working if implemented at the dma-fence layer.
Regards,
Tvrtko