Re: [PATCH] drm/amdgpu: Enable tunneling on high-priority compute queues

On Fri, Dec 8, 2023 at 1:37 PM Alex Deucher <alexdeucher@xxxxxxxxx> wrote:
On Fri, Dec 8, 2023 at 12:27 PM Joshua Ashton <joshua@xxxxxxxxx> wrote:
>
> FWIW, we are shipping this right now in the SteamOS Preview channel
> (probably going to Stable soon) and it seems to be working as
> expected, fixing issues in the cases where we need to composite and
> the compositor work we are forced to do would otherwise take longer
> than the compositor's redzone before vblank.
>
> Previously in high gfx workloads like Cyberpunk using 100% of the GPU,
> we would consistently miss the deadline as composition could take
> anywhere from 2-6ms fairly randomly.
>
> Now the time for the compositor's work to complete looks pretty
> consistent in gpuvis and lands well within the deadline every frame.
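
For context on the mechanism: a compositor gets onto one of these
high-priority compute rings by asking the UMD for a high-priority
queue, e.g. via VK_KHR_global_priority.  Below is a minimal sketch,
not the actual compositor code; "compute_family" is a placeholder for
an async-compute queue family, and all extension and error checking
is omitted.

#include <vulkan/vulkan.h>

/* Build a queue-create info that requests a high global priority.
 * On amdgpu this is roughly how the compositor's submissions end up
 * on a high-priority compute ring. */
static VkDeviceQueueCreateInfo high_prio_compute_queue(uint32_t compute_family)
{
    static const float queue_prio = 1.0f;
    static const VkDeviceQueueGlobalPriorityCreateInfoKHR global_prio = {
        .sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_GLOBAL_PRIORITY_CREATE_INFO_KHR,
        .globalPriority = VK_QUEUE_GLOBAL_PRIORITY_HIGH_KHR,
    };

    return (VkDeviceQueueCreateInfo){
        .sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO,
        .pNext = &global_prio,
        .queueFamilyIndex = compute_family,
        .queueCount = 1,
        .pQueuePriorities = &queue_prio,
    };
}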

I was mostly just trying to look up the information to verify that it
was set up correctly, but I guess Marek already did and provided you
with that info, so it's probably fine as is.

>
> The only times we are not meeting the deadline now are when an
> application uses very little GPU and finishes incredibly quickly
> while the compositor is doing significantly more work (e.g. FSR from
> 800p -> 4K or whatever), but that's a separate problem that can
> likely be solved by inlining some of the composition work with the
> client's dmabuf work when it has focus, to avoid those clock bubbles.
>
> I heard some musings about dmabuf deadline kernel work recently, but not
> sure if any of that is applicable to AMD.

I think something like a workload hint would be more useful.  We did a
few patch sets to allow userspace to provide a hint to the kernel
about the workload type so the kernel could adjust the power
management heuristics accordingly, but there were concerns that the
UMDs would have to maintain application lists to select which
heuristic worked best for each application.  Maybe it would be better
to provide a general classification?  E.g., if the GL or Vulkan app
uses certain extensions, it's probably a compute-type application vs
something more graphics-y.  It's the usual trade-off between power and
performance.  In general, just letting the firmware pick the clocks
based on perf counters seems to work best.  Maybe a general workload
hint set by the compositor based on the content type it's displaying
(video vs gaming vs desktop) would be a better option?
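
For reference, amdgpu already exposes a coarse workload hint through
the pp_power_profile_mode sysfs file, which is one way the "compositor
sets a hint based on content type" idea could be prototyped.  A hedged
sketch follows: the profile indices are per-ASIC (reading the file
lists them), the card path is an assumption, and writing typically
requires power_dpm_force_performance_level to be set to "manual"
first.

#include <stdio.h>

/* Hypothetical helper, not an existing interface in any compositor:
 * write a profile index to amdgpu's pp_power_profile_mode file. */
static int set_power_profile(const char *card, int profile_index)
{
    char path[128];
    FILE *f;

    snprintf(path, sizeof(path),
             "/sys/class/drm/%s/device/pp_power_profile_mode", card);

    f = fopen(path, "w");
    if (!f)
        return -1;

    /* e.g. the index listed for VIDEO vs 3D_FULL_SCREEN on this ASIC */
    fprintf(f, "%d\n", profile_index);
    return fclose(f);
}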

The deadline stuff doesn't really align well with what we can do with
our firmware and seems ripe for abuse.  Apps can just ask for high
clocks all the time, which is great for performance, but not great for
power.  Plus there is not much room for anything other than max clocks
since you don't know how big the workload is or which clocks are the
limiting factor.
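
For reference, the dma-buf deadline work mentioned above boils down to
an optional .set_deadline hook on dma_fence_ops: a consumer (e.g. an
atomic commit that has to hit vblank) passes a ktime deadline down,
and the driver decides what, if anything, to do with the hint.  A
rough, hypothetical driver-side sketch, not an amdgpu patch:

#include <linux/dma-fence.h>
#include <linux/ktime.h>

static const char *example_get_driver_name(struct dma_fence *fence)
{
    return "example";
}

static const char *example_get_timeline_name(struct dma_fence *fence)
{
    return "example-timeline";
}

static void example_set_deadline(struct dma_fence *fence, ktime_t deadline)
{
    /* Hypothetical driver response: bump a clock vote if the deadline
     * is close.  This is the part that doesn't map cleanly onto
     * firmware-managed clocks on AMD hardware, per the concern above. */
}

static const struct dma_fence_ops example_fence_ops = {
    .get_driver_name   = example_get_driver_name,
    .get_timeline_name = example_get_timeline_name,
    .set_deadline      = example_set_deadline,
};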

Max clocks also decrease performance due to thermal and power limits.
You'll get more performance and less heat if you let the GPU turn off
idle blocks and boost clocks for busy blocks.

Marek

