Re: [PATCH] drm/amdgpu: Enable tunneling on high-priority compute queues

On Fri, Dec 8, 2023 at 1:37 PM Alex Deucher <alexdeucher@xxxxxxxxx> wrote:
On Fri, Dec 8, 2023 at 12:27 PM Joshua Ashton <joshua@xxxxxxxxx> wrote:
>
> FWIW, we are shipping this right now in the SteamOS Preview channel
> (probably going to Stable soon) and it seems to be working as
> expected, fixing issues in the cases where we need to composite and
> the compositor work we are forced to do would otherwise take longer
> than the compositor's redzone before vblank.
>
> Previously in high gfx workloads like Cyberpunk using 100% of the GPU,
> we would consistently miss the deadline as composition could take
> anywhere from 2-6ms fairly randomly.
>
> Now the time for the compositor's work to complete looks pretty
> consistent in gpuvis and lands well within the deadline every frame.
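
For context on the mechanism: a compositor gets onto one of these
high-priority compute rings by asking the UMD for a high-priority
queue, e.g. via VK_KHR_global_priority.  Below is a minimal sketch,
not the actual compositor code; "compute_family" is a placeholder for
an async-compute queue family, and all extension and error checking
is omitted.

#include <vulkan/vulkan.h>

/* Build a queue-create info that requests a high global priority.
 * On amdgpu this is roughly how the compositor's submissions end up
 * on a high-priority compute ring. */
static VkDeviceQueueCreateInfo high_prio_compute_queue(uint32_t compute_family)
{
    static const float queue_prio = 1.0f;
    static const VkDeviceQueueGlobalPriorityCreateInfoKHR global_prio = {
        .sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_GLOBAL_PRIORITY_CREATE_INFO_KHR,
        .globalPriority = VK_QUEUE_GLOBAL_PRIORITY_HIGH_KHR,
    };

    return (VkDeviceQueueCreateInfo){
        .sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO,
        .pNext = &global_prio,
        .queueFamilyIndex = compute_family,
        .queueCount = 1,
        .pQueuePriorities = &queue_prio,
    };
}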

I was mostly just trying to look up the information to verify that it
was set up correctly, but I guess Marek already did and provided you
with that info, so it's probably fine as is.

>
> The only times we are not meeting the deadline now are when an
> application uses very little GPU and finishes incredibly quickly
> while the compositor is doing significantly more work (e.g. FSR from
> 800p -> 4K or whatever), but that's a separate problem that can
> likely be solved by inlining some of the composition work with the
> client's dmabuf work when it has focus, to avoid those clock bubbles.
>
> I heard some musings about dmabuf deadline kernel work recently, but not
> sure if any of that is applicable to AMD.

I think something like a workload hint would be more useful.  We did a
few patch sets to allow userspace to provide a hint to the kernel
about the workload type so the kernel could adjust the power
management heuristics accordingly, but there were concerns that the
UMDs would have to maintain application lists to select which
heuristic worked best for each application.  Maybe it would be better
to provide a general classification?  E.g., if the GL or Vulkan app
uses certain extensions, it's probably a compute-type application vs
something more graphics-y.  It's the usual trade-off between power and
performance.  In general, just letting the firmware pick the clocks
based on perf counters seems to work best.  Maybe a general workload
hint set by the compositor based on the content type it's displaying
(video vs gaming vs desktop) would be a better option?
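
For reference, amdgpu already exposes a coarse workload hint through
the pp_power_profile_mode sysfs file, which is one way the "compositor
sets a hint based on content type" idea could be prototyped.  A hedged
sketch follows: the profile indices are per-ASIC (reading the file
lists them), the card path is an assumption, and writing typically
requires power_dpm_force_performance_level to be set to "manual"
first.

#include <stdio.h>

/* Hypothetical helper, not an existing interface in any compositor:
 * write a profile index to amdgpu's pp_power_profile_mode file. */
static int set_power_profile(const char *card, int profile_index)
{
    char path[128];
    FILE *f;

    snprintf(path, sizeof(path),
             "/sys/class/drm/%s/device/pp_power_profile_mode", card);

    f = fopen(path, "w");
    if (!f)
        return -1;

    /* e.g. the index listed for VIDEO vs 3D_FULL_SCREEN on this ASIC */
    fprintf(f, "%d\n", profile_index);
    return fclose(f);
}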

The deadline stuff doesn't really align well with what we can do with
our firmware and seems ripe for abuse.  Apps can just ask for high
clocks all the time, which is great for performance, but not great for
power.  Plus there is not much room for anything other than max clocks
since you don't know how big the workload is or which clocks are the
limiting factor.
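
For reference, the dma-buf deadline work mentioned above boils down to
an optional .set_deadline hook on dma_fence_ops: a consumer (e.g. an
atomic commit that has to hit vblank) passes a ktime deadline down,
and the driver decides what, if anything, to do with the hint.  A
rough, hypothetical driver-side sketch, not an amdgpu patch:

#include <linux/dma-fence.h>
#include <linux/ktime.h>

static const char *example_get_driver_name(struct dma_fence *fence)
{
    return "example";
}

static const char *example_get_timeline_name(struct dma_fence *fence)
{
    return "example-timeline";
}

static void example_set_deadline(struct dma_fence *fence, ktime_t deadline)
{
    /* Hypothetical driver response: bump a clock vote if the deadline
     * is close.  This is the part that doesn't map cleanly onto
     * firmware-managed clocks on AMD hardware, per the concern above. */
}

static const struct dma_fence_ops example_fence_ops = {
    .get_driver_name   = example_get_driver_name,
    .get_timeline_name = example_get_timeline_name,
    .set_deadline      = example_set_deadline,
};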

Max clocks also decrease performance due to thermal and power limits.
You'll get more performance and less heat if you let the GPU turn off
idle blocks and boost clocks for busy blocks.

Marek

