FWIW, we are shipping this right now in the SteamOS Preview channel
(probably going to Stable soon) and it seems to be working as expected,
fixing issues in instances where we need to composite and the compositor
work we are forced to do would take longer than the compositor redzone
to vblank.
Previously, in high gfx workloads like Cyberpunk using 100% of the GPU,
we would consistently miss the deadline as composition could take
anywhere from 2-6ms fairly randomly.
Now the time for the compositor's work to complete looks pretty
consistent and well within the deadline in gpuvis for every frame.
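To make the deadline arithmetic above concrete, here is a minimal sketch. It is not gamescope code: the function name, the 2.0 ms redzone, and the sample timings are all made-up assumptions, chosen only to span the 2-6ms range mentioned above. It assumes composition starts exactly one redzone before vblank, so a frame misses whenever the composite takes longer than the redzone.

```python
# Hypothetical illustration only; the names and numbers are assumptions.

def meets_deadline(composite_ms: float, redzone_ms: float) -> bool:
    """True if composition that starts redzone_ms before vblank finishes in time."""
    return composite_ms <= redzone_ms


REDZONE_MS = 2.0  # assumed redzone, not a real gamescope default

# Assumed composite times within the 2-6ms range seen under full GPU load.
samples_ms = [2.0, 3.5, 6.0]

# Frames whose composition would overrun the redzone and miss vblank.
missed = [t for t in samples_ms if not meets_deadline(t, REDZONE_MS)]
print(missed)
```

With these assumed numbers, any composite slower than the redzone misses, which is why the 2-6ms variance hurt so much.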
The only times we are not meeting the deadline now are when an
application is using very little GPU and finishes incredibly quickly, and the
compositor is doing significantly more work (e.g. FSR from 800p -> 4K or
whatever), but that's a separate problem that can likely be solved by
inlining some of the composition work with the client's dmabuf work if
it has focus to avoid those clock bubbles.
I heard some musings about dmabuf deadline kernel work recently, but not
sure if any of that is applicable to AMD.
- Joshie 🐸✨
On 12/8/23 15:33, Marek Olšák wrote:
On Fri, Dec 8, 2023 at 9:57 AM Christian König <christian.koenig@xxxxxxx> wrote:
Am 08.12.23 um 12:43 schrieb Friedrich Vock:
> On 08.12.23 10:51, Christian König wrote:
>> Well, long story short, Alex and I have been digging up the
>> documentation for this and as far as we can tell this isn't correct.
> Huh. I initially talked to Marek about this, adding him in Cc.
Yeah, from the userspace side all you need to do is to set the bit as
far as I can tell.
>>
>> You need to do quite a bit more before you can turn on this feature.
>> What userspace side do you refer to?
> I was referring to the Mesa merge request I made
> (https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26462).
> If/When you have more details about what else needs to be done, feel
> free to let me know.
For example, the hardware specification explicitly states that the
kernel driver should make sure that only one app/queue is using this at
the same time. That might work for now since we should only have a
single compute priority queue, but we are not 100% sure yet.
This is incorrect. While the hw documentation says it's considered
"unexpected programming", it also says that the hardware algorithm
handles it correctly and it describes what happens in this case:
Tunneled waves from different queues are treated as equal.
Marek