On Fri, Dec 8, 2023 at 12:27 PM Joshua Ashton <joshua@xxxxxxxxx> wrote:
>
> FWIW, we are shipping this right now in the SteamOS Preview channel
> (probably going to Stable soon) and it seems to be working as expected,
> fixing issues there in instances where we need to composite and the
> compositor work we are forced to do would take longer than the
> compositor redzone to vblank.
>
> Previously, in high gfx workloads like Cyberpunk using 100% of the GPU,
> we would consistently miss the deadline as composition could take
> anywhere from 2-6ms fairly randomly.
>
> Now the time for the compositor's work to complete looks pretty
> consistent and well in time in gpuvis for every frame.

I was mostly just trying to look up the information to verify that it
was set up correctly, but I guess Marek already did and provided you
with that info, so it's probably fine as is.

> The only times we are not meeting the deadline now is when there is an
> application that uses very little GPU and finishes incredibly quickly,
> and the compositor is doing significantly more work (e.g. FSR from
> 800p -> 4K or whatever), but that's a separate problem that can likely
> be solved by inlining some of the composition work with the client's
> dmabuf work if it has focus, to avoid those clock bubbles.
>
> I heard some musings about dmabuf deadline kernel work recently, but
> I'm not sure if any of that is applicable to AMD.

I think something like a workload hint would be more useful. We did a
few patch sets to allow userspace to provide a hint to the kernel about
the workload type so that the kernel could adjust the power management
heuristics accordingly, but there were concerns that the UMDs would
have to maintain application lists to select which heuristic worked
best for each application. Maybe it would be better to provide a
general classification? E.g., if the GL or Vulkan app uses these
extensions, it's probably a compute-type application vs. something more
graphics-y. The usual trade-off between power and performance. In
general, just letting the firmware pick the clocks based on perf
counters seems to work best. Maybe a general workload hint set by the
compositor based on the content type it's displaying (video vs. gaming
vs. desktop) would be a better option?

The deadline stuff doesn't really align well with what we can do with
our firmware and seems ripe for abuse. Apps can just ask for high
clocks all the time, which is great for performance but not great for
power. Plus, there is not much room for anything other than max clocks,
since you don't know how big the workload is or which clocks are the
limiting factor.

Alex

>
> - Joshie 🐸✨
>
> On 12/8/23 15:33, Marek Olšák wrote:
> > On Fri, Dec 8, 2023 at 9:57 AM Christian König <christian.koenig@xxxxxxx
> > <mailto:christian.koenig@xxxxxxx>> wrote:
> >
> >     Am 08.12.23 um 12:43 schrieb Friedrich Vock:
> >     > On 08.12.23 10:51, Christian König wrote:
> >     >> Well, long story short, Alex and I have been digging up the
> >     >> documentation for this and as far as we can tell this isn't correct.
> >     > Huh. I initially talked to Marek about this, adding him in Cc.
> >
> >     Yeah, from the userspace side all you need to do is to set the bit as
> >     far as I can tell.
> >
> >     >>
> >     >> You need to do quite a bit more before you can turn on this feature.
> >     >> What userspace side do you refer to?
> >     > I was referring to the Mesa merge request I made
> >     > (https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26462
> >     <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26462>).
> >     > If/When you have more details about what else needs to be done, feel
> >     > free to let me know.
> >
> >     For example, the hardware specification explicitly states that the
> >     kernel driver should make sure that only one app/queue is using this at
> >     the same time. That might work for now since we should only have a
> >     single compute priority queue, but we are not 100% sure yet.
> >
> >
> > This is incorrect. While the hw documentation says it's considered
> > "unexpected programming", it also says that the hardware algorithm
> > handles it correctly, and it describes what happens in this case:
> > tunneled waves from different queues are treated as equal.
> >
> > Marek
>
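
For anyone following along, here is a rough sketch of what the userspace
side of this can look like from a compositor's point of view: asking for a
realtime global-priority queue through VK_KHR_global_priority, which the
driver stack can then map onto its high-priority hardware queue. This is
only an illustration under those assumptions, not the code from the Mesa
MR above; the compute queue family index is a placeholder and error
handling is mostly omitted.

  /* Sketch: create a VkDevice with one realtime-priority queue.
   * Assumes `compute_family` was found via
   * vkGetPhysicalDeviceQueueFamilyProperties(). */
  #include <vulkan/vulkan.h>

  VkDevice create_device_with_realtime_queue(VkPhysicalDevice phys_dev,
                                             uint32_t compute_family)
  {
      float prio = 1.0f;

      /* Ask for the highest global (cross-process) priority. */
      VkDeviceQueueGlobalPriorityCreateInfoKHR global_prio = {
          .sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_GLOBAL_PRIORITY_CREATE_INFO_KHR,
          .globalPriority = VK_QUEUE_GLOBAL_PRIORITY_REALTIME_KHR,
      };

      VkDeviceQueueCreateInfo queue_info = {
          .sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO,
          .pNext = &global_prio,
          .queueFamilyIndex = compute_family,
          .queueCount = 1,
          .pQueuePriorities = &prio,
      };

      const char *exts[] = { VK_KHR_GLOBAL_PRIORITY_EXTENSION_NAME };

      VkDeviceCreateInfo device_info = {
          .sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO,
          .queueCreateInfoCount = 1,
          .pQueueCreateInfos = &queue_info,
          .enabledExtensionCount = 1,
          .ppEnabledExtensionNames = exts,
      };

      VkDevice device = VK_NULL_HANDLE;
      /* Realtime priority may be refused for unprivileged callers; the
       * spec allows VK_ERROR_NOT_PERMITTED_KHR here, so a compositor would
       * want to fall back to a lower priority in that case. */
      vkCreateDevice(phys_dev, &device_info, NULL, &device);
      return device;
  }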