> If compute queue is occupied only by you, the efficiency is equal to setting the job queue to high priority, I think.

The only risk is the situation where graphics takes all the needed CUs. But in any case it should be a very good test.

Andres/Pierre-Loup, did you try to do it, or is it a lot of work for you?

BTW: If there is a non-VR application which uses the high-priority h/w queue, then the VR application will suffer. Any ideas how to solve it?

Sincerely yours,
Serguei Sagalovitch

On 2016-12-19 12:50 AM, zhoucm1 wrote:
> Do you encounter the priority issue for the compute queue with the current driver?
>
> If compute queue is occupied only by you, the efficiency is equal to setting the job queue to high priority, I think.
>
> Regards,
> David Zhou
>
> On 2016-12-19 13:29, Andres Rodriguez wrote:
>> Yes, Vulkan is available on all-open through the Mesa radv UMD.
>>
>> I'm not sure if I'm asking for too much, but if we can coordinate a similar interface in radv and amdgpu-pro at the Vulkan level, that would be great.
>>
>> I'm not sure what that's going to be yet.
>>
>> - Andres
>>
>> On 12/19/2016 12:11 AM, zhoucm1 wrote:
>>>
>>> On 2016-12-19 11:33, Pierre-Loup A. Griffais wrote:
>>>> We're currently working with the open stack; I assume that a mechanism could be exposed by both open and Pro Vulkan userspace drivers, and that the amdgpu kernel interface improvements we would pursue following this discussion would let both drivers take advantage of the feature, correct?
>>> Of course.
>>> Does the open stack have Vulkan support?
>>>
>>> Regards,
>>> David Zhou
>>>>
>>>> On 12/18/2016 07:26 PM, zhoucm1 wrote:
>>>>> By the way, are you using the all-open driver or the amdgpu-pro driver?
>>>>>
>>>>> +David Mao, who is working on our Vulkan driver.
>>>>>
>>>>> Regards,
>>>>> David Zhou
>>>>>
>>>>> On 2016-12-18 06:05, Pierre-Loup A. Griffais wrote:
>>>>>> Hi Serguei,
>>>>>>
>>>>>> I'm also working on bringing up our VR runtime on top of amdgpu; see replies inline.
>>>>>>
>>>>>> On 12/16/2016 09:05 PM, Sagalovitch, Serguei wrote:
>>>>>>> Andres,
>>>>>>>
>>>>>>>> For current VR workloads we have 3 separate processes running actually:
>>>>>>> So we could have a potential memory overcommit case, or do you do the partitioning on your own? I would think that there is a need to avoid overcommit in the VR case to prevent any BO migration.
>>>>>>
>>>>>> You're entirely correct; currently the VR runtime is setting up prioritized CPU scheduling for its VR compositor, we're working on prioritized GPU scheduling and pre-emption (e.g. this thread), and in the future it will make sense to do work to make sure that its memory allocations do not get evicted, to prevent any unwelcome additional latency in the event of needing to perform just-in-time reprojection.
>>>>>>
>>>>>>> BTW: Do you mean __real__ processes or threads? Based on my understanding, sharing BOs between different processes could introduce additional synchronization constraints. BTW: I am not sure if we are able to share Vulkan sync objects across a process boundary.
>>>>>>
>>>>>> They are different processes; it is important for the compositor, which is responsible for quality-of-service features such as consistently presenting distorted frames with the right latency, reprojection, etc., to be separate from the main application.
>>>>>>
>>>>>> Currently we are using unreleased cross-process memory and semaphore extensions to fetch updated eye images from the client application, but the just-in-time reprojection discussed here does not actually have any direct interactions with cross-process resource sharing, since it's achieved by using whatever are the latest, most up-to-date eye images that have already been sent by the client application, which are already available to use without additional synchronization.
>>>>>>
>>>>>>>> 3) System compositor (we are looking at approaches to remove this overhead)
>>>>>>> Yes, IMHO the best is to run in "full screen mode".
>>>>>>
>>>>>> Yes, we are working on mechanisms to present directly to the headset display without any intermediaries, as a separate effort.
>>>>>>
>>>>>>>> The latency is our main concern,
>>>>>>> I would assume that this is a known problem (at least for compute usage). It looks like amdgpu / kernel submission is rather CPU intensive (at least in the default configuration).
>>>>>>
>>>>>> As long as it's a consistent cost, it shouldn't be an issue. However, if there is a high degree of variance, then that would be troublesome and we would need to account for the worst case.
>>>>>>
>>>>>> Hopefully the requirements and approach we described make sense; we're looking forward to your feedback and suggestions.
>>>>>>
>>>>>> Thanks!
>>>>>> - Pierre-Loup
>>>>>>
>>>>>>> Sincerely yours,
>>>>>>> Serguei Sagalovitch
>>>>>>>
>>>>>>> From: Andres Rodriguez <andresr at valvesoftware.com>
>>>>>>> Sent: December 16, 2016 10:00 PM
>>>>>>> To: Sagalovitch, Serguei; amd-gfx at lists.freedesktop.org
>>>>>>> Subject: RE: [RFC] Mechanism for high priority scheduling in amdgpu
>>>>>>>
>>>>>>> Hey Serguei,
>>>>>>>
>>>>>>>> [Serguei] No. I mean pipe :-) as the MEC defines it. As far as I understand (by simplifying), some scheduling is per pipe. I know about the current allocation scheme, but I do not think that it is ideal. I would assume that we need to switch to dynamic partitioning of resources based on the workload; otherwise we will have resource conflicts between Vulkan compute and OpenCL.
>>>>>>>
>>>>>>> I agree the partitioning isn't ideal. I'm hoping we can start with a solution that assumes that only pipe0 has any work and the other pipes are idle (no HSA/ROCm running on the system).
>>>>>>>
>>>>>>> This should be more or less the use case we expect from VR users.
>>>>>>>
>>>>>>> I agree the split is currently not ideal, but I'd like to consider that a separate task, because making it dynamic is not straightforward :P
>>>>>>>
>>>>>>>> [Serguei] Vulkan works via amdgpu (kernel submissions), so amdkfd will not be involved. I would assume that in the case of VR we will have one main application ("console" mode(?)), so we could temporarily "ignore" OpenCL/ROCm needs when VR is running.
>>>>>>>
>>>>>>> Correct, this is why we want to enable the high priority compute queue through libdrm-amdgpu, so that we can expose it through Vulkan later.
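>>>>>>>
>>>>>>> To sketch what that could look like from the libdrm-amdgpu side (purely illustrative; amdgpu_cs_ctx_create2() and AMDGPU_CTX_PRIORITY_HIGH are hypothetical names for a context-create variant that forwards a priority to the kernel):
>>>>>>>
>>>>>>> #include <amdgpu.h>
>>>>>>>
>>>>>>> /* Hypothetical helper: create a context whose submissions land
>>>>>>>  * on the high priority compute ring. */
>>>>>>> int create_high_priority_ctx(amdgpu_device_handle dev,
>>>>>>>                              amdgpu_context_handle *ctx)
>>>>>>> {
>>>>>>>         /* Assumed variant of amdgpu_cs_ctx_create() that passes
>>>>>>>          * a priority flag through to the DRM_AMDGPU_CTX ioctl. */
>>>>>>>         return amdgpu_cs_ctx_create2(dev, AMDGPU_CTX_PRIORITY_HIGH,
>>>>>>>                                      ctx);
>>>>>>> }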
>>>>>>>
>>>>>>> For current VR workloads we actually have 3 separate processes running:
>>>>>>> 1) Game process
>>>>>>> 2) VR Compositor (this is the process that will require the high priority queue)
>>>>>>> 3) System compositor (we are looking at approaches to remove this overhead)
>>>>>>>
>>>>>>> For now I think it is okay to assume no OpenCL/ROCm running simultaneously, but I would also like to be able to address this case in the future (cross-pipe priorities).
>>>>>>>
>>>>>>>> [Serguei] The problem with pre-emption of a graphics task: (a) it may take time, so latency may suffer
>>>>>>>
>>>>>>> The latency is our main concern; we want something that is predictable. A good illustration of what the reprojection scheduling looks like can be found here:
>>>>>>> https://community.amd.com/servlet/JiveServlet/showImage/38-1310-104754/pastedImage_3.png
>>>>>>>
>>>>>>>> (b) to preempt we need to have a different "context" - we want to guarantee that submissions from the same context will be executed in order.
>>>>>>>
>>>>>>> This is okay, as the reprojection work doesn't have dependencies on the game context, and it even happens in a separate process.
>>>>>>>
>>>>>>>> BTW: (a) Do you want to "preempt" and later resume, or do you want to "preempt" and "cancel/abort"?
>>>>>>>
>>>>>>> Preempt the game with the compositor task and then resume it.
>>>>>>>
>>>>>>>> (b) Vulkan is a generic API and could be used for graphics as well as for plain compute tasks (VK_QUEUE_COMPUTE_BIT).
>>>>>>>
>>>>>>> Yeah, the plan is to use Vulkan compute. But if you figure out a way for us to get a guaranteed execution time using Vulkan graphics, then I'll take you out for a beer :)
>>>>>>>
>>>>>>> Regards,
>>>>>>> Andres
>>>>>>> ________________________________________
>>>>>>> From: Sagalovitch, Serguei [Serguei.Sagalovitch at amd.com]
>>>>>>> Sent: Friday, December 16, 2016 9:13 PM
>>>>>>> To: Andres Rodriguez; amd-gfx at lists.freedesktop.org
>>>>>>> Subject: Re: [RFC] Mechanism for high priority scheduling in amdgpu
>>>>>>>
>>>>>>> Hi Andres,
>>>>>>>
>>>>>>> Please see inline (as [Serguei])
>>>>>>>
>>>>>>> Sincerely yours,
>>>>>>> Serguei Sagalovitch
>>>>>>>
>>>>>>> From: Andres Rodriguez <andresr at valvesoftware.com>
>>>>>>> Sent: December 16, 2016 8:29 PM
>>>>>>> To: Sagalovitch, Serguei; amd-gfx at lists.freedesktop.org
>>>>>>> Subject: RE: [RFC] Mechanism for high priority scheduling in amdgpu
>>>>>>>
>>>>>>> Hi Serguei,
>>>>>>>
>>>>>>> Thanks for the feedback. Answers inline as [AR].
>>>>>>>
>>>>>>> Regards,
>>>>>>> Andres
>>>>>>>
>>>>>>> ________________________________________
>>>>>>> From: Sagalovitch, Serguei [Serguei.Sagalovitch at amd.com]
>>>>>>> Sent: Friday, December 16, 2016 8:15 PM
>>>>>>> To: Andres Rodriguez; amd-gfx at lists.freedesktop.org
>>>>>>> Subject: Re: [RFC] Mechanism for high priority scheduling in amdgpu
>>>>>>>
>>>>>>> Andres,
>>>>>>>
>>>>>>> Quick comments:
>>>>>>>
>>>>>>> 1) To minimize "bubbles", etc. we need to "force" CU assignments/binding to the high-priority queue when it will be in use and "free" them later (we do not want to take CUs from e.g. the graphics task forever and degrade graphics performance).
>>>>>>>
>>>>>>> Otherwise we could have a scenario where a long graphics task (or low-priority compute) takes all the (extra) CUs and high-priority work will wait for the needed resources.
>>>>>>>
>>>>>>> It will not be visible with "NOP" packets but only when you submit a "real" compute task, so I would recommend not using "NOP" packets at all for testing.
>>>>>>>
>>>>>>> It (CU assignment) could be done relatively easily when everything is going via the kernel (e.g. as part of frame submission), but I must admit that I am not sure about the best way for user level submissions (amdkfd).
>>>>>>>
>>>>>>> [AR] I wasn't aware of this part of the programming sequence. Thanks for the heads up! Is this similar to the CU masking programming?
>>>>>>>
>>>>>>> [Serguei] Yes. To simplify: the problem is that the "scheduler", when deciding which queue to run, will check if there are enough resources, and if not then it will begin to check other queues with lower priority.
>>>>>>>
>>>>>>> 2) I would recommend dedicating the whole pipe to the high-priority queue and having nothing there except it.
>>>>>>>
>>>>>>> [AR] I'm guessing in this context you mean pipe = queue? (As opposed to the MEC definition of pipe, which is a grouping of queues.) I say this because amdgpu only has access to 1 pipe, and the rest are statically partitioned for amdkfd usage.
>>>>>>>
>>>>>>> [Serguei] No. I mean pipe :-) as the MEC defines it. As far as I understand (by simplifying), some scheduling is per pipe. I know about the current allocation scheme, but I do not think that it is ideal. I would assume that we need to switch to dynamic partitioning of resources based on the workload; otherwise we will have resource conflicts between Vulkan compute and OpenCL.
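>>>>>>>
>>>>>>> (For reference, the topology under discussion is roughly the following; a simplified sketch with illustrative names and constants, not the driver's actual definitions:)
>>>>>>>
>>>>>>> /* Each MEC micro-engine exposes 4 pipes, and each pipe holds 8
>>>>>>>  * hardware queues (HQDs).  amdgpu currently drives pipe[0];
>>>>>>>  * the remaining pipes are statically handed to amdkfd. */
>>>>>>> #define NUM_PIPES_PER_MEC   4
>>>>>>> #define NUM_QUEUES_PER_PIPE 8
>>>>>>>
>>>>>>> struct hqd_slot {
>>>>>>>         unsigned int queue_priority;    /* per-queue HQD state */
>>>>>>> };
>>>>>>>
>>>>>>> struct mec_pipe {
>>>>>>>         /* some scheduling decisions are taken at this level */
>>>>>>>         struct hqd_slot queue[NUM_QUEUES_PER_PIPE];
>>>>>>> };
>>>>>>>
>>>>>>> struct mec {
>>>>>>>         struct mec_pipe pipe[NUM_PIPES_PER_MEC];
>>>>>>> };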
>>>>>>>
>>>>>>> BTW: Which user level API do you want to use for compute: Vulkan or OpenCL?
>>>>>>>
>>>>>>> [AR] Vulkan
>>>>>>>
>>>>>>> [Serguei] Vulkan works via amdgpu (kernel submissions), so amdkfd will not be involved. I would assume that in the case of VR we will have one main application ("console" mode(?)), so we could temporarily "ignore" OpenCL/ROCm needs when VR is running.
>>>>>>>
>>>>>>>> we will not be able to provide a solution compatible with GFX workloads.
>>>>>>>
>>>>>>> I assume that you are talking about graphics? Am I right?
>>>>>>>
>>>>>>> [AR] Yeah, my understanding is that pre-empting the currently running graphics job and scheduling in something else using mid-buffer pre-emption has some cases where it doesn't work well. But if with Polaris10 it starts working well, it might be a better solution for us (because the whole reprojection work uses the Vulkan graphics stack at the moment, and porting it to compute is not trivial).
>>>>>>>
>>>>>>> [Serguei] The problem with pre-emption of a graphics task: (a) it may take time, so latency may suffer; (b) to preempt we need to have a different "context" - we want to guarantee that submissions from the same context will be executed in order. BTW: (a) Do you want to "preempt" and later resume, or do you want to "preempt" and "cancel/abort"? (b) Vulkan is a generic API and could be used for graphics as well as for plain compute tasks (VK_QUEUE_COMPUTE_BIT).
>>>>>>>
>>>>>>> Sincerely yours,
>>>>>>> Serguei Sagalovitch
>>>>>>>
>>>>>>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> on behalf of Andres Rodriguez <andresr at valvesoftware.com>
>>>>>>> Sent: December 16, 2016 6:15 PM
>>>>>>> To: amd-gfx at lists.freedesktop.org
>>>>>>> Subject: [RFC] Mechanism for high priority scheduling in amdgpu
>>>>>>>
>>>>>>> Hi Everyone,
>>>>>>>
>>>>>>> This RFC is also available as a gist here:
>>>>>>> https://gist.github.com/lostgoat/7000432cd6864265dbc2c3ab93204249
>>>>>>>
>>>>>>> We are interested in feedback for a mechanism to effectively schedule high priority VR reprojection tasks (also referred to as time-warping) for Polaris10 running on the amdgpu kernel driver.
>>>>>>>
>>>>>>> Brief context:
>>>>>>> --------------
>>>>>>>
>>>>>>> The main objective of reprojection is to avoid motion sickness for VR users in scenarios where the game or application would fail to finish rendering a new frame in time for the next VBLANK. When this happens, the user's head movements are not reflected on the Head Mounted Display (HMD) for the duration of an extra frame. This extended mismatch between the inner ear and the eyes may cause the user to experience motion sickness.
>>>>>>>
>>>>>>> The VR compositor deals with this problem by fabricating a new frame using the user's updated head position in combination with the previous frames. This avoids a prolonged mismatch between the HMD output and the inner ear.
>>>>>>>
>>>>>>> Because of the adverse effects on the user, we require high confidence that the reprojection task will complete before the VBLANK interval, even if the GFX pipe is currently full of work from the game/application (which is most likely the case).
>>>>>>>
>>>>>>> For more details and illustrations, please refer to the following document:
>>>>>>> https://community.amd.com/community/gaming/blog/2016/03/28/asynchronous-shaders-evolved
>>>>>>>
>>>>>>> Requirements:
>>>>>>> -------------
>>>>>>>
>>>>>>> The mechanism must expose the following functionality:
>>>>>>>
>>>>>>> * Job round trip time must be predictable, from submission to fence signal
>>>>>>>
>>>>>>> * The mechanism must support compute workloads.
>>>>>>>
>>>>>>> Goals:
>>>>>>> ------
>>>>>>>
>>>>>>> * The mechanism should provide low submission latencies
>>>>>>>
>>>>>>> Test: submitting a NOP packet through the mechanism on busy hardware should be equivalent to submitting a NOP on idle hardware.
>>>>>>>
>>>>>>> Nice to have:
>>>>>>> -------------
>>>>>>>
>>>>>>> * The mechanism should also support GFX workloads.
>>>>>>>
>>>>>>> My understanding is that with the current hardware capabilities in Polaris10 we will not be able to provide a solution compatible with GFX workloads.
>>>>>>>
>>>>>>> But I would love to hear otherwise. So if anyone has an idea, approach or suggestion that will also be compatible with the GFX ring, please let us know about it.
>>>>>>>
>>>>>>> * The above guarantees should also be respected by amdkfd workloads
>>>>>>>
>>>>>>> Would be good to have for consistency, but not strictly necessary as users running games are not traditionally running HPC workloads in the background.
>>>>>>>
>>>>>>> Proposed approach:
>>>>>>> ------------------
>>>>>>>
>>>>>>> Similar to the Windows driver, we could expose a high priority compute queue to userspace.
>>>>>>>
>>>>>>> Submissions to this compute queue will be scheduled with high priority, and may acquire hardware resources previously in use by other queues.
>>>>>>>
>>>>>>> This can be achieved by taking advantage of the 'priority' field in the HQDs and could be programmed by amdgpu or the amdgpu scheduler. The relevant register fields are:
>>>>>>> * mmCP_HQD_PIPE_PRIORITY
>>>>>>> * mmCP_HQD_QUEUE_PRIORITY
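>>>>>>>
>>>>>>> As a rough sketch of the kernel-side programming (modeled on the existing HQD init sequence in gfx_v8_0.c; the helper name and the exact priority encodings are illustrative and would need to be confirmed against the register spec):
>>>>>>>
>>>>>>> static void set_hqd_priority(struct amdgpu_device *adev,
>>>>>>>                              u32 me, u32 pipe, u32 queue,
>>>>>>>                              u32 pipe_prio, u32 queue_prio)
>>>>>>> {
>>>>>>>         /* Select the HQD we want to touch, program its two
>>>>>>>          * priority fields, then release the SRBM selection. */
>>>>>>>         mutex_lock(&adev->srbm_mutex);
>>>>>>>         vi_srbm_select(adev, me, pipe, queue, 0);
>>>>>>>
>>>>>>>         WREG32(mmCP_HQD_PIPE_PRIORITY, pipe_prio);
>>>>>>>         WREG32(mmCP_HQD_QUEUE_PRIORITY, queue_prio);
>>>>>>>
>>>>>>>         vi_srbm_select(adev, 0, 0, 0, 0);
>>>>>>>         mutex_unlock(&adev->srbm_mutex);
>>>>>>> }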
>>>>>>>
>>>>>>> Implementation approach 1 - static partitioning:
>>>>>>> ------------------------------------------------
>>>>>>>
>>>>>>> The amdgpu driver currently controls 8 compute queues from pipe0. We can statically partition these as follows:
>>>>>>> * 7x regular
>>>>>>> * 1x high priority
>>>>>>>
>>>>>>> The relevant priorities can be set so that submissions to the high priority ring will starve the other compute rings and the GFX ring.
>>>>>>>
>>>>>>> The amdgpu scheduler will only place jobs into the high priority rings if the context is marked as high priority. And a corresponding priority should be added to keep track of this information:
>>>>>>> * AMD_SCHED_PRIORITY_KERNEL
>>>>>>> * -> AMD_SCHED_PRIORITY_HIGH
>>>>>>> * AMD_SCHED_PRIORITY_NORMAL
>>>>>>>
>>>>>>> The user will request a high priority context by setting an appropriate flag in drm_amdgpu_ctx_in (AMDGPU_CTX_HIGH_PRIORITY or similar):
>>>>>>> https://github.com/torvalds/linux/blob/master/include/uapi/drm/amdgpu_drm.h#L163
>>>>>>>
>>>>>>> The setting is at a per-context level so that we can:
>>>>>>> * Maintain a consistent FIFO ordering of all submissions to a context
>>>>>>> * Create high priority and non-high priority contexts in the same process
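>>>>>>>
>>>>>>> In code, the additions would look roughly like this (a sketch; the enum ordering and the exact flag bit are illustrative):
>>>>>>>
>>>>>>> /* gpu_scheduler.h: new level between KERNEL and NORMAL */
>>>>>>> enum amd_sched_priority {
>>>>>>>         AMD_SCHED_PRIORITY_KERNEL,
>>>>>>>         AMD_SCHED_PRIORITY_HIGH,   /* new: high priority contexts */
>>>>>>>         AMD_SCHED_PRIORITY_NORMAL,
>>>>>>> };
>>>>>>>
>>>>>>> /* include/uapi/drm/amdgpu_drm.h: flag for drm_amdgpu_ctx_in.flags */
>>>>>>> #define AMDGPU_CTX_HIGH_PRIORITY  (1 << 0)   /* proposed, bit TBD */
>>>>>>>
>>>>>>> Userspace would then request it at context-allocation time, e.g.:
>>>>>>>
>>>>>>> #include <stdint.h>
>>>>>>> #include <xf86drm.h>
>>>>>>> #include <amdgpu_drm.h>
>>>>>>>
>>>>>>> int alloc_high_prio_ctx(int fd, uint32_t *ctx_id)
>>>>>>> {
>>>>>>>         union drm_amdgpu_ctx args = {};
>>>>>>>         int r;
>>>>>>>
>>>>>>>         args.in.op    = AMDGPU_CTX_OP_ALLOC_CTX;
>>>>>>>         args.in.flags = AMDGPU_CTX_HIGH_PRIORITY; /* proposed flag */
>>>>>>>
>>>>>>>         r = drmCommandWriteRead(fd, DRM_AMDGPU_CTX,
>>>>>>>                                 &args, sizeof(args));
>>>>>>>         if (r == 0)
>>>>>>>                 *ctx_id = args.out.alloc.ctx_id;
>>>>>>>         return r;
>>>>>>> }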
>>>>>>>
>>>>>>> Implementation approach 2 - dynamic priority programming:
>>>>>>> ---------------------------------------------------------
>>>>>>>
>>>>>>> Similar to the above, but instead of programming the priorities at amdgpu_init() time, the SW scheduler will reprogram the queue priorities dynamically when scheduling a task.
>>>>>>>
>>>>>>> This would involve having a hardware-specific callback from the scheduler to set the appropriate queue priority: set_priority(int ring, int index, int priority)
>>>>>>>
>>>>>>> During this callback we would have to grab the SRBM mutex to perform the appropriate HW programming, and I'm not really sure if that is something we should be doing from the scheduler.
>>>>>>>
>>>>>>> On the positive side, this approach would allow us to program a range of priorities for jobs instead of a single "high priority" value, achieving something similar to the niceness API available for CPU scheduling.
>>>>>>>
>>>>>>> I'm not sure if this flexibility is something that we would need for our use case, but it might be useful in other scenarios (multiple users sharing compute time on a server).
>>>>>>>
>>>>>>> This approach would require a new int field in drm_amdgpu_ctx_in, or repurposing of the flags field.
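>>>>>>>
>>>>>>> Sketched as a scheduler hook (illustrative; the existing backend ops table would simply grow one optional callback, and the job/context field names here are placeholders):
>>>>>>>
>>>>>>> struct amd_sched_backend_ops {
>>>>>>>         /* ... existing callbacks (dependency, run_job, ...) ... */
>>>>>>>         void (*set_priority)(int ring, int index, int priority);
>>>>>>> };
>>>>>>>
>>>>>>> /* In the scheduler, before handing a job to the HW ring: */
>>>>>>> if (sched->ops->set_priority)
>>>>>>>         sched->ops->set_priority(job->ring, job->index,
>>>>>>>                                  ctx->priority);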
>>>>>>>
>>>>>>> Known current obstacles:
>>>>>>> ------------------------
>>>>>>>
>>>>>>> The SQ is currently programmed to disregard the HQD priorities, and instead it picks jobs at random. Settings from the shader itself are also disregarded, as this is considered a privileged field.
>>>>>>>
>>>>>>> Effectively we can get our compute wavefront launched ASAP, but we might not get the time we need on the SQ.
>>>>>>>
>>>>>>> The current programming would have to be changed to allow priority propagation from the HQD into the SQ.
>>>>>>>
>>>>>>> Generic approach for all HW IPs:
>>>>>>> --------------------------------
>>>>>>>
>>>>>>> For consistency purposes, the high priority context can be enabled for all HW IPs with support of the SW scheduler. This will function similarly to the current AMD_SCHED_PRIORITY_KERNEL priority, where the job can jump ahead of anything not committed to the HW queue.
>>>>>>>
>>>>>>> The benefits of requesting a high priority context for a non-compute queue will be lesser (e.g. up to 10s of wait time if a GFX command is stuck in front of you), but having the API in place will allow us to easily improve the implementation in the future as new features become available in new hardware.
>>>>>>>
>>>>>>> Future steps:
>>>>>>> -------------
>>>>>>>
>>>>>>> Once we have an approach settled, I can take care of the implementation.
>>>>>>>
>>>>>>> Also, once the interface is mostly decided, we can start thinking about exposing the high priority queue through radv.
>>>>>>>
>>>>>>> Request for feedback:
>>>>>>> ---------------------
>>>>>>>
>>>>>>> We aren't married to any of the approaches outlined above. Our goal is to obtain a mechanism that will allow us to complete the reprojection job within a predictable amount of time. So if anyone has any suggestions for improvements or alternative strategies, we are more than happy to hear them.
>>>>>>>
>>>>>>> If any of the technical information above is also incorrect, feel free to point out my misunderstandings.
>>>>>>>
>>>>>>> Looking forward to hearing from you.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Andres
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> amd-gfx mailing list
>>>>>>> amd-gfx at lists.freedesktop.org
>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>>
>>>>>> _______________________________________________
>>>>>> amd-gfx mailing list
>>>>>> amd-gfx at lists.freedesktop.org
>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx at lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Sincerely yours,
Serguei Sagalovitch