Hi Christian,

That is definitely a concern. What we are currently thinking is to make
the high priority queues accessible to root only.

Therefore, if a non-root user attempts to set the high priority flag on
context allocation, we would fail the call and return EPERM.

Regards,
Andres
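As a rough sketch, that gate could look like the following at
context-allocation time; the flag and helper names here are hypothetical,
for illustration only:

    #include <linux/capability.h>
    #include <linux/errno.h>
    #include <linux/types.h>

    #define AMDGPU_CTX_ALLOC_HIGH_PRIORITY (1 << 0) /* hypothetical flag */

    /* Hypothetical check at context allocation time: only processes
     * with CAP_SYS_NICE (e.g. a root VR compositor) may request a
     * high priority context; everyone else gets -EPERM. */
    static int amdgpu_ctx_priority_permit(u32 flags)
    {
            if ((flags & AMDGPU_CTX_ALLOC_HIGH_PRIORITY) &&
                !capable(CAP_SYS_NICE))
                    return -EPERM;
            return 0;
    }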
On 12/20/2016 7:56 AM, Christian König wrote:
>> BTW: If there is a non-VR application which will use the high-priority
>> h/w queue then the VR application will suffer. Any ideas how
>> to solve it?
> Yeah, that problem came to my mind as well.
>
> Basically we need to restrict those high priority submissions to the
> VR compositor or otherwise any malfunctioning application could use it.
>
> Just think about some WebGL suddenly taking all our rendering away and
> we won't get anything drawn any more.
>
> Alex or Michel, any ideas on that?
>
> Regards,
> Christian.
>
> On 19.12.2016 at 15:48, Serguei Sagalovitch wrote:
>> > If the compute queue is occupied only by you, the efficiency
>> > is equal to setting the job queue to high priority, I think.
>> The only risk is the situation where graphics takes all the
>> needed CUs. But in any case it should be a very good test.
>>
>> Andres/Pierre-Loup,
>>
>> Did you try to do it, or is it a lot of work for you?
>>
>>
>> BTW: If there is a non-VR application which will use the high-priority
>> h/w queue then the VR application will suffer. Any ideas how
>> to solve it?
>>
>> Sincerely yours,
>> Serguei Sagalovitch
>>
>> On 2016-12-19 12:50 AM, zhoucm1 wrote:
>>> Do you encounter the priority issue for the compute queue with the
>>> current driver?
>>>
>>> If the compute queue is occupied only by you, the efficiency is equal
>>> to setting the job queue to high priority, I think.
>>>
>>> Regards,
>>> David Zhou
>>>
>>> On 2016-12-19 13:29, Andres Rodriguez wrote:
>>>> Yes, Vulkan is available on all-open through the mesa radv UMD.
>>>>
>>>> I'm not sure if I'm asking for too much, but if we can coordinate a
>>>> similar interface in radv and amdgpu-pro at the Vulkan level that
>>>> would be great.
>>>>
>>>> I'm not sure what that's going to be yet.
>>>>
>>>> - Andres
>>>>
>>>> On 12/19/2016 12:11 AM, zhoucm1 wrote:
>>>>>
>>>>>
>>>>> On 2016-12-19 11:33, Pierre-Loup A. Griffais wrote:
>>>>>> We're currently working with the open stack; I assume that a
>>>>>> mechanism could be exposed by both open and Pro Vulkan userspace
>>>>>> drivers and that the amdgpu kernel interface improvements we
>>>>>> would pursue following this discussion would let both drivers
>>>>>> take advantage of the feature, correct?
>>>>> Of course.
>>>>> Does the open stack have Vulkan support?
>>>>>
>>>>> Regards,
>>>>> David Zhou
>>>>>>
>>>>>> On 12/18/2016 07:26 PM, zhoucm1 wrote:
>>>>>>> By the way, are you using the all-open driver or the amdgpu-pro
>>>>>>> driver?
>>>>>>>
>>>>>>> +David Mao, who is working on our Vulkan driver.
>>>>>>>
>>>>>>> Regards,
>>>>>>> David Zhou
>>>>>>>
>>>>>>> On 2016-12-18 06:05, Pierre-Loup A. Griffais wrote:
>>>>>>>> Hi Serguei,
>>>>>>>>
>>>>>>>> I'm also working on bringing up our VR runtime on top of amdgpu;
>>>>>>>> see replies inline.
>>>>>>>>
>>>>>>>> On 12/16/2016 09:05 PM, Sagalovitch, Serguei wrote:
>>>>>>>>> Andres,
>>>>>>>>>
>>>>>>>>>> For current VR workloads we actually have 3 separate processes
>>>>>>>>>> running:
>>>>>>>>> So we could have a potential memory overcommit case, or do you do
>>>>>>>>> partitioning on your own? I would think that there is a need to
>>>>>>>>> avoid overcommit in the VR case to prevent any BO migration.
>>>>>>>>
>>>>>>>> You're entirely correct; currently the VR runtime is setting up
>>>>>>>> prioritized CPU scheduling for its VR compositor, we're working on
>>>>>>>> prioritized GPU scheduling and pre-emption (e.g. this thread), and
>>>>>>>> in the future it will make sense to do work in order to make sure
>>>>>>>> that its memory allocations do not get evicted, to prevent any
>>>>>>>> unwelcome additional latency in the event of needing to perform
>>>>>>>> just-in-time reprojection.
>>>>>>>>
>>>>>>>>> BTW: Do you mean __real__ processes or threads?
>>>>>>>>> Based on my understanding sharing BOs between different processes
>>>>>>>>> could introduce additional synchronization constraints. BTW: I am
>>>>>>>>> not sure if we are able to share Vulkan sync. objects across the
>>>>>>>>> process boundary.
>>>>>>>>
>>>>>>>> They are different processes; it is important for the compositor
>>>>>>>> that is responsible for quality-of-service features such as
>>>>>>>> consistently presenting distorted frames with the right latency,
>>>>>>>> reprojection, etc, to be separate from the main application.
>>>>>>>>
>>>>>>>> Currently we are using unreleased cross-process memory and
>>>>>>>> semaphore extensions to fetch updated eye images from the client
>>>>>>>> application, but the just-in-time reprojection discussed here does
>>>>>>>> not actually have any direct interactions with cross-process
>>>>>>>> resource sharing, since it's achieved by using the latest, most
>>>>>>>> up-to-date eye images that have already been sent by the client
>>>>>>>> application, which are already available to use without additional
>>>>>>>> synchronization.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> 3) System compositor (we are looking at approaches to remove
>>>>>>>>>> this overhead)
>>>>>>>>> Yes, IMHO the best is to run in "full screen mode".
>>>>>>>>
>>>>>>>> Yes, we are working on mechanisms to present directly to the
>>>>>>>> headset display without any intermediaries as a separate effort.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> The latency is our main concern,
>>>>>>>>> I would assume that this is the known problem (at least for
>>>>>>>>> compute usage). It looks like amdgpu / kernel submission is
>>>>>>>>> rather CPU intensive (at least in the default configuration).
>>>>>>>>
>>>>>>>> As long as it's a consistent cost, it shouldn't be an issue.
>>>>>>>> However, if there's a high degree of variance then that would be
>>>>>>>> troublesome and we would need to account for the worst case.
>>>>>>>>
>>>>>>>> Hopefully the requirements and approach we described make sense;
>>>>>>>> we're looking forward to your feedback and suggestions.
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> - Pierre-Loup
>>>>>>>>
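As background, prioritized CPU scheduling of the kind Pierre-Loup mentions
is typically requested through the standard sched_setscheduler() API; a
minimal userspace sketch (not Valve's actual compositor code):

    #include <sched.h>
    #include <stdio.h>

    /* Ask the kernel to run the calling thread under the SCHED_FIFO
     * real-time policy so it preempts normal (SCHED_OTHER) threads.
     * Requires CAP_SYS_NICE or an appropriate RLIMIT_RTPRIO. */
    int make_thread_realtime(int rt_priority)
    {
            struct sched_param param = { .sched_priority = rt_priority };

            if (sched_setscheduler(0 /* calling thread */, SCHED_FIFO,
                                   &param) != 0) {
                    perror("sched_setscheduler");
                    return -1;
            }
            return 0;
    }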
>>>>>>>>>
>>>>>>>>> Sincerely yours,
>>>>>>>>> Serguei Sagalovitch
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> From: Andres Rodriguez <andresr at valvesoftware.com>
>>>>>>>>> Sent: December 16, 2016 10:00 PM
>>>>>>>>> To: Sagalovitch, Serguei; amd-gfx at lists.freedesktop.org
>>>>>>>>> Subject: RE: [RFC] Mechanism for high priority scheduling in
>>>>>>>>> amdgpu
>>>>>>>>>
>>>>>>>>> Hey Serguei,
>>>>>>>>>
>>>>>>>>>> [Serguei] No. I mean pipe :-) as the MEC defines it. As far as I
>>>>>>>>>> understand (simplifying), some scheduling is per pipe. I know
>>>>>>>>>> about the current allocation scheme but I do not think that it
>>>>>>>>>> is ideal. I would assume that we need to switch to dynamic
>>>>>>>>>> partitioning of resources based on the workload, otherwise we
>>>>>>>>>> will have a resource conflict between Vulkan compute and OpenCL.
>>>>>>>>>
>>>>>>>>> I agree the partitioning isn't ideal. I'm hoping we can start
>>>>>>>>> with a solution that assumes that only pipe0 has any work and the
>>>>>>>>> other pipes are idle (no HSA/ROCm running on the system).
>>>>>>>>>
>>>>>>>>> This should be more or less the use case we expect from VR users.
>>>>>>>>>
>>>>>>>>> I agree the split is currently not ideal, but I'd like to
>>>>>>>>> consider that a separate task, because making it dynamic is not
>>>>>>>>> straightforward :P
>>>>>>>>>
>>>>>>>>>> [Serguei] Vulkan works via amdgpu (kernel submissions) so amdkfd
>>>>>>>>>> will not be involved. I would assume that in the case of VR we
>>>>>>>>>> will have one main application ("console" mode(?)) so we could
>>>>>>>>>> temporarily "ignore" OpenCL/ROCm needs when VR is running.
>>>>>>>>>
>>>>>>>>> Correct, this is why we want to enable the high priority compute
>>>>>>>>> queue through libdrm-amdgpu, so that we can expose it through
>>>>>>>>> Vulkan later.
>>>>>>>>>
>>>>>>>>> For current VR workloads we actually have 3 separate processes
>>>>>>>>> running:
>>>>>>>>> 1) Game process
>>>>>>>>> 2) VR Compositor (this is the process that will require the high
>>>>>>>>> priority queue)
>>>>>>>>> 3) System compositor (we are looking at approaches to remove this
>>>>>>>>> overhead)
>>>>>>>>>
>>>>>>>>> For now I think it is okay to assume no OpenCL/ROCm running
>>>>>>>>> simultaneously, but I would also like to be able to address this
>>>>>>>>> case in the future (cross-pipe priorities).
>>>>>>>>>
>>>>>>>>>> [Serguei] The problem with pre-emption of a graphics task:
>>>>>>>>>> (a) it may take time, so latency may suffer
>>>>>>>>>
>>>>>>>>> The latency is our main concern; we want something that is
>>>>>>>>> predictable. A good illustration of what the reprojection
>>>>>>>>> scheduling looks like can be found here:
>>>>>>>>> https://community.amd.com/servlet/JiveServlet/showImage/38-1310-104754/pastedImage_3.png
>>>>>>>>>
>>>>>>>>>> (b) to preempt we need to have a different "context" - we want
>>>>>>>>>> to guarantee that submissions from the same context will be
>>>>>>>>>> executed in order.
>>>>>>>>>
>>>>>>>>> This is okay, as the reprojection work doesn't have dependencies
>>>>>>>>> on the game context, and it even happens in a separate process.
>>>>>>>>>
>>>>>>>>>> BTW: (a) Do you want to "preempt" and later resume, or do you
>>>>>>>>>> want to "preempt" and "cancel/abort"?
>>>>>>>>>
>>>>>>>>> Preempt the game with the compositor task and then resume it.
>>>>>>>>>
>>>>>>>>>> (b) Vulkan is a generic API and could be used for graphics as
>>>>>>>>>> well as for plain compute tasks (VK_QUEUE_COMPUTE_BIT).
>>>>>>>>>
>>>>>>>>> Yeah, the plan is to use Vulkan compute. But if you figure out a
>>>>>>>>> way for us to get a guaranteed execution time using Vulkan
>>>>>>>>> graphics, then I'll take you out for a beer :)
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Andres
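For illustration, one shape the libdrm-amdgpu exposure mentioned above
could take is a priority-aware variant of amdgpu_cs_ctx_create(); the
entry point and constant below are hypothetical at the time of this RFC,
sketched only to show where such an interface could land:

    #include <amdgpu.h>
    #include <amdgpu_drm.h>

    /* Hypothetical: create a context whose compute submissions land on
     * the high priority ring. The kernel would reject the request
     * (e.g. with -EPERM) for callers without sufficient privileges. */
    int create_hp_context(amdgpu_device_handle dev,
                          amdgpu_context_handle *ctx)
    {
            return amdgpu_cs_ctx_create2(dev, AMDGPU_CTX_PRIORITY_HIGH,
                                         ctx);
    }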
>>>>>>>>> ________________________________________
>>>>>>>>> From: Sagalovitch, Serguei [Serguei.Sagalovitch at amd.com]
>>>>>>>>> Sent: Friday, December 16, 2016 9:13 PM
>>>>>>>>> To: Andres Rodriguez; amd-gfx at lists.freedesktop.org
>>>>>>>>> Subject: Re: [RFC] Mechanism for high priority scheduling in
>>>>>>>>> amdgpu
>>>>>>>>>
>>>>>>>>> Hi Andres,
>>>>>>>>>
>>>>>>>>> Please see inline (as [Serguei])
>>>>>>>>>
>>>>>>>>> Sincerely yours,
>>>>>>>>> Serguei Sagalovitch
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> From: Andres Rodriguez <andresr at valvesoftware.com>
>>>>>>>>> Sent: December 16, 2016 8:29 PM
>>>>>>>>> To: Sagalovitch, Serguei; amd-gfx at lists.freedesktop.org
>>>>>>>>> Subject: RE: [RFC] Mechanism for high priority scheduling in
>>>>>>>>> amdgpu
>>>>>>>>>
>>>>>>>>> Hi Serguei,
>>>>>>>>>
>>>>>>>>> Thanks for the feedback. Answers inline as [AR].
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Andres
>>>>>>>>>
>>>>>>>>> ________________________________________
>>>>>>>>> From: Sagalovitch, Serguei [Serguei.Sagalovitch at amd.com]
>>>>>>>>> Sent: Friday, December 16, 2016 8:15 PM
>>>>>>>>> To: Andres Rodriguez; amd-gfx at lists.freedesktop.org
>>>>>>>>> Subject: Re: [RFC] Mechanism for high priority scheduling in
>>>>>>>>> amdgpu
>>>>>>>>>
>>>>>>>>> Andres,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Quick comments:
>>>>>>>>>
>>>>>>>>> 1) To minimize "bubbles", etc. we need to "force" CU
>>>>>>>>> assignments/binding to the high-priority queue when it will be in
>>>>>>>>> use and "free" them later (we do not want to take CUs away from
>>>>>>>>> e.g. a graphics task forever and degrade graphics performance).
>>>>>>>>>
>>>>>>>>> Otherwise we could have a scenario where a long graphics task (or
>>>>>>>>> low-priority compute) takes all (extra) CUs and high-priority
>>>>>>>>> work waits for the needed resources.
>>>>>>>>> It will not be visible with "NOP" packets but only when you
>>>>>>>>> submit a "real" compute task, so I would recommend not using
>>>>>>>>> "NOP" packets at all for testing.
>>>>>>>>>
>>>>>>>>> It (CU assignment) could be done relatively easily when
>>>>>>>>> everything is going via the kernel (e.g. as part of frame
>>>>>>>>> submission), but I must admit that I am not sure about the best
>>>>>>>>> way for user level submissions (amdkfd).
>>>>>>>>>
>>>>>>>>> [AR] I wasn't aware of this part of the programming sequence.
>>>>>>>>> Thanks for the heads up!
>>>>>>>>> Is this similar to the CU masking programming?
>>>>>>>>> [Serguei] Yes. To simplify: the problem is that the "scheduler",
>>>>>>>>> when deciding which queue to run, will check if there are enough
>>>>>>>>> resources and if not then it will begin to check other queues
>>>>>>>>> with lower priority.
>>>>>>>>>
>>>>>>>>> 2) I would recommend dedicating the whole pipe to the
>>>>>>>>> high-priority queue and having nothing there except it.
>>>>>>>>>
>>>>>>>>> [AR] I'm guessing in this context you mean pipe = queue? (as
>>>>>>>>> opposed to the MEC definition of pipe, which is a grouping of
>>>>>>>>> queues). I say this because amdgpu only has access to 1 pipe,
>>>>>>>>> and the rest are statically partitioned for amdkfd usage.
>>>>>>>>>
>>>>>>>>> [Serguei] No. I mean pipe :-) as the MEC defines it. As far as I
>>>>>>>>> understand (simplifying), some scheduling is per pipe. I know
>>>>>>>>> about the current allocation scheme but I do not think that it is
>>>>>>>>> ideal. I would assume that we need to switch to dynamic
>>>>>>>>> partitioning of resources based on the workload, otherwise we
>>>>>>>>> will have a resource conflict between Vulkan compute and OpenCL.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> BTW: Which user level API do you want to use for compute: Vulkan
>>>>>>>>> or OpenCL?
>>>>>>>>>
>>>>>>>>> [AR] Vulkan
>>>>>>>>>
>>>>>>>>> [Serguei] Vulkan works via amdgpu (kernel submissions) so amdkfd
>>>>>>>>> will not be involved. I would assume that in the case of VR we
>>>>>>>>> will have one main application ("console" mode(?)) so we could
>>>>>>>>> temporarily "ignore" OpenCL/ROCm needs when VR is running.
>>>>>>>>>
>>>>>>>>>> we will not be able to provide a solution compatible with GFX
>>>>>>>>>> workloads.
>>>>>>>>> I assume that you are talking about graphics? Am I right?
>>>>>>>>>
>>>>>>>>> [AR] Yeah, my understanding is that pre-empting the currently
>>>>>>>>> running graphics job and scheduling in something else using
>>>>>>>>> mid-buffer pre-emption has some cases where it doesn't work well.
>>>>>>>>> But if with polaris10 it starts working well, it might be a
>>>>>>>>> better solution for us (because the whole reprojection work uses
>>>>>>>>> the Vulkan graphics stack at the moment, and porting it to
>>>>>>>>> compute is not trivial).
>>>>>>>>>
>>>>>>>>> [Serguei] The problem with pre-emption of a graphics task: (a) it
>>>>>>>>> may take time, so latency may suffer (b) to preempt we need to
>>>>>>>>> have a different "context" - we want to guarantee that
>>>>>>>>> submissions from the same context will be executed in order.
>>>>>>>>> BTW: (a) Do you want to "preempt" and later resume, or do you
>>>>>>>>> want to "preempt" and "cancel/abort"? (b) Vulkan is a generic API
>>>>>>>>> and could be used for graphics as well as for plain compute tasks
>>>>>>>>> (VK_QUEUE_COMPUTE_BIT).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Sincerely yours,
>>>>>>>>> Serguei Sagalovitch
>>>>>>>>>
>>>>>>>>>
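Serguei's description of the queue-pick behavior above (skip a queue when
it cannot get enough resources and fall through to lower priorities) can
be modeled in a few lines; this is purely an illustrative model, not
firmware or driver code:

    #include <stddef.h>

    /* Simplified model of the queue-pick behavior described above: if a
     * high priority queue cannot get the CUs its work needs (because a
     * long-running graphics or low-priority compute task holds them),
     * the scheduler falls through to lower priority queues instead. */
    struct hw_queue {
            int priority;    /* higher value = higher priority */
            int cus_needed;  /* CUs this queue's work requires */
            int has_work;
    };

    static struct hw_queue *pick_next_queue(struct hw_queue *queues,
                                            int count, int cus_free)
    {
            struct hw_queue *best = NULL;
            int i;

            for (i = 0; i < count; i++) {
                    struct hw_queue *q = &queues[i];

                    if (!q->has_work || q->cus_needed > cus_free)
                            continue; /* skipped: not enough resources */
                    if (!best || q->priority > best->priority)
                            best = q;
            }
            return best; /* may be a lower priority queue, or NULL */
    }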
>>>>>>>>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> on
>>>>>>>>> behalf of Andres Rodriguez <andresr at valvesoftware.com>
>>>>>>>>> Sent: December 16, 2016 6:15 PM
>>>>>>>>> To: amd-gfx at lists.freedesktop.org
>>>>>>>>> Subject: [RFC] Mechanism for high priority scheduling in amdgpu
>>>>>>>>>
>>>>>>>>> Hi Everyone,
>>>>>>>>>
>>>>>>>>> This RFC is also available as a gist here:
>>>>>>>>> https://gist.github.com/lostgoat/7000432cd6864265dbc2c3ab93204249
>>>>>>>>>
>>>>>>>>> We are interested in feedback for a mechanism to effectively
>>>>>>>>> schedule high priority VR reprojection tasks (also referred to as
>>>>>>>>> time-warping) for Polaris10 running on the amdgpu kernel driver.
>>>>>>>>>
>>>>>>>>> Brief context:
>>>>>>>>> --------------
>>>>>>>>>
>>>>>>>>> The main objective of reprojection is to avoid motion sickness
>>>>>>>>> for VR users in scenarios where the game or application would
>>>>>>>>> fail to finish rendering a new frame in time for the next VBLANK.
>>>>>>>>> When this happens, the user's head movements are not reflected on
>>>>>>>>> the Head Mounted Display (HMD) for the duration of an extra
>>>>>>>>> frame. This extended mismatch between the inner ear and the eyes
>>>>>>>>> may cause the user to experience motion sickness.
>>>>>>>>>
>>>>>>>>> The VR compositor deals with this problem by fabricating a new
>>>>>>>>> frame using the user's updated head position in combination with
>>>>>>>>> the previous frames. This avoids a prolonged mismatch between the
>>>>>>>>> HMD output and the inner ear.
>>>>>>>>>
>>>>>>>>> Because of the adverse effects on the user, we require high
>>>>>>>>> confidence that the reprojection task will complete before the
>>>>>>>>> VBLANK interval, even if the GFX pipe is currently full of work
>>>>>>>>> from the game/application (which is most likely the case).
>>>>>>>>>
>>>>>>>>> For more details and illustrations, please refer to the following
>>>>>>>>> document:
>>>>>>>>> https://community.amd.com/community/gaming/blog/2016/03/28/asynchronous-shaders-evolved
>>>>>>>>>
>>>>>>>>> Requirements:
>>>>>>>>> -------------
>>>>>>>>>
>>>>>>>>> The mechanism must expose the following functionality:
>>>>>>>>>
>>>>>>>>> * Job round trip time must be predictable, from submission to
>>>>>>>>> fence signal
>>>>>>>>>
>>>>>>>>> * The mechanism must support compute workloads.
>>>>>>>>>
>>>>>>>>> Goals:
>>>>>>>>> ------
>>>>>>>>>
>>>>>>>>> * The mechanism should provide low submission latencies
>>>>>>>>>
>>>>>>>>> Test: submitting a NOP packet through the mechanism on busy
>>>>>>>>> hardware should be equivalent to submitting a NOP on idle
>>>>>>>>> hardware.
>>>>>>>>>
>>>>>>>>> Nice to have:
>>>>>>>>> -------------
>>>>>>>>>
>>>>>>>>> * The mechanism should also support GFX workloads.
>>>>>>>>>
>>>>>>>>> My understanding is that with the current hardware capabilities
>>>>>>>>> in Polaris10 we will not be able to provide a solution compatible
>>>>>>>>> with GFX workloads.
>>>>>>>>>
>>>>>>>>> But I would love to hear otherwise. So if anyone has an idea,
>>>>>>>>> approach or suggestion that will also be compatible with the GFX
>>>>>>>>> ring, please let us know about it.
>>>>>>>>>
>>>>>>>>> * The above guarantees should also be respected by amdkfd
>>>>>>>>> workloads
>>>>>>>>>
>>>>>>>>> Would be good to have for consistency, but not strictly necessary
>>>>>>>>> as users running games are not traditionally running HPC
>>>>>>>>> workloads in the background.
>>>>>>>>>
>>>>>>>>> Proposed approach:
>>>>>>>>> ------------------
>>>>>>>>>
>>>>>>>>> Similar to the Windows driver, we could expose a high priority
>>>>>>>>> compute queue to userspace.
>>>>>>>>>
>>>>>>>>> Submissions to this compute queue will be scheduled with high
>>>>>>>>> priority, and may acquire hardware resources previously in use by
>>>>>>>>> other queues.
>>>>>>>>>
>>>>>>>>> This can be achieved by taking advantage of the 'priority' field
>>>>>>>>> in the HQDs, and could be programmed by amdgpu or the amdgpu
>>>>>>>>> scheduler. The relevant register fields are:
>>>>>>>>> * mmCP_HQD_PIPE_PRIORITY
>>>>>>>>> * mmCP_HQD_QUEUE_PRIORITY
>>>>>>>>>
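To make the register programming concrete, here is a sketch of how those
fields could be set for one queue, following the SRBM select pattern used
elsewhere in gfx_v8_0.c; the helper name is hypothetical, not existing
driver code:

    /* Illustrative sketch (helper name hypothetical): program the HQD
     * priority fields for one compute queue. The SRBM select must be
     * serialized via adev->srbm_mutex, as done for other HQD setup. */
    static void gfx_v8_0_set_hqd_priority(struct amdgpu_device *adev,
                                          u32 me, u32 pipe, u32 queue,
                                          u32 priority)
    {
            mutex_lock(&adev->srbm_mutex);
            vi_srbm_select(adev, me, pipe, queue, 0);

            WREG32(mmCP_HQD_PIPE_PRIORITY, priority);
            WREG32(mmCP_HQD_QUEUE_PRIORITY, priority);

            vi_srbm_select(adev, 0, 0, 0, 0);
            mutex_unlock(&adev->srbm_mutex);
    }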
>>>>>>>>> Implementation approach 1 - static partitioning:
>>>>>>>>> ------------------------------------------------
>>>>>>>>>
>>>>>>>>> The amdgpu driver currently controls 8 compute queues from pipe0.
>>>>>>>>> We can statically partition these as follows:
>>>>>>>>> * 7x regular
>>>>>>>>> * 1x high priority
>>>>>>>>>
>>>>>>>>> The relevant priorities can be set so that submissions to the
>>>>>>>>> high priority ring will starve the other compute rings and the
>>>>>>>>> GFX ring.
>>>>>>>>>
>>>>>>>>> The amdgpu scheduler will only place jobs into the high priority
>>>>>>>>> rings if the context is marked as high priority. And a
>>>>>>>>> corresponding priority should be added to keep track of this
>>>>>>>>> information:
>>>>>>>>> * AMD_SCHED_PRIORITY_KERNEL
>>>>>>>>> * -> AMD_SCHED_PRIORITY_HIGH
>>>>>>>>> * AMD_SCHED_PRIORITY_NORMAL
>>>>>>>>>
>>>>>>>>> The user will request a high priority context by setting an
>>>>>>>>> appropriate flag in drm_amdgpu_ctx_in (AMDGPU_CTX_HIGH_PRIORITY
>>>>>>>>> or similar):
>>>>>>>>> https://github.com/torvalds/linux/blob/master/include/uapi/drm/amdgpu_drm.h#L163
>>>>>>>>>
>>>>>>>>> The setting is at a per-context level so that we can:
>>>>>>>>> * Maintain a consistent FIFO ordering of all submissions to a
>>>>>>>>> context
>>>>>>>>> * Create high priority and non-high priority contexts in the same
>>>>>>>>> process
>>>>>>>>>
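A minimal sketch of the two additions proposed above; the names are the
RFC's placeholders, not code that exists yet:

    /* gpu_scheduler.h: proposed new level between KERNEL and NORMAL */
    enum amd_sched_priority {
            AMD_SCHED_PRIORITY_KERNEL = 0,
            AMD_SCHED_PRIORITY_HIGH,      /* proposed */
            AMD_SCHED_PRIORITY_NORMAL,
            AMD_SCHED_PRIORITY_MAX
    };

    /* include/uapi/drm/amdgpu_drm.h: proposed flag for the 'flags'
     * member of drm_amdgpu_ctx_in at context creation time. */
    #define AMDGPU_CTX_HIGH_PRIORITY (1 << 0)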
>>>>>>>>> Implementation approach 2 - dynamic priority programming:
>>>>>>>>> ---------------------------------------------------------
>>>>>>>>>
>>>>>>>>> Similar to the above, but instead of programming the priorities
>>>>>>>>> at amdgpu_init() time, the SW scheduler will reprogram the queue
>>>>>>>>> priorities dynamically when scheduling a task.
>>>>>>>>>
>>>>>>>>> This would involve having a hardware specific callback from the
>>>>>>>>> scheduler to set the appropriate queue priority:
>>>>>>>>> set_priority(int ring, int index, int priority)
>>>>>>>>>
>>>>>>>>> During this callback we would have to grab the SRBM mutex to
>>>>>>>>> perform the appropriate HW programming, and I'm not really sure
>>>>>>>>> if that is something we should be doing from the scheduler.
>>>>>>>>>
>>>>>>>>> On the positive side, this approach would allow us to program a
>>>>>>>>> range of priorities for jobs instead of a single "high priority"
>>>>>>>>> value, achieving something similar to the niceness API available
>>>>>>>>> for CPU scheduling.
>>>>>>>>>
>>>>>>>>> I'm not sure if this flexibility is something that we would need
>>>>>>>>> for our use case, but it might be useful in other scenarios
>>>>>>>>> (multiple users sharing compute time on a server).
>>>>>>>>>
>>>>>>>>> This approach would require a new int field in drm_amdgpu_ctx_in,
>>>>>>>>> or repurposing of the flags field.
>>>>>>>>>
>>>>>>>>> Known current obstacles:
>>>>>>>>> ------------------------
>>>>>>>>>
>>>>>>>>> The SQ is currently programmed to disregard the HQD priorities,
>>>>>>>>> and instead it picks jobs at random. Settings from the shader
>>>>>>>>> itself are also disregarded as this is considered a privileged
>>>>>>>>> field.
>>>>>>>>>
>>>>>>>>> Effectively we can get our compute wavefront launched ASAP, but
>>>>>>>>> we might not get the time we need on the SQ.
>>>>>>>>>
>>>>>>>>> The current programming would have to be changed to allow
>>>>>>>>> priority propagation from the HQD into the SQ.
>>>>>>>>>
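To make the callback's shape concrete, here is a sketch of how approach 2
could hook a set_priority() callback into the per-ring function table;
the member and the trimmed-down struct bodies below are illustrative, not
the existing amdgpu definitions:

    struct amdgpu_ring;

    /* Proposed hardware specific hook, called by the SW scheduler just
     * before it hands a job to the ring. */
    struct amdgpu_ring_funcs {
            /* ... existing members ... */
            void (*set_priority)(struct amdgpu_ring *ring, int priority);
    };

    struct amdgpu_ring {
            const struct amdgpu_ring_funcs *funcs;
            /* ... existing members ... */
    };

    static void amd_sched_set_job_priority(struct amdgpu_ring *ring,
                                           int priority)
    {
            /* The implementation would take adev->srbm_mutex internally
             * to reprogram the HQD, which is the part we are unsure
             * about doing from the scheduler. */
            if (ring->funcs->set_priority)
                    ring->funcs->set_priority(ring, priority);
    }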
>>>>>>>>> Generic approach for all HW IPs:
>>>>>>>>> --------------------------------
>>>>>>>>>
>>>>>>>>> For consistency purposes, the high priority context can be
>>>>>>>>> enabled for all HW IPs with support of the SW scheduler. This
>>>>>>>>> will function similarly to the current AMD_SCHED_PRIORITY_KERNEL
>>>>>>>>> priority, where the job can jump ahead of anything not committed
>>>>>>>>> to the HW queue.
>>>>>>>>>
>>>>>>>>> The benefits of requesting a high priority context for a
>>>>>>>>> non-compute queue will be lesser (e.g. up to 10s of wait time if
>>>>>>>>> a GFX command is stuck in front of you), but having the API in
>>>>>>>>> place will allow us to easily improve the implementation in the
>>>>>>>>> future as new features become available in new hardware.
>>>>>>>>>
>>>>>>>>> Future steps:
>>>>>>>>> -------------
>>>>>>>>>
>>>>>>>>> Once we have an approach settled, I can take care of the
>>>>>>>>> implementation.
>>>>>>>>>
>>>>>>>>> Also, once the interface is mostly decided, we can start thinking
>>>>>>>>> about exposing the high priority queue through radv.
>>>>>>>>>
>>>>>>>>> Request for feedback:
>>>>>>>>> ---------------------
>>>>>>>>>
>>>>>>>>> We aren't married to any of the approaches outlined above. Our
>>>>>>>>> goal is to obtain a mechanism that will allow us to complete the
>>>>>>>>> reprojection job within a predictable amount of time. So if
>>>>>>>>> anyone has any suggestions for improvements or alternative
>>>>>>>>> strategies we are more than happy to hear them.
>>>>>>>>>
>>>>>>>>> If any of the technical information above is also incorrect, feel
>>>>>>>>> free to point out my misunderstandings.
>>>>>>>>>
>>>>>>>>> Looking forward to hearing from you.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Andres
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> amd-gfx mailing list
>>>>>>>>> amd-gfx at lists.freedesktop.org
>>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> amd-gfx mailing list
>>>>>>>> amd-gfx at lists.freedesktop.org
>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>>>
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> amd-gfx mailing list
>>>>> amd-gfx at lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>
>>>
>>
>> Sincerely yours,
>> Serguei Sagalovitch
>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
>