> If compute queue is occupied only by you, the efficiency is equal to setting the job queue to high priority, I think.

The only risk is the situation where graphics takes all the needed CUs. But in any case it should be a very good test.

Andres/Pierre-Loup, did you try to do it, or is it a lot of work for you?

BTW: If there is a non-VR application which uses the high-priority h/w queue, then the VR application will suffer. Any ideas how to solve it?

Sincerely yours,
Serguei Sagalovitch

On 2016-12-19 12:50 AM, zhoucm1 wrote:
> Do you encounter the priority issue for the compute queue with the current driver?
>
> If compute queue is occupied only by you, the efficiency is equal to setting the job queue to high priority, I think.
>
> Regards,
> David Zhou
>
> On 2016-12-19 13:29, Andres Rodriguez wrote:
>> Yes, Vulkan is available on all-open through the Mesa radv UMD.
>>
>> I'm not sure if I'm asking for too much, but if we can coordinate a similar interface in radv and amdgpu-pro at the Vulkan level, that would be great.
>>
>> I'm not sure what that's going to be yet.
>>
>> - Andres
>>
>> On 12/19/2016 12:11 AM, zhoucm1 wrote:
>>>
>>> On 2016-12-19 11:33, Pierre-Loup A. Griffais wrote:
>>>> We're currently working with the open stack; I assume that a mechanism could be exposed by both open and Pro Vulkan userspace drivers, and that the amdgpu kernel interface improvements we would pursue following this discussion would let both drivers take advantage of the feature, correct?
>>> Of course.
>>> Does the open stack have Vulkan support?
>>>
>>> Regards,
>>> David Zhou
>>>>
>>>> On 12/18/2016 07:26 PM, zhoucm1 wrote:
>>>>> By the way, are you using the all-open driver or the amdgpu-pro driver?
>>>>>
>>>>> +David Mao, who is working on our Vulkan driver.
>>>>>
>>>>> Regards,
>>>>> David Zhou
>>>>>
>>>>> On 2016-12-18 06:05, Pierre-Loup A. Griffais wrote:
>>>>>> Hi Serguei,
>>>>>>
>>>>>> I'm also working on bringing up our VR runtime on top of amdgpu; see replies inline.
>>>>>>
>>>>>> On 12/16/2016 09:05 PM, Sagalovitch, Serguei wrote:
>>>>>>> Andres,
>>>>>>>
>>>>>>>> For current VR workloads we have 3 separate processes running actually:
>>>>>>> So we could have a potential memory overcommit case, or do you do the partitioning on your own? I would think that there is a need to avoid overcommit in the VR case to prevent any BO migration.
>>>>>>
>>>>>> You're entirely correct; currently the VR runtime is setting up prioritized CPU scheduling for its VR compositor, we're working on prioritized GPU scheduling and pre-emption (e.g. this thread), and in the future it will make sense to do work to make sure that its memory allocations do not get evicted, to prevent any unwelcome additional latency in the event of needing to perform just-in-time reprojection.
>>>>>>
>>>>>>> BTW: Do you mean __real__ processes or threads? Based on my understanding, sharing BOs between different processes could introduce additional synchronization constraints. BTW: I am not sure if we are able to share Vulkan sync objects across a process boundary.
>>>>>>
>>>>>> They are different processes; it is important for the compositor, which is responsible for quality-of-service features such as consistently presenting distorted frames with the right latency, reprojection, etc., to be separate from the main application.
>>>>>>
>>>>>> Currently we are using unreleased cross-process memory and semaphore extensions to fetch updated eye images from the client application, but the just-in-time reprojection discussed here does not actually have any direct interactions with cross-process resource sharing, since it's achieved by using whatever are the latest, most up-to-date eye images that have already been sent by the client application, which are already available to use without additional synchronization.
>>>>>>
>>>>>>>> 3) System compositor (we are looking at approaches to remove this overhead)
>>>>>>> Yes, IMHO the best is to run in "full screen mode".
>>>>>>
>>>>>> Yes, we are working on mechanisms to present directly to the headset display without any intermediaries, as a separate effort.
>>>>>>
>>>>>>>> The latency is our main concern,
>>>>>>> I would assume that this is a known problem (at least for compute usage). It looks like amdgpu / kernel submission is rather CPU intensive (at least in the default configuration).
>>>>>>
>>>>>> As long as it's a consistent cost, it shouldn't be an issue. However, if there is a high degree of variance, then that would be troublesome and we would need to account for the worst case.
>>>>>>
>>>>>> Hopefully the requirements and approach we described make sense; we're looking forward to your feedback and suggestions.
>>>>>>
>>>>>> Thanks!
>>>>>> - Pierre-Loup
>>>>>>
>>>>>>> Sincerely yours,
>>>>>>> Serguei Sagalovitch
>>>>>>>
>>>>>>> From: Andres Rodriguez <andresr at valvesoftware.com>
>>>>>>> Sent: December 16, 2016 10:00 PM
>>>>>>> To: Sagalovitch, Serguei; amd-gfx at lists.freedesktop.org
>>>>>>> Subject: RE: [RFC] Mechanism for high priority scheduling in amdgpu
>>>>>>>
>>>>>>> Hey Serguei,
>>>>>>>
>>>>>>>> [Serguei] No. I mean pipe :-) as the MEC defines it. As far as I understand (by simplifying), some scheduling is per pipe. I know about the current allocation scheme, but I do not think that it is ideal. I would assume that we need to switch to dynamic partitioning of resources based on the workload; otherwise we will have resource conflicts between Vulkan compute and OpenCL.
>>>>>>>
>>>>>>> I agree the partitioning isn't ideal. I'm hoping we can start with a solution that assumes that only pipe0 has any work and the other pipes are idle (no HSA/ROCm running on the system).
>>>>>>>
>>>>>>> This should be more or less the use case we expect from VR users.
>>>>>>>
>>>>>>> I agree the split is currently not ideal, but I'd like to consider that a separate task, because making it dynamic is not straightforward :P
>>>>>>>
>>>>>>>> [Serguei] Vulkan works via amdgpu (kernel submissions), so amdkfd will not be involved. I would assume that in the case of VR we will have one main application ("console" mode(?)), so we could temporarily "ignore" OpenCL/ROCm needs when VR is running.
>>>>>>>
>>>>>>> Correct, this is why we want to enable the high priority compute queue through libdrm-amdgpu, so that we can expose it through Vulkan later.
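>>>>>>>
>>>>>>> To sketch what that could look like from the libdrm-amdgpu side (purely illustrative; amdgpu_cs_ctx_create2() and AMDGPU_CTX_PRIORITY_HIGH are hypothetical names for a context-create variant that forwards a priority to the kernel):
>>>>>>>
>>>>>>> #include <amdgpu.h>
>>>>>>>
>>>>>>> /* Hypothetical helper: create a context whose submissions land
>>>>>>>  * on the high priority compute ring. */
>>>>>>> int create_high_priority_ctx(amdgpu_device_handle dev,
>>>>>>>                              amdgpu_context_handle *ctx)
>>>>>>> {
>>>>>>>         /* Assumed variant of amdgpu_cs_ctx_create() that passes
>>>>>>>          * a priority flag through to the DRM_AMDGPU_CTX ioctl. */
>>>>>>>         return amdgpu_cs_ctx_create2(dev, AMDGPU_CTX_PRIORITY_HIGH,
>>>>>>>                                      ctx);
>>>>>>> }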
>>>>>>>
>>>>>>> For current VR workloads we actually have 3 separate processes running:
>>>>>>> 1) Game process
>>>>>>> 2) VR Compositor (this is the process that will require the high priority queue)
>>>>>>> 3) System compositor (we are looking at approaches to remove this overhead)
>>>>>>>
>>>>>>> For now I think it is okay to assume no OpenCL/ROCm running simultaneously, but I would also like to be able to address this case in the future (cross-pipe priorities).
>>>>>>>
>>>>>>>> [Serguei] The problem with pre-emption of a graphics task: (a) it may take time, so latency may suffer
>>>>>>>
>>>>>>> The latency is our main concern; we want something that is predictable. A good illustration of what the reprojection scheduling looks like can be found here:
>>>>>>> https://community.amd.com/servlet/JiveServlet/showImage/38-1310-104754/pastedImage_3.png
>>>>>>>
>>>>>>>> (b) to preempt we need to have a different "context" - we want to guarantee that submissions from the same context will be executed in order.
>>>>>>>
>>>>>>> This is okay, as the reprojection work doesn't have dependencies on the game context, and it even happens in a separate process.
>>>>>>>
>>>>>>>> BTW: (a) Do you want to "preempt" and later resume, or do you want to "preempt" and "cancel/abort"?
>>>>>>>
>>>>>>> Preempt the game with the compositor task and then resume it.
>>>>>>>
>>>>>>>> (b) Vulkan is a generic API and could be used for graphics as well as for plain compute tasks (VK_QUEUE_COMPUTE_BIT).
>>>>>>>
>>>>>>> Yeah, the plan is to use Vulkan compute. But if you figure out a way for us to get a guaranteed execution time using Vulkan graphics, then I'll take you out for a beer :)
>>>>>>>
>>>>>>> Regards,
>>>>>>> Andres
>>>>>>> ________________________________________
>>>>>>> From: Sagalovitch, Serguei [Serguei.Sagalovitch at amd.com]
>>>>>>> Sent: Friday, December 16, 2016 9:13 PM
>>>>>>> To: Andres Rodriguez; amd-gfx at lists.freedesktop.org
>>>>>>> Subject: Re: [RFC] Mechanism for high priority scheduling in amdgpu
>>>>>>>
>>>>>>> Hi Andres,
>>>>>>>
>>>>>>> Please see inline (as [Serguei])
>>>>>>>
>>>>>>> Sincerely yours,
>>>>>>> Serguei Sagalovitch
>>>>>>>
>>>>>>> From: Andres Rodriguez <andresr at valvesoftware.com>
>>>>>>> Sent: December 16, 2016 8:29 PM
>>>>>>> To: Sagalovitch, Serguei; amd-gfx at lists.freedesktop.org
>>>>>>> Subject: RE: [RFC] Mechanism for high priority scheduling in amdgpu
>>>>>>>
>>>>>>> Hi Serguei,
>>>>>>>
>>>>>>> Thanks for the feedback. Answers inline as [AR].
>>>>>>>
>>>>>>> Regards,
>>>>>>> Andres
>>>>>>>
>>>>>>> ________________________________________
>>>>>>> From: Sagalovitch, Serguei [Serguei.Sagalovitch at amd.com]
>>>>>>> Sent: Friday, December 16, 2016 8:15 PM
>>>>>>> To: Andres Rodriguez; amd-gfx at lists.freedesktop.org
>>>>>>> Subject: Re: [RFC] Mechanism for high priority scheduling in amdgpu
>>>>>>>
>>>>>>> Andres,
>>>>>>>
>>>>>>> Quick comments:
>>>>>>>
>>>>>>> 1) To minimize "bubbles", etc. we need to "force" CU assignments/binding to the high-priority queue when it will be in use and "free" them later (we do not want to take CUs from e.g. the graphics task forever and degrade graphics performance).
>>>>>>>
>>>>>>> Otherwise we could have a scenario where a long graphics task (or low-priority compute) takes all the (extra) CUs and high-priority work will wait for the needed resources.
>>>>>>>
>>>>>>> It will not be visible with "NOP" packets but only when you submit a "real" compute task, so I would recommend not using "NOP" packets at all for testing.
>>>>>>>
>>>>>>> It (CU assignment) could be done relatively easily when everything is going via the kernel (e.g. as part of frame submission), but I must admit that I am not sure about the best way for user level submissions (amdkfd).
>>>>>>>
>>>>>>> [AR] I wasn't aware of this part of the programming sequence. Thanks for the heads up! Is this similar to the CU masking programming?
>>>>>>>
>>>>>>> [Serguei] Yes. To simplify: the problem is that the "scheduler", when deciding which queue to run, will check if there are enough resources, and if not then it will begin to check other queues with lower priority.
>>>>>>>
>>>>>>> 2) I would recommend dedicating the whole pipe to the high-priority queue and having nothing there except it.
>>>>>>>
>>>>>>> [AR] I'm guessing in this context you mean pipe = queue? (As opposed to the MEC definition of pipe, which is a grouping of queues.) I say this because amdgpu only has access to 1 pipe, and the rest are statically partitioned for amdkfd usage.
>>>>>>>
>>>>>>> [Serguei] No. I mean pipe :-) as the MEC defines it. As far as I understand (by simplifying), some scheduling is per pipe. I know about the current allocation scheme, but I do not think that it is ideal. I would assume that we need to switch to dynamic partitioning of resources based on the workload; otherwise we will have resource conflicts between Vulkan compute and OpenCL.
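>>>>>>>
>>>>>>> (For reference, the topology under discussion is roughly the following; a simplified sketch with illustrative names and constants, not the driver's actual definitions:)
>>>>>>>
>>>>>>> /* Each MEC micro-engine exposes 4 pipes, and each pipe holds 8
>>>>>>>  * hardware queues (HQDs).  amdgpu currently drives pipe[0];
>>>>>>>  * the remaining pipes are statically handed to amdkfd. */
>>>>>>> #define NUM_PIPES_PER_MEC   4
>>>>>>> #define NUM_QUEUES_PER_PIPE 8
>>>>>>>
>>>>>>> struct hqd_slot {
>>>>>>>         unsigned int queue_priority;    /* per-queue HQD state */
>>>>>>> };
>>>>>>>
>>>>>>> struct mec_pipe {
>>>>>>>         /* some scheduling decisions are taken at this level */
>>>>>>>         struct hqd_slot queue[NUM_QUEUES_PER_PIPE];
>>>>>>> };
>>>>>>>
>>>>>>> struct mec {
>>>>>>>         struct mec_pipe pipe[NUM_PIPES_PER_MEC];
>>>>>>> };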
>>>>>>>
>>>>>>> BTW: Which user level API do you want to use for compute: Vulkan or OpenCL?
>>>>>>>
>>>>>>> [AR] Vulkan
>>>>>>>
>>>>>>> [Serguei] Vulkan works via amdgpu (kernel submissions), so amdkfd will not be involved. I would assume that in the case of VR we will have one main application ("console" mode(?)), so we could temporarily "ignore" OpenCL/ROCm needs when VR is running.
>>>>>>>
>>>>>>>> we will not be able to provide a solution compatible with GFX workloads.
>>>>>>>
>>>>>>> I assume that you are talking about graphics? Am I right?
>>>>>>>
>>>>>>> [AR] Yeah, my understanding is that pre-empting the currently running graphics job and scheduling in something else using mid-buffer pre-emption has some cases where it doesn't work well. But if with Polaris10 it starts working well, it might be a better solution for us (because the whole reprojection work uses the Vulkan graphics stack at the moment, and porting it to compute is not trivial).
>>>>>>>
>>>>>>> [Serguei] The problem with pre-emption of a graphics task: (a) it may take time, so latency may suffer; (b) to preempt we need to have a different "context" - we want to guarantee that submissions from the same context will be executed in order. BTW: (a) Do you want to "preempt" and later resume, or do you want to "preempt" and "cancel/abort"? (b) Vulkan is a generic API and could be used for graphics as well as for plain compute tasks (VK_QUEUE_COMPUTE_BIT).
>>>>>>>
>>>>>>> Sincerely yours,
>>>>>>> Serguei Sagalovitch
>>>>>>>
>>>>>>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> on behalf of Andres Rodriguez <andresr at valvesoftware.com>
>>>>>>> Sent: December 16, 2016 6:15 PM
>>>>>>> To: amd-gfx at lists.freedesktop.org
>>>>>>> Subject: [RFC] Mechanism for high priority scheduling in amdgpu
>>>>>>>
>>>>>>> Hi Everyone,
>>>>>>>
>>>>>>> This RFC is also available as a gist here:
>>>>>>> https://gist.github.com/lostgoat/7000432cd6864265dbc2c3ab93204249
>>>>>>>
>>>>>>> We are interested in feedback for a mechanism to effectively schedule high priority VR reprojection tasks (also referred to as time-warping) for Polaris10 running on the amdgpu kernel driver.
>>>>>>>
>>>>>>> Brief context:
>>>>>>> --------------
>>>>>>>
>>>>>>> The main objective of reprojection is to avoid motion sickness for VR users in scenarios where the game or application would fail to finish rendering a new frame in time for the next VBLANK. When this happens, the user's head movements are not reflected on the Head Mounted Display (HMD) for the duration of an extra frame. This extended mismatch between the inner ear and the eyes may cause the user to experience motion sickness.
>>>>>>>
>>>>>>> The VR compositor deals with this problem by fabricating a new frame using the user's updated head position in combination with the previous frames. This avoids a prolonged mismatch between the HMD output and the inner ear.
>>>>>>>
>>>>>>> Because of the adverse effects on the user, we require high confidence that the reprojection task will complete before the VBLANK interval, even if the GFX pipe is currently full of work from the game/application (which is most likely the case).
>>>>>>>
>>>>>>> For more details and illustrations, please refer to the following document:
>>>>>>> https://community.amd.com/community/gaming/blog/2016/03/28/asynchronous-shaders-evolved
>>>>>>>
>>>>>>> Requirements:
>>>>>>> -------------
>>>>>>>
>>>>>>> The mechanism must expose the following functionality:
>>>>>>>
>>>>>>> * Job round trip time must be predictable, from submission to fence signal
>>>>>>>
>>>>>>> * The mechanism must support compute workloads.
>>>>>>>
>>>>>>> Goals:
>>>>>>> ------
>>>>>>>
>>>>>>> * The mechanism should provide low submission latencies
>>>>>>>
>>>>>>> Test: submitting a NOP packet through the mechanism on busy hardware should be equivalent to submitting a NOP on idle hardware.
>>>>>>>
>>>>>>> Nice to have:
>>>>>>> -------------
>>>>>>>
>>>>>>> * The mechanism should also support GFX workloads.
>>>>>>>
>>>>>>> My understanding is that with the current hardware capabilities in Polaris10 we will not be able to provide a solution compatible with GFX workloads.
>>>>>>>
>>>>>>> But I would love to hear otherwise. So if anyone has an idea, approach or suggestion that will also be compatible with the GFX ring, please let us know about it.
>>>>>>>
>>>>>>> * The above guarantees should also be respected by amdkfd workloads
>>>>>>>
>>>>>>> Would be good to have for consistency, but not strictly necessary as users running games are not traditionally running HPC workloads in the background.
>>>>>>>
>>>>>>> Proposed approach:
>>>>>>> ------------------
>>>>>>>
>>>>>>> Similar to the Windows driver, we could expose a high priority compute queue to userspace.
>>>>>>>
>>>>>>> Submissions to this compute queue will be scheduled with high priority, and may acquire hardware resources previously in use by other queues.
>>>>>>>
>>>>>>> This can be achieved by taking advantage of the 'priority' field in the HQDs and could be programmed by amdgpu or the amdgpu scheduler. The relevant register fields are:
>>>>>>> * mmCP_HQD_PIPE_PRIORITY
>>>>>>> * mmCP_HQD_QUEUE_PRIORITY
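>>>>>>>
>>>>>>> As a rough sketch of the kernel-side programming (modeled on the existing HQD init sequence in gfx_v8_0.c; the helper name and the exact priority encodings are illustrative and would need to be confirmed against the register spec):
>>>>>>>
>>>>>>> static void set_hqd_priority(struct amdgpu_device *adev,
>>>>>>>                              u32 me, u32 pipe, u32 queue,
>>>>>>>                              u32 pipe_prio, u32 queue_prio)
>>>>>>> {
>>>>>>>         /* Select the HQD we want to touch, program its two
>>>>>>>          * priority fields, then release the SRBM selection. */
>>>>>>>         mutex_lock(&adev->srbm_mutex);
>>>>>>>         vi_srbm_select(adev, me, pipe, queue, 0);
>>>>>>>
>>>>>>>         WREG32(mmCP_HQD_PIPE_PRIORITY, pipe_prio);
>>>>>>>         WREG32(mmCP_HQD_QUEUE_PRIORITY, queue_prio);
>>>>>>>
>>>>>>>         vi_srbm_select(adev, 0, 0, 0, 0);
>>>>>>>         mutex_unlock(&adev->srbm_mutex);
>>>>>>> }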
>>>>>>>
>>>>>>> Implementation approach 1 - static partitioning:
>>>>>>> ------------------------------------------------
>>>>>>>
>>>>>>> The amdgpu driver currently controls 8 compute queues from pipe0. We can statically partition these as follows:
>>>>>>> * 7x regular
>>>>>>> * 1x high priority
>>>>>>>
>>>>>>> The relevant priorities can be set so that submissions to the high priority ring will starve the other compute rings and the GFX ring.
>>>>>>>
>>>>>>> The amdgpu scheduler will only place jobs into the high priority rings if the context is marked as high priority. And a corresponding priority should be added to keep track of this information:
>>>>>>> * AMD_SCHED_PRIORITY_KERNEL
>>>>>>> * -> AMD_SCHED_PRIORITY_HIGH
>>>>>>> * AMD_SCHED_PRIORITY_NORMAL
>>>>>>>
>>>>>>> The user will request a high priority context by setting an appropriate flag in drm_amdgpu_ctx_in (AMDGPU_CTX_HIGH_PRIORITY or similar):
>>>>>>> https://github.com/torvalds/linux/blob/master/include/uapi/drm/amdgpu_drm.h#L163
>>>>>>>
>>>>>>> The setting is at a per-context level so that we can:
>>>>>>> * Maintain a consistent FIFO ordering of all submissions to a context
>>>>>>> * Create high priority and non-high priority contexts in the same process
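>>>>>>>
>>>>>>> In code, the additions would look roughly like this (a sketch; the enum ordering and the exact flag bit are illustrative):
>>>>>>>
>>>>>>> /* gpu_scheduler.h: new level between KERNEL and NORMAL */
>>>>>>> enum amd_sched_priority {
>>>>>>>         AMD_SCHED_PRIORITY_KERNEL,
>>>>>>>         AMD_SCHED_PRIORITY_HIGH,   /* new: high priority contexts */
>>>>>>>         AMD_SCHED_PRIORITY_NORMAL,
>>>>>>> };
>>>>>>>
>>>>>>> /* include/uapi/drm/amdgpu_drm.h: flag for drm_amdgpu_ctx_in.flags */
>>>>>>> #define AMDGPU_CTX_HIGH_PRIORITY  (1 << 0)   /* proposed, bit TBD */
>>>>>>>
>>>>>>> Userspace would then request it at context-allocation time, e.g.:
>>>>>>>
>>>>>>> #include <stdint.h>
>>>>>>> #include <xf86drm.h>
>>>>>>> #include <amdgpu_drm.h>
>>>>>>>
>>>>>>> int alloc_high_prio_ctx(int fd, uint32_t *ctx_id)
>>>>>>> {
>>>>>>>         union drm_amdgpu_ctx args = {};
>>>>>>>         int r;
>>>>>>>
>>>>>>>         args.in.op    = AMDGPU_CTX_OP_ALLOC_CTX;
>>>>>>>         args.in.flags = AMDGPU_CTX_HIGH_PRIORITY; /* proposed flag */
>>>>>>>
>>>>>>>         r = drmCommandWriteRead(fd, DRM_AMDGPU_CTX,
>>>>>>>                                 &args, sizeof(args));
>>>>>>>         if (r == 0)
>>>>>>>                 *ctx_id = args.out.alloc.ctx_id;
>>>>>>>         return r;
>>>>>>> }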
>>>>>>>
>>>>>>> Implementation approach 2 - dynamic priority programming:
>>>>>>> ---------------------------------------------------------
>>>>>>>
>>>>>>> Similar to the above, but instead of programming the priorities at amdgpu_init() time, the SW scheduler will reprogram the queue priorities dynamically when scheduling a task.
>>>>>>>
>>>>>>> This would involve having a hardware-specific callback from the scheduler to set the appropriate queue priority: set_priority(int ring, int index, int priority)
>>>>>>>
>>>>>>> During this callback we would have to grab the SRBM mutex to perform the appropriate HW programming, and I'm not really sure if that is something we should be doing from the scheduler.
>>>>>>>
>>>>>>> On the positive side, this approach would allow us to program a range of priorities for jobs instead of a single "high priority" value, achieving something similar to the niceness API available for CPU scheduling.
>>>>>>>
>>>>>>> I'm not sure if this flexibility is something that we would need for our use case, but it might be useful in other scenarios (multiple users sharing compute time on a server).
>>>>>>>
>>>>>>> This approach would require a new int field in drm_amdgpu_ctx_in, or repurposing of the flags field.
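>>>>>>>
>>>>>>> Sketched as a scheduler hook (illustrative; the existing backend ops table would simply grow one optional callback, and the job/context field names here are placeholders):
>>>>>>>
>>>>>>> struct amd_sched_backend_ops {
>>>>>>>         /* ... existing callbacks (dependency, run_job, ...) ... */
>>>>>>>         void (*set_priority)(int ring, int index, int priority);
>>>>>>> };
>>>>>>>
>>>>>>> /* In the scheduler, before handing a job to the HW ring: */
>>>>>>> if (sched->ops->set_priority)
>>>>>>>         sched->ops->set_priority(job->ring, job->index,
>>>>>>>                                  ctx->priority);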
>>>>>>>
>>>>>>> Known current obstacles:
>>>>>>> ------------------------
>>>>>>>
>>>>>>> The SQ is currently programmed to disregard the HQD priorities, and instead it picks jobs at random. Settings from the shader itself are also disregarded, as this is considered a privileged field.
>>>>>>>
>>>>>>> Effectively we can get our compute wavefront launched ASAP, but we might not get the time we need on the SQ.
>>>>>>>
>>>>>>> The current programming would have to be changed to allow priority propagation from the HQD into the SQ.
>>>>>>>
>>>>>>> Generic approach for all HW IPs:
>>>>>>> --------------------------------
>>>>>>>
>>>>>>> For consistency purposes, the high priority context can be enabled for all HW IPs with support of the SW scheduler. This will function similarly to the current AMD_SCHED_PRIORITY_KERNEL priority, where the job can jump ahead of anything not committed to the HW queue.
>>>>>>>
>>>>>>> The benefits of requesting a high priority context for a non-compute queue will be lesser (e.g. up to 10s of wait time if a GFX command is stuck in front of you), but having the API in place will allow us to easily improve the implementation in the future as new features become available in new hardware.
>>>>>>>
>>>>>>> Future steps:
>>>>>>> -------------
>>>>>>>
>>>>>>> Once we have an approach settled, I can take care of the implementation.
>>>>>>>
>>>>>>> Also, once the interface is mostly decided, we can start thinking about exposing the high priority queue through radv.
>>>>>>>
>>>>>>> Request for feedback:
>>>>>>> ---------------------
>>>>>>>
>>>>>>> We aren't married to any of the approaches outlined above. Our goal is to obtain a mechanism that will allow us to complete the reprojection job within a predictable amount of time. So if anyone has any suggestions for improvements or alternative strategies, we are more than happy to hear them.
>>>>>>>
>>>>>>> If any of the technical information above is also incorrect, feel free to point out my misunderstandings.
>>>>>>>
>>>>>>> Looking forward to hearing from you.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Andres
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> amd-gfx mailing list
>>>>>>> amd-gfx at lists.freedesktop.org
>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>>
>>>>>> _______________________________________________
>>>>>> amd-gfx mailing list
>>>>>> amd-gfx at lists.freedesktop.org
>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx at lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Sincerely yours,
Serguei Sagalovitch