Hey Christian,

>> But yes, in general you don't want another compositor in the way, so
>> we'll be acquiring the HMD display directly, separate from any desktop
>> or display server.
>
> Assuming that the HMD is attached to the rendering device in some way,
> you have the X server and the compositor both trying to be DRM master
> at the same time.
>
> Please correct me if that was fixed in the meantime, but that sounds
> like it will simply not work. Or is this what Andres mentions below
> that Dave is working on?

You are correct on both statements. We can't have two DRM_MASTERs, so
the current DRM+X setup does not support this use case. And this is what
Dave and Pierre-Loup are currently working on.

> In addition to that, a compositor in combination with X is a bit
> counterproductive when you want to keep the latency low.

One thing I'd like to correct: our main goal is to make latency
_predictable_; a secondary goal is to make it low.

The high priority queue feature addresses our main source of
unpredictability: the scheduling latency when the hardware is already
full of work from the game engine.

The DirectMode feature addresses one of the latency sources: multiple
(unnecessary) context switches to submit a surface to the DRM driver.

> Targeting something like Wayland, with XWayland when you need X
> compatibility, sounds like the much better idea.

We are pretty enthusiastic about Wayland (and really glad to see Fedora
25 use Wayland by default). Once we have everything working nicely under
X (where most of the users currently are), I'm sure Pierre-Loup will be
pushing us to get everything optimized under Wayland as well (which
should be a lot simpler!).

Ever since working with SurfaceFlinger on Android with explicit fencing,
I've been waiting for the day I can finally ditch X altogether :)

Regards,
Andres

On Fri, Dec 23, 2016 at 5:54 AM, Christian König
<christian.koenig at amd.com> wrote:

>> But yes, in general you don't want another compositor in the way, so
>> we'll be acquiring the HMD display directly, separate from any desktop
>> or display server.
>
> Assuming that the HMD is attached to the rendering device in some way,
> you have the X server and the compositor both trying to be DRM master
> at the same time.
>
> Please correct me if that was fixed in the meantime, but that sounds
> like it will simply not work. Or is this what Andres mentions below
> that Dave is working on?
>
> In addition to that, a compositor in combination with X is a bit
> counterproductive when you want to keep the latency low.
>
> E.g. the "normal" flow of a GL or Vulkan surface filled with rendered
> data to be displayed is from the application -> X server -> compositor
> -> X server.
>
> The extra step between X server and compositor just means extra
> latency, and for this use case you probably don't want that.
>
> Targeting something like Wayland, with XWayland when you need X
> compatibility, sounds like the much better idea.
>
> Regards,
> Christian.
>
> On 22.12.2016 at 20:54, Pierre-Loup A. Griffais wrote:
>
>> Display concerns are a separate issue, and as Andres said we have
>> other plans to address them. But yes, in general you don't want
>> another compositor in the way, so we'll be acquiring the HMD display
>> directly, separate from any desktop or display server. Same with
>> security, we can have a separate conversation about that when the
>> time comes.
>>
>> On 12/22/2016 08:41 AM, Serguei Sagalovitch wrote:
>>
>>> Andres,
>>>
>>> Did you measure latency, etc. impact of __any__ compositor?
>>>
>>> My understanding is that VR has pretty strict requirements related
>>> to QoS.
>>>
>>> Sincerely yours,
>>> Serguei Sagalovitch
>>>
>>> On 2016-12-22 11:35 AM, Andres Rodriguez wrote:
>>>
>>>> Hey Christian,
>>>>
>>>> We are currently interested in X, but with some distros switching
>>>> to other compositors by default, we also need to consider those.
>>>>
>>>> We agree, running the full vrcompositor as root isn't something
>>>> that we want to do. Too many security concerns. Having a small
>>>> root helper that does the privilege escalation for us is the
>>>> initial idea.
>>>>
>>>> For a long term approach, Pierre-Loup and Dave are working on
>>>> dealing with the "two compositors" scenario a little better in
>>>> DRM+X. Fullscreen isn't really a sufficient approach, since we
>>>> don't want the HMD to be used as part of the desktop environment
>>>> when a VR app is not in use (this is extremely annoying).
>>>>
>>>> When the above is settled, we should have an auth mechanism
>>>> besides DRM_MASTER or DRM_AUTH that allows the vrcompositor to
>>>> take over the HMD permanently, away from X. Re-using that auth
>>>> method to gate this IOCTL is probably going to be the final
>>>> solution.
>>>>
>>>> I propose to start with ROOT_ONLY since it should allow us to
>>>> respect kernel IOCTL compatibility guidelines with the most
>>>> flexibility. Going from a restrictive to a more flexible
>>>> permission model would be inclusive, but going from a general to
>>>> a restrictive model may exclude some apps that used to work.
>>>>
>>>> Regards,
>>>> Andres
>>>>
>>>> On 12/22/2016 6:42 AM, Christian König wrote:
>>>>
>>>>> Hi Andres,
>>>>>
>>>>> well using root might cause stability and security problems as
>>>>> well. We worked quite hard to avoid exactly this for X.
>>>>>
>>>>> We could make this feature depend on the compositor being DRM
>>>>> master, but for example with X the X server is master (and e.g.
>>>>> can change resolutions etc.) and not the compositor.
>>>>>
>>>>> So another question is what windowing system (if any) you are
>>>>> planning to use? X, Wayland, Flinger or something completely
>>>>> different?
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>> On 20.12.2016 at 16:51, Andres Rodriguez wrote:
>>>>>
>>>>>> Hi Christian,
>>>>>>
>>>>>> That is definitely a concern. What we are currently thinking is
>>>>>> to make the high priority queues accessible to root only.
>>>>>>
>>>>>> Therefore, if a non-root user attempts to set the high priority
>>>>>> flag on context allocation, we would fail the call and return
>>>>>> -EPERM.
>>>>>>
>>>>>> Regards,
>>>>>> Andres
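A minimal sketch of what that root-only gate could look like at context
allocation (assumptions: the AMDGPU_CTX_HIGH_PRIORITY flag name and the
CAP_SYS_ADMIN capability test stand in for "ROOT_ONLY"; neither is
existing UAPI or settled policy):

    #include <linux/capability.h>
    #include <linux/errno.h>

    /* Illustrative only: reject high priority context creation for
     * unprivileged callers, as proposed in the thread above. */
    static int amdgpu_ctx_priority_permit(u32 flags)
    {
            if (!(flags & AMDGPU_CTX_HIGH_PRIORITY))
                    return 0;       /* normal priority always allowed */

            if (capable(CAP_SYS_ADMIN))
                    return 0;       /* root (or equivalent privilege) */

            return -EPERM;          /* fail the call for everyone else */
    }

Starting restrictive like this keeps the door open for relaxing the check
later (e.g. to the DRM auth mechanism mentioned above) without breaking
existing userspace.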
>>>>>>
>>>>>> On 12/20/2016 7:56 AM, Christian König wrote:
>>>>>>
>>>>>>>> BTW: If there is a non-VR application which will use the
>>>>>>>> high-priority h/w queue then the VR application will suffer.
>>>>>>>> Any ideas how to solve it?
>>>>>>>
>>>>>>> Yeah, that problem came to my mind as well.
>>>>>>>
>>>>>>> Basically we need to restrict those high priority submissions
>>>>>>> to the VR compositor, or otherwise any malfunctioning
>>>>>>> application could use them.
>>>>>>>
>>>>>>> Just think about some WebGL suddenly taking all our rendering
>>>>>>> away and we won't get anything drawn any more.
>>>>>>>
>>>>>>> Alex or Michel, any ideas on that?
>>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>> On 19.12.2016 at 15:48, Serguei Sagalovitch wrote:
>>>>>>>
>>>>>>>>> If the compute queue is occupied only by you, the efficiency
>>>>>>>>> is equal to setting the job queue to high priority, I think.
>>>>>>>>
>>>>>>>> The only risk is the situation when graphics will take all the
>>>>>>>> needed CUs. But in any case it should be a very good test.
>>>>>>>>
>>>>>>>> Andres/Pierre-Loup,
>>>>>>>>
>>>>>>>> Did you try to do it, or is it a lot of work for you?
>>>>>>>>
>>>>>>>> BTW: If there is a non-VR application which will use the
>>>>>>>> high-priority h/w queue then the VR application will suffer.
>>>>>>>> Any ideas how to solve it?
>>>>>>>>
>>>>>>>> Sincerely yours,
>>>>>>>> Serguei Sagalovitch
>>>>>>>>
>>>>>>>> On 2016-12-19 12:50 AM, zhoucm1 wrote:
>>>>>>>>
>>>>>>>>> Do you encounter the priority issue for the compute queue
>>>>>>>>> with the current driver?
>>>>>>>>>
>>>>>>>>> If the compute queue is occupied only by you, the efficiency
>>>>>>>>> is equal to setting the job queue to high priority, I think.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> David Zhou
>>>>>>>>>
>>>>>>>>> On 2016-12-19 13:29, Andres Rodriguez wrote:
>>>>>>>>>
>>>>>>>>>> Yes, Vulkan is available on all-open through the Mesa radv
>>>>>>>>>> UMD.
>>>>>>>>>>
>>>>>>>>>> I'm not sure if I'm asking for too much, but if we can
>>>>>>>>>> coordinate a similar interface in radv and amdgpu-pro at
>>>>>>>>>> the Vulkan level, that would be great.
>>>>>>>>>>
>>>>>>>>>> I'm not sure what that's going to be yet.
>>>>>>>>>>
>>>>>>>>>> - Andres
>>>>>>>>>>
>>>>>>>>>> On 12/19/2016 12:11 AM, zhoucm1 wrote:
>>>>>>>>>>
>>>>>>>>>>> On 2016-12-19 11:33, Pierre-Loup A. Griffais wrote:
>>>>>>>>>>>
>>>>>>>>>>>> We're currently working with the open stack; I assume
>>>>>>>>>>>> that a mechanism could be exposed by both open and Pro
>>>>>>>>>>>> Vulkan userspace drivers, and that the amdgpu kernel
>>>>>>>>>>>> interface improvements we would pursue following this
>>>>>>>>>>>> discussion would let both drivers take advantage of the
>>>>>>>>>>>> feature, correct?
>>>>>>>>>>>
>>>>>>>>>>> Of course.
>>>>>>>>>>> Does the open stack have Vulkan support?
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> David Zhou
>>>>>>>>>>>
>>>>>>>>>>>> On 12/18/2016 07:26 PM, zhoucm1 wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> By the way, are you using the all-open driver or the
>>>>>>>>>>>>> amdgpu-pro driver?
>>>>>>>>>>>>>
>>>>>>>>>>>>> +David Mao, who is working on our Vulkan driver.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> David Zhou
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 2016-12-18 06:05, Pierre-Loup A. Griffais wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Serguei,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm also working on bringing up our VR runtime on top
>>>>>>>>>>>>>> of amdgpu; see replies inline.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 12/16/2016 09:05 PM, Sagalovitch, Serguei wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Andres,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For current VR workloads we have 3 separate processes
>>>>>>>>>>>>>>>> running actually:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So we could have a potential memory overcommit case,
>>>>>>>>>>>>>>> or do you do partitioning on your own? I would think
>>>>>>>>>>>>>>> that there is a need to avoid overcommit in the VR
>>>>>>>>>>>>>>> case to prevent any BO migration.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You're entirely correct; currently the VR runtime is
>>>>>>>>>>>>>> setting up prioritized CPU scheduling for its VR
>>>>>>>>>>>>>> compositor, we're working on prioritized GPU scheduling
>>>>>>>>>>>>>> and pre-emption
>>>>>>>>>>>>>> (e.g. this thread), and in the future it will make
>>>>>>>>>>>>>> sense to do work to make sure that its memory
>>>>>>>>>>>>>> allocations do not get evicted, to prevent any
>>>>>>>>>>>>>> unwelcome additional latency in the event of needing
>>>>>>>>>>>>>> to perform just-in-time reprojection.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> BTW: Do you mean __real__ processes or threads?
>>>>>>>>>>>>>>> Based on my understanding, sharing BOs between
>>>>>>>>>>>>>>> different processes could introduce additional
>>>>>>>>>>>>>>> synchronization constraints. BTW: I am not sure if we
>>>>>>>>>>>>>>> are able to share Vulkan sync objects across a
>>>>>>>>>>>>>>> process boundary.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> They are different processes; it is important for the
>>>>>>>>>>>>>> compositor that is responsible for quality-of-service
>>>>>>>>>>>>>> features, such as consistently presenting distorted
>>>>>>>>>>>>>> frames with the right latency, reprojection, etc., to
>>>>>>>>>>>>>> be separate from the main application.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Currently we are using unreleased cross-process memory
>>>>>>>>>>>>>> and semaphore extensions to fetch updated eye images
>>>>>>>>>>>>>> from the client application, but the just-in-time
>>>>>>>>>>>>>> reprojection discussed here does not actually have any
>>>>>>>>>>>>>> direct interactions with cross-process resource
>>>>>>>>>>>>>> sharing, since it's achieved by using whatever are the
>>>>>>>>>>>>>> latest, most up-to-date eye images that have already
>>>>>>>>>>>>>> been sent by the client application, which are already
>>>>>>>>>>>>>> available to use without additional synchronization.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 3) System compositor (we are looking at approaches
>>>>>>>>>>>>>>>> to remove this overhead)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes, IMHO the best is to run in "full screen mode".
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, we are working on mechanisms to present directly
>>>>>>>>>>>>>> to the headset display without any intermediaries as a
>>>>>>>>>>>>>> separate effort.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The latency is our main concern,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I would assume that this is the known problem (at
>>>>>>>>>>>>>>> least for compute usage). It looks like amdgpu /
>>>>>>>>>>>>>>> kernel submission is rather CPU intensive (at least
>>>>>>>>>>>>>>> in the default configuration).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As long as it's a consistent cost, it shouldn't be an
>>>>>>>>>>>>>> issue. However, if there are high degrees of variance
>>>>>>>>>>>>>> then that would be troublesome and we would need to
>>>>>>>>>>>>>> account for the worst case.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hopefully the requirements and approach we described
>>>>>>>>>>>>>> make sense. We're looking forward to your feedback and
>>>>>>>>>>>>>> suggestions.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>> - Pierre-Loup
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sincerely yours,
>>>>>>>>>>>>>>> Serguei Sagalovitch
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> From: Andres Rodriguez <andresr at valvesoftware.com>
>>>>>>>>>>>>>>> Sent: December 16, 2016 10:00 PM
>>>>>>>>>>>>>>> To: Sagalovitch, Serguei; amd-gfx at lists.freedesktop.org
>>>>>>>>>>>>>>> Subject: RE: [RFC] Mechanism for high priority
>>>>>>>>>>>>>>> scheduling in amdgpu
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hey Serguei,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [Serguei] No. I mean pipe :-) as the MEC defines it.
>>>>>>>>>>>>>>>> As far as I understand (by simplifying), some
>>>>>>>>>>>>>>>> scheduling is per pipe. I know about the current
>>>>>>>>>>>>>>>> allocation scheme, but I do not think that it is
>>>>>>>>>>>>>>>> ideal. I would assume that we need to switch to
>>>>>>>>>>>>>>>> dynamic partitioning of resources based on the
>>>>>>>>>>>>>>>> workload, otherwise we will have a resource conflict
>>>>>>>>>>>>>>>> between Vulkan compute and OpenCL.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I agree the partitioning isn't ideal. I'm hoping we
>>>>>>>>>>>>>>> can start with a solution that assumes that only
>>>>>>>>>>>>>>> pipe0 has any work and the other pipes are idle (no
>>>>>>>>>>>>>>> HSA/ROCm running on the system).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This should be more or less the use case we expect
>>>>>>>>>>>>>>> from VR users.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I agree the split is currently not ideal, but I'd
>>>>>>>>>>>>>>> like to consider that a separate task, because making
>>>>>>>>>>>>>>> it dynamic is not straightforward :P
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [Serguei] Vulkan works via amdgpu (kernel
>>>>>>>>>>>>>>>> submissions) so amdkfd will not be involved. I would
>>>>>>>>>>>>>>>> assume that in the case of VR we will have one main
>>>>>>>>>>>>>>>> application ("console" mode(?)) so we could
>>>>>>>>>>>>>>>> temporarily "ignore" OpenCL/ROCm needs when VR is
>>>>>>>>>>>>>>>> running.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Correct, this is why we want to enable the high
>>>>>>>>>>>>>>> priority compute queue through libdrm-amdgpu, so that
>>>>>>>>>>>>>>> we can expose it through Vulkan later.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> For current VR workloads we actually have 3 separate
>>>>>>>>>>>>>>> processes running:
>>>>>>>>>>>>>>>  1) Game process
>>>>>>>>>>>>>>>  2) VR Compositor (this is the process that will
>>>>>>>>>>>>>>>     require the high priority queue)
>>>>>>>>>>>>>>>  3) System compositor (we are looking at approaches
>>>>>>>>>>>>>>>     to remove this overhead)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> For now I think it is okay to assume no OpenCL/ROCm
>>>>>>>>>>>>>>> is running simultaneously, but I would also like to
>>>>>>>>>>>>>>> be able to address this case in the future
>>>>>>>>>>>>>>> (cross-pipe priorities).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [Serguei] The problem with pre-emption of a graphics
>>>>>>>>>>>>>>>> task: (a) it may take time, so latency may suffer
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The latency is our main concern, we want something
>>>>>>>>>>>>>>> that is predictable. A good illustration of what the
>>>>>>>>>>>>>>> reprojection scheduling looks like can be found here:
>>>>>>>>>>>>>>> https://community.amd.com/servlet/JiveServlet/showImage/38-1310-104754/pastedImage_3.png
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> (b) to preempt we need to have a different "context"
>>>>>>>>>>>>>>>> - we want to guarantee that submissions from the
>>>>>>>>>>>>>>>> same context will be executed in order.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This is okay, as the reprojection work doesn't have
>>>>>>>>>>>>>>> dependencies on the game context, and it even happens
>>>>>>>>>>>>>>> in a separate process.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> BTW: (a) Do you want "preempt" and later resume, or
>>>>>>>>>>>>>>>> do you want "preempt" and "cancel/abort"?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Preempt the game with the compositor task and then
>>>>>>>>>>>>>>> resume it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> (b) Vulkan is a generic API and could be used for
>>>>>>>>>>>>>>>> graphics as well as for plain compute tasks
>>>>>>>>>>>>>>>> (VK_QUEUE_COMPUTE_BIT).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yeah, the plan is to use Vulkan compute. But if you
>>>>>>>>>>>>>>> figure out a way for us to get a guaranteed execution
>>>>>>>>>>>>>>> time using Vulkan graphics, then I'll take you out
>>>>>>>>>>>>>>> for a beer :)
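As a small sketch of the "Vulkan compute" plan: a compute-capable queue
family can be picked as shown below (illustrative only; error handling
is elided, the 16-family cap is an arbitrary assumption, and this is not
the VR runtime's actual queue selection):

    #include <vulkan/vulkan.h>

    /* Prefer a compute-only queue family; on amdgpu this maps to the
     * compute rings rather than the GFX ring. Sketch only. */
    static uint32_t find_compute_queue_family(VkPhysicalDevice phys)
    {
            VkQueueFamilyProperties props[16]; /* arbitrary cap */
            uint32_t count = 0, i;

            vkGetPhysicalDeviceQueueFamilyProperties(phys, &count, NULL);
            if (count > 16)
                    count = 16;
            vkGetPhysicalDeviceQueueFamilyProperties(phys, &count, props);

            for (i = 0; i < count; i++) {
                    VkQueueFlags f = props[i].queueFlags;
                    if ((f & VK_QUEUE_COMPUTE_BIT) &&
                        !(f & VK_QUEUE_GRAPHICS_BIT))
                            return i; /* dedicated compute family */
            }
            for (i = 0; i < count; i++)
                    if (props[i].queueFlags & VK_QUEUE_COMPUTE_BIT)
                            return i; /* graphics+compute fallback */
            return 0;
    }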
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Andres
>>>>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>>>>> From: Sagalovitch, Serguei [Serguei.Sagalovitch at amd.com]
>>>>>>>>>>>>>>> Sent: Friday, December 16, 2016 9:13 PM
>>>>>>>>>>>>>>> To: Andres Rodriguez; amd-gfx at lists.freedesktop.org
>>>>>>>>>>>>>>> Subject: Re: [RFC] Mechanism for high priority
>>>>>>>>>>>>>>> scheduling in amdgpu
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Andres,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Please see inline (as [Serguei])
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sincerely yours,
>>>>>>>>>>>>>>> Serguei Sagalovitch
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> From: Andres Rodriguez <andresr at valvesoftware.com>
>>>>>>>>>>>>>>> Sent: December 16, 2016 8:29 PM
>>>>>>>>>>>>>>> To: Sagalovitch, Serguei; amd-gfx at lists.freedesktop.org
>>>>>>>>>>>>>>> Subject: RE: [RFC] Mechanism for high priority
>>>>>>>>>>>>>>> scheduling in amdgpu
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Serguei,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for the feedback. Answers inline as [AR].
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Andres
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>>>>> From: Sagalovitch, Serguei [Serguei.Sagalovitch at amd.com]
>>>>>>>>>>>>>>> Sent: Friday, December 16, 2016 8:15 PM
>>>>>>>>>>>>>>> To: Andres Rodriguez; amd-gfx at lists.freedesktop.org
>>>>>>>>>>>>>>> Subject: Re: [RFC] Mechanism for high priority
>>>>>>>>>>>>>>> scheduling in amdgpu
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Andres,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Quick comments:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1) To minimize "bubbles", etc. we need to "force" CU
>>>>>>>>>>>>>>> assignments/bindings to the high-priority queue when
>>>>>>>>>>>>>>> it will be in use, and "free" them later (we do not
>>>>>>>>>>>>>>> want to take CUs away from e.g. a graphics task
>>>>>>>>>>>>>>> forever and degrade graphics performance).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Otherwise we could have a scenario where a long
>>>>>>>>>>>>>>> graphics task (or low-priority compute) takes all the
>>>>>>>>>>>>>>> (extra) CUs and high-priority work will wait for the
>>>>>>>>>>>>>>> needed resources. It will not be visible with "NOP"
>>>>>>>>>>>>>>> packets but only when you submit a "real" compute
>>>>>>>>>>>>>>> task, so I would recommend not to use "NOP" packets
>>>>>>>>>>>>>>> at all for testing.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It (CU assignment) could be done relatively easily
>>>>>>>>>>>>>>> when everything is going via the kernel (e.g. as part
>>>>>>>>>>>>>>> of frame submission), but I must admit that I am not
>>>>>>>>>>>>>>> sure about the best way for user level submissions
>>>>>>>>>>>>>>> (amdkfd).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [AR] I wasn't aware of this part of the programming
>>>>>>>>>>>>>>> sequence. Thanks for the heads up!
>>>>>>>>>>>>>>> Is this similar to the CU masking programming?
>>>>>>>>>>>>>>> [Serguei] Yes. To simplify: the problem is that the
>>>>>>>>>>>>>>> "scheduler", when deciding which queue to run, will
>>>>>>>>>>>>>>> check if there are enough resources, and if not it
>>>>>>>>>>>>>>> will begin to check other queues with lower priority.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2) I would recommend to dedicate the whole pipe to
>>>>>>>>>>>>>>> the high-priority queue and have nothing there except
>>>>>>>>>>>>>>> it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [AR] I'm guessing in this context you mean pipe =
>>>>>>>>>>>>>>> queue? (as opposed to the MEC definition of pipe,
>>>>>>>>>>>>>>> which is a grouping of queues). I say this because
>>>>>>>>>>>>>>> amdgpu only has access to 1 pipe, and the rest are
>>>>>>>>>>>>>>> statically partitioned for amdkfd usage.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [Serguei] No. I mean pipe :-) as the MEC defines it.
>>>>>>>>>>>>>>> As far as I understand (by simplifying), some
>>>>>>>>>>>>>>> scheduling is per pipe. I know about the current
>>>>>>>>>>>>>>> allocation scheme, but I do not think that it is
>>>>>>>>>>>>>>> ideal. I would assume that we need to switch to
>>>>>>>>>>>>>>> dynamic partitioning of resources based on the
>>>>>>>>>>>>>>> workload, otherwise we will have a resource conflict
>>>>>>>>>>>>>>> between Vulkan compute and OpenCL.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> BTW: Which user level API do you want to use for
>>>>>>>>>>>>>>> compute: Vulkan or OpenCL?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [AR] Vulkan
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [Serguei] Vulkan works via amdgpu (kernel submissions)
>>>>>>>>>>>>>>> so amdkfd will not be involved. I would assume that in
>>>>>>>>>>>>>>> the case of VR we will have one main application
>>>>>>>>>>>>>>> ("console" mode(?)) so we could temporarily "ignore"
>>>>>>>>>>>>>>> OpenCL/ROCm needs when VR is running.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> we will not be able to provide a solution compatible
>>>>>>>>>>>>>>>> with GFX workloads.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I assume that you are talking about graphics? Am I
>>>>>>>>>>>>>>> right?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [AR] Yeah, my understanding is that pre-empting the
>>>>>>>>>>>>>>> currently running graphics job and scheduling in
>>>>>>>>>>>>>>> something else using mid-buffer pre-emption has some
>>>>>>>>>>>>>>> cases where it doesn't work well. But if it starts
>>>>>>>>>>>>>>> working well with Polaris10, it might be a better
>>>>>>>>>>>>>>> solution for us (because the whole reprojection work
>>>>>>>>>>>>>>> uses the Vulkan graphics stack at the moment, and
>>>>>>>>>>>>>>> porting it to compute is not trivial).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [Serguei] The problem with pre-emption of a graphics
>>>>>>>>>>>>>>> task: (a) it may take time, so latency may suffer;
>>>>>>>>>>>>>>> (b) to preempt we need to have a different "context" -
>>>>>>>>>>>>>>> we want to guarantee that submissions from the same
>>>>>>>>>>>>>>> context will be executed in order.
>>>>>>>>>>>>>>> BTW: (a) Do you want "preempt" and later resume, or
>>>>>>>>>>>>>>> do you want "preempt" and "cancel/abort"? (b) Vulkan
>>>>>>>>>>>>>>> is a generic API and could be used for graphics as
>>>>>>>>>>>>>>> well as for plain compute tasks (VK_QUEUE_COMPUTE_BIT).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sincerely yours,
>>>>>>>>>>>>>>> Serguei Sagalovitch
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org>
>>>>>>>>>>>>>>> on behalf of Andres Rodriguez
>>>>>>>>>>>>>>> <andresr at valvesoftware.com>
>>>>>>>>>>>>>>> Sent: December 16, 2016 6:15 PM
>>>>>>>>>>>>>>> To: amd-gfx at lists.freedesktop.org
>>>>>>>>>>>>>>> Subject: [RFC] Mechanism for high priority scheduling
>>>>>>>>>>>>>>> in amdgpu
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Everyone,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This RFC is also available as a gist here:
>>>>>>>>>>>>>>> https://gist.github.com/lostgoat/7000432cd6864265dbc2c3ab93204249
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We are interested in feedback for a mechanism to
>>>>>>>>>>>>>>> effectively schedule high priority VR reprojection
>>>>>>>>>>>>>>> tasks (also referred to as time-warping) for Polaris10
>>>>>>>>>>>>>>> running on the amdgpu kernel driver.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Brief context:
>>>>>>>>>>>>>>> --------------
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The main objective of reprojection is to avoid motion
>>>>>>>>>>>>>>> sickness for VR users in scenarios where the game or
>>>>>>>>>>>>>>> application would fail to finish rendering a new frame
>>>>>>>>>>>>>>> in time for the next VBLANK.
>>>>>>>>>>>>>>> When this happens, the user's head movements are not
>>>>>>>>>>>>>>> reflected on the Head Mounted Display (HMD) for the
>>>>>>>>>>>>>>> duration of an extra frame. This extended mismatch
>>>>>>>>>>>>>>> between the inner ear and the eyes may cause the user
>>>>>>>>>>>>>>> to experience motion sickness.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The VR compositor deals with this problem by
>>>>>>>>>>>>>>> fabricating a new frame using the user's updated head
>>>>>>>>>>>>>>> position in combination with the previous frames. This
>>>>>>>>>>>>>>> avoids a prolonged mismatch between the HMD output and
>>>>>>>>>>>>>>> the inner ear.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Because of the adverse effects on the user, we require
>>>>>>>>>>>>>>> high confidence that the reprojection task will
>>>>>>>>>>>>>>> complete before the VBLANK interval, even if the GFX
>>>>>>>>>>>>>>> pipe is currently full of work from the
>>>>>>>>>>>>>>> game/application (which is most likely the case).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> For more details and illustrations, please refer to
>>>>>>>>>>>>>>> the following document:
>>>>>>>>>>>>>>> https://community.amd.com/community/gaming/blog/2016/03/28/asynchronous-shaders-evolved
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Requirements:
>>>>>>>>>>>>>>> -------------
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The mechanism must expose the following functionality:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     * Job round trip time must be predictable, from
>>>>>>>>>>>>>>>       submission to fence signal
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     * The mechanism must support compute workloads
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Goals:
>>>>>>>>>>>>>>> ------
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     * The mechanism should provide low submission
>>>>>>>>>>>>>>>       latencies
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Test: submitting a NOP packet through the mechanism on
>>>>>>>>>>>>>>> busy hardware should be equivalent to submitting a NOP
>>>>>>>>>>>>>>> on idle hardware.
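A sketch of how that round-trip test could be instrumented with
libdrm-amdgpu (illustrative; buffer/IB setup and error handling are
elided, and the request is assumed to already wrap a NOP IB):

    #include <time.h>
    #include <amdgpu.h>
    #include <amdgpu_drm.h>

    /* Measure submission-to-fence-signal time for a prepared request. */
    static double nop_round_trip_ms(amdgpu_context_handle ctx,
                                    struct amdgpu_cs_request *req)
    {
            struct amdgpu_cs_fence fence = {
                    .context     = ctx,
                    .ip_type     = req->ip_type,
                    .ip_instance = req->ip_instance,
                    .ring        = req->ring,
            };
            struct timespec t0, t1;
            uint32_t expired = 0;

            clock_gettime(CLOCK_MONOTONIC, &t0);
            amdgpu_cs_submit(ctx, 0, req, 1);
            fence.fence = req->seq_no;
            amdgpu_cs_query_fence_status(&fence, AMDGPU_TIMEOUT_INFINITE,
                                         0, &expired);
            clock_gettime(CLOCK_MONOTONIC, &t1);

            return (t1.tv_sec - t0.tv_sec) * 1e3 +
                   (t1.tv_nsec - t0.tv_nsec) / 1e6;
    }

Comparing this number on idle vs. busy hardware would quantify how close
a given mechanism gets to the stated goal (keeping in mind Serguei's
caveat above that NOPs will not expose CU contention).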
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Nice to have:
>>>>>>>>>>>>>>> -------------
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     * The mechanism should also support GFX workloads
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> My understanding is that with the current hardware
>>>>>>>>>>>>>>> capabilities in Polaris10 we will not be able to
>>>>>>>>>>>>>>> provide a solution compatible with GFX workloads.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> But I would love to hear otherwise. So if anyone has
>>>>>>>>>>>>>>> an idea, approach or suggestion that will also be
>>>>>>>>>>>>>>> compatible with the GFX ring, please let us know
>>>>>>>>>>>>>>> about it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     * The above guarantees should also be respected
>>>>>>>>>>>>>>>       by amdkfd workloads
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This would be good to have for consistency, but it is
>>>>>>>>>>>>>>> not strictly necessary, as users running games are
>>>>>>>>>>>>>>> not traditionally running HPC workloads in the
>>>>>>>>>>>>>>> background.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Proposed approach:
>>>>>>>>>>>>>>> ------------------
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Similar to the Windows driver, we could expose a high
>>>>>>>>>>>>>>> priority compute queue to userspace.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Submissions to this compute queue will be scheduled
>>>>>>>>>>>>>>> with high priority, and may acquire hardware resources
>>>>>>>>>>>>>>> previously in use by other queues.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This can be achieved by taking advantage of the
>>>>>>>>>>>>>>> 'priority' field in the HQDs, which could be
>>>>>>>>>>>>>>> programmed by amdgpu or the amdgpu scheduler. The
>>>>>>>>>>>>>>> relevant register fields are:
>>>>>>>>>>>>>>>     * mmCP_HQD_PIPE_PRIORITY
>>>>>>>>>>>>>>>     * mmCP_HQD_QUEUE_PRIORITY
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Implementation approach 1 - static partitioning:
>>>>>>>>>>>>>>> ------------------------------------------------
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The amdgpu driver currently controls 8 compute queues
>>>>>>>>>>>>>>> from pipe0. We can statically partition these as
>>>>>>>>>>>>>>> follows:
>>>>>>>>>>>>>>>     * 7x regular
>>>>>>>>>>>>>>>     * 1x high priority
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The relevant priorities can be set so that submissions
>>>>>>>>>>>>>>> to the high priority ring will starve the other
>>>>>>>>>>>>>>> compute rings and the GFX ring.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The amdgpu scheduler will only place jobs into the
>>>>>>>>>>>>>>> high priority rings if the context is marked as high
>>>>>>>>>>>>>>> priority.
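A rough sketch of what approach 1's ring selection could look like
(the 8-ring count comes from the text above; the names and the modulo
spread are illustrative assumptions, not driver code):

    /* Reserve the last of the 8 compute rings for high priority work. */
    #define NUM_COMPUTE_RINGS   8
    #define HIGH_PRIO_RING      (NUM_COMPUTE_RINGS - 1)

    static unsigned int pick_compute_ring(bool high_priority,
                                          unsigned int instance)
    {
            if (high_priority)
                    return HIGH_PRIO_RING;

            /* spread regular contexts over the 7 remaining rings */
            return instance % (NUM_COMPUTE_RINGS - 1);
    }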
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> A corresponding priority level should also be added to
>>>>>>>>>>>>>>> keep track of this information:
>>>>>>>>>>>>>>>     * AMD_SCHED_PRIORITY_KERNEL
>>>>>>>>>>>>>>>     * -> AMD_SCHED_PRIORITY_HIGH (new)
>>>>>>>>>>>>>>>     * AMD_SCHED_PRIORITY_NORMAL
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The user will request a high priority context by
>>>>>>>>>>>>>>> setting an appropriate flag in drm_amdgpu_ctx_in
>>>>>>>>>>>>>>> (AMDGPU_CTX_HIGH_PRIORITY or similar):
>>>>>>>>>>>>>>> https://github.com/torvalds/linux/blob/master/include/uapi/drm/amdgpu_drm.h#L163
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The setting is at a per-context level so that we can:
>>>>>>>>>>>>>>>     * Maintain a consistent FIFO ordering of all
>>>>>>>>>>>>>>>       submissions to a context
>>>>>>>>>>>>>>>     * Create high priority and non-high priority
>>>>>>>>>>>>>>>       contexts in the same process
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Implementation approach 2 - dynamic priority programming:
>>>>>>>>>>>>>>> ---------------------------------------------------------
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Similar to the above, but instead of programming the
>>>>>>>>>>>>>>> priorities at amdgpu_init() time, the SW scheduler
>>>>>>>>>>>>>>> will reprogram the queue priorities dynamically when
>>>>>>>>>>>>>>> scheduling a task.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This would involve having a hardware specific callback
>>>>>>>>>>>>>>> from the scheduler to set the appropriate queue
>>>>>>>>>>>>>>> priority: set_priority(int ring, int index, int
>>>>>>>>>>>>>>> priority)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> During this callback we would have to grab the SRBM
>>>>>>>>>>>>>>> mutex to perform the appropriate HW programming, and
>>>>>>>>>>>>>>> I'm not really sure if that is something we should be
>>>>>>>>>>>>>>> doing from the scheduler.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On the positive side, this approach would allow us to
>>>>>>>>>>>>>>> program a range of priorities for jobs instead of a
>>>>>>>>>>>>>>> single "high priority" value, achieving something
>>>>>>>>>>>>>>> similar to the niceness API available for CPU
>>>>>>>>>>>>>>> scheduling.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm not sure if this flexibility is something that we
>>>>>>>>>>>>>>> would need for our use case, but it might be useful in
>>>>>>>>>>>>>>> other scenarios (multiple users sharing compute time
>>>>>>>>>>>>>>> on a server).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This approach would require a new int field in
>>>>>>>>>>>>>>> drm_amdgpu_ctx_in, or repurposing of the flags field.
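For approach 2, a sketch of what the hardware-specific callback might
look like on a gfx8 part, using the two HQD registers named earlier
(the signature, the priority encoding, and the exact select/program
sequence are assumptions that would need verification):

    /* Select the queue's register space under the SRBM mutex and
     * reprogram the two HQD priority fields. Sketch only. */
    static void gfx_v8_0_set_queue_priority(struct amdgpu_device *adev,
                                            u32 me, u32 pipe, u32 queue,
                                            u32 priority)
    {
            mutex_lock(&adev->srbm_mutex);
            vi_srbm_select(adev, me, pipe, queue, 0);

            WREG32(mmCP_HQD_PIPE_PRIORITY, priority);
            WREG32(mmCP_HQD_QUEUE_PRIORITY, priority);

            vi_srbm_select(adev, 0, 0, 0, 0);
            mutex_unlock(&adev->srbm_mutex);
    }

Whether the SW scheduler should be calling into something like this
directly is exactly the SRBM-mutex concern raised above.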
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Known current obstacles:
>>>>>>>>>>>>>>> ------------------------
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The SQ is currently programmed to disregard the HQD
>>>>>>>>>>>>>>> priorities, and instead it picks jobs at random.
>>>>>>>>>>>>>>> Settings from the shader itself are also disregarded,
>>>>>>>>>>>>>>> as this is considered a privileged field.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Effectively we can get our compute wavefront launched
>>>>>>>>>>>>>>> ASAP, but we might not get the time we need on the SQ.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The current programming would have to be changed to
>>>>>>>>>>>>>>> allow priority propagation from the HQD into the SQ.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Generic approach for all HW IPs:
>>>>>>>>>>>>>>> --------------------------------
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> For consistency purposes, the high priority context
>>>>>>>>>>>>>>> can be enabled for all HW IPs with SW scheduler
>>>>>>>>>>>>>>> support. This will function similarly to the current
>>>>>>>>>>>>>>> AMD_SCHED_PRIORITY_KERNEL priority, where the job can
>>>>>>>>>>>>>>> jump ahead of anything not committed to the HW queue.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The benefits of requesting a high priority context for
>>>>>>>>>>>>>>> a non-compute queue will be lesser (e.g. up to 10s of
>>>>>>>>>>>>>>> wait time if a GFX command is stuck in front of you),
>>>>>>>>>>>>>>> but having the API in place will allow us to easily
>>>>>>>>>>>>>>> improve the implementation in the future, as new
>>>>>>>>>>>>>>> features become available in new hardware.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Future steps:
>>>>>>>>>>>>>>> -------------
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Once we have an approach settled, I can take care of
>>>>>>>>>>>>>>> the implementation.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Also, once the interface is mostly decided, we can
>>>>>>>>>>>>>>> start thinking about exposing the high priority queue
>>>>>>>>>>>>>>> through radv.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Request for feedback:
>>>>>>>>>>>>>>> ---------------------
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We aren't married to any of the approaches outlined
>>>>>>>>>>>>>>> above. Our goal is to obtain a mechanism that will
>>>>>>>>>>>>>>> allow us to complete the reprojection job within a
>>>>>>>>>>>>>>> predictable amount of time. So if anyone has any
>>>>>>>>>>>>>>> suggestions for improvements or alternative
>>>>>>>>>>>>>>> strategies, we are more than happy to hear them.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If any of the technical information above is
>>>>>>>>>>>>>>> incorrect, feel free to point out my
>>>>>>>>>>>>>>> misunderstandings.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Looking forward to hearing from you.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Andres
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>> amd-gfx mailing list
>>>>>>>>>>>>>>> amd-gfx at lists.freedesktop.org
>>>>>>>>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>> amd-gfx mailing list
>>>>>>>>>>>>>>> amd-gfx at lists.freedesktop.org
>>>>>>>>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> amd-gfx mailing list
>>>>>>>>>>>>>> amd-gfx at lists.freedesktop.org
>>>>>>>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> amd-gfx mailing list
>>>>>>>>>>> amd-gfx at lists.freedesktop.org
>>>>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>>>>
>>>>>>>> Sincerely yours,
>>>>>>>> Serguei Sagalovitch
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> amd-gfx mailing list
>>>>>>>> amd-gfx at lists.freedesktop.org
>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>
>>> Sincerely yours,
>>> Serguei Sagalovitch