On May 10, 2021 08:55:55 Martin Peres <martin.peres@xxxxxxx> wrote:
On 10/05/2021 02:11, Jason Ekstrand wrote:On May 9, 2021 12:12:36 Martin Peres <martin.peres@xxxxxxx> wrote:Hi,On 06/05/2021 22:13, Matthew Brost wrote:Basic GuC submission support. This is the first bullet point in theupstreaming plan covered in the following RFC [1].At a very high level the GuC is a piece of firmware which sits betweenthe i915 and the GPU. It offloads some of the scheduling of contextsfrom the i915 and programs the GPU to submit contexts. The i915communicates with the GuC and the GuC communicates with the GPU.May I ask what will GuC command submission do that execlist won't/can'tdo? And what would be the impact on users? Even forgetting the troubledhistory of GuC (instability, performance regression, poor level of usersupport, 6+ years of trying to upstream it...), adding this much codeand doubling the amount of validation needed should come with arationale making it feel worth it... and I am not seeing here. Would youmind providing the rationale behind this work?GuC submission will be disabled by default on all current upstreamplatforms behind a module parameter - enable_guc. A value of 3 willenable submission and HuC loading via the GuC. GuC submission shouldwork on all gen11+ platforms assuming the GuC firmware is present.What is the plan here when it comes to keeping support for execlist? Iam afraid that landing GuC support in Linux is the first step towardskilling the execlist, which would force users to use proprietaryfirmwares that even most Intel engineers have little influence over.Indeed, if "drm/i915/guc: Disable semaphores when using GuC scheduling"which states "Disable semaphores when using GuC scheduling as semaphoresare broken in the current GuC firmware." is anything to go by, it meansthat even Intel developers seem to prefer working around the GuCfirmware, rather than fixing it.Yes, landing GuC support may be the first step in removing execlistsupport. The inevitable reality is that GPU scheduling is coming andlikely to be there only path in the not-too-distant future. (See alsothe ongoing thread with AMD about fences.) I'm not going to passjudgement on whether or not this is a good thing. I'm just reading thewinds and, in my view, this is where things are headed for good or ill.In answer to the question above, the answer to "what do we gain fromGuC?" may soon be, "you get to use your GPU." We're not there yet and,again, I'm not necessarily advocating for it, but that is likely wherethings are headed.This will be a sad day, especially since it seems fundamentally opposedwith any long-term support, on top of taking away user freedom tofix/tweak their system when Intel won't.A firmware-based submission model isn't a bad design IMO and, aside fromthe firmware freedom issues, I think there are actual advantages to themodel. Immediately, it'll unlock a few features like parallel submission(more on that in a bit) and long-running compute because they'reimplemented in GuC and the work to implement them properly in theexeclist scheduler is highly non-trivial. Longer term, it may (noguarantees) unlock some performance by getting the kernel out of the way.Oh, I definitely agree with firmware-based submission model not being abad design. I was even cheering for it in 2015. Experience with it mademe regret that deeply since :sBut with the DRM scheduler being responsible for most things, I fail tosee what we could offload in the GuC except context switching (likeevery other manufacturer). The problem is, the GuC does way more thanjust switching registers in bulk, and if the number of revisions of theGuC is anything to go by, it is way too complex for me to feelcomfortable with it.
It's more than just bulk register writes. When it comes to load-balancing multiple GPU users, firmware can theoretically preempt and switch faster leading to more efficient time-slicing. All we really need the DRM scheduler for is handling implicit dma_fence dependencies between different applications.
In the same vein, I have another concern related to the impact of GuC onLinux's stable releases. Let's say that in 3 years, a new applicationtriggers a bug in command submission inside the firmware. Given that theLinux community cannot patch the GuC firmware, how likely is it thatIntel would release a new GuC version? That would not be necessarilysuch a big problem if newer versions of the GuC could easily bebackported to this potentially-decade-old Linux version, but given thatthe GuC seems to have ABI-breaking changes on a monthly cadence (we areat major version 60 *already*? :o), I would say that it ishighly-unlikely that it would not require potentially-extensive changesto i915 to make it work, making the fix almost impossible to land in thestable tree... Do you have a plan to mitigate this problem?Patches like "drm/i915/guc: Disable bonding extension with GuCsubmission" also make me twitch, as this means the two commandsubmission paths will not be functionally equivalent, and enabling GuCcould thus introduce a user-visible regression (one app used to work,then stopped working). Could you add in the commit's message a proofthat this would not end up being a user regression (in which case, whyhave this codepath to begin with?).I'd like to address this one specifically as it's become something of aspeciality of mine the past few weeks. The current bonded submissionmodel is bad. It provides a plethora of ways for a client to back itselfinto a corner and doesn't actually provide the guarantees the mediadriver needs for its real-time high-resolution decode. It's bad enoughwe're seriously considering ripping it out, backwards compatibility ornot. The good news is that very little that your average desktop userdoes depends on it: basically just real-time >4K video decode.The new parallel submit API is much better and should be the pathforward. (We should have landed parallel submit the first time around.)It isn't full of corners and does let us provides actual parallelexecution guarantees. It also gives the scheduler the information itneeds to reliably provide those guarantees. >If we need to support the parallel submit API with the execlistback-end, that's totally possible. The choice to only implement theparallel submit API with GuC is a pragmatic one. We're trying to getupstream back on it's feet and get all the various up-and-coming bits ofhardware enabled. Enabling the new API in the execlist back-end makesthat pipeline longer.I feel your pain, and wish you all the best to get GEM less complexand more manageable.So, if I understood correctly, the plan is just to regress 4K+ videodecoding for people who do not enable GuC scheduling, or did not alsoupdate to a recent-enough media driver that would support this newinterface? If it is indeed only for over 4K videos, then whatever. If itis 4K, it starts being a little bad, assuming graceful fallback toCPU-based decoding. What's the test plan for this patch then? The patchin its current form is definitely not making me confident.
My understanding is that it's only >4k that's affected; we've got enough bandwidth on a single VCS for 4K. I'm not sure where the exact cut-off is (it may be a little higher than 4k) but real-time 4k should be fine and real-time 8k requires parallel submit. So we're really not cutting off many use-cases. Also, as I said above, the new API can be implemented with the execlist scheduler if needed. We've just pragmatically deprioritized it.
--Jason
Finally, could you explain why IGT tests need to be modified to work theGuC [1], and how much of the code in this series is covered byexisting/upcoming tests? I would expect a very solid set of tests tominimize the maintenance burden, and enable users to reproduce potentialissues found in this new codepath (too many users run with enable_guc=3,as can be seen on Google[2]).The IGT changes, as I understand them, are entirely around switching tothe new parallel submit API. There shouldn't be a major effect to mostusers.Right, this part I followed, but failed to connect it to the GuC...because I couldn't see why it would be needed (execlist requiring a lotmore work).I sincerely wish for the GuC to stay away from upstream because of theabove concerns (which are yet to be addressed), but if Intel were topush forward with the plan to drop execlist, I can foresee a world oftrouble for users... That is of course unless the GuC were to be opensourced, with people outside of Intel able to sign their own builds orrun unsigned. Failing that, let's hope the last 6 years were just a badstart, and the rapid climb in major version of the GuC will magicallystop! I hope execlists will remain at feature parity with the GuC whenpossible... but deplore the increase in validation needs which will onlyhurt users in the end.Thanks for your honest answer,Martin--Jason
_______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx