On Thu, Jan 12, 2023 at 11:17 AM Matthew Brost <matthew.brost@xxxxxxxxx> wrote:
On Thu, Jan 12, 2023 at 10:54:25AM +0100, Lucas De Marchi wrote:
> On Thu, Jan 05, 2023 at 09:27:57PM +0000, Matthew Brost wrote:
> > On Tue, Jan 03, 2023 at 12:21:08PM +0000, Tvrtko Ursulin wrote:
> > >
> > > On 22/12/2022 22:21, Matthew Brost wrote:
> > > > Hello,
> > > >
> > > > This is a submission for Xe, a new driver for Intel GPUs that supports both
> > > > integrated and discrete platforms starting with Tiger Lake (the first platform with
> > > > Intel Xe Architecture). The intention of this new driver is to have a fresh base
> > > > to work from that is unencumbered by older platforms, whilst also taking the
> > > > opportunity to rearchitect our driver to increase sharing across the drm
> > > > subsystem, both leveraging and allowing us to contribute more towards other
> > > > shared components like TTM and drm/scheduler. The memory model is based on VM
> > > > bind which is similar to the i915 implementation. Likewise the execbuf
> > > > implementation for Xe is very similar to execbuf3 in the i915 [1].
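[Aside, for anyone skimming who hasn't followed the VM bind discussions: conceptually a VM-bind-style uAPI looks something like the sketch below. Every name here is made up for illustration; it is not the actual Xe uAPI, which is defined by the patches in this series.]

	#include <linux/types.h>

	/* Hypothetical op codes, for illustration only. */
	#define SKETCH_VM_BIND_OP_MAP	0x0	/* create a mapping              */
	#define SKETCH_VM_BIND_OP_UNMAP	0x1	/* tear an existing mapping down */

	struct sketch_vm_bind {
		__u32 vm_id;		/* which GPU address space to modify         */
		__u32 op;		/* SKETCH_VM_BIND_OP_*                       */
		__u32 obj_handle;	/* GEM handle backing the mapping (MAP only) */
		__u32 flags;		/* e.g. read-only, async bind                */
		__u64 obj_offset;	/* offset into the object                    */
		__u64 addr;		/* GPU virtual address to (un)map            */
		__u64 range;		/* length of the mapping in bytes            */
	};

The point of the model is that mappings are created and destroyed explicitly by userspace ahead of time, so the exec ioctl only references GPU virtual addresses instead of carrying a relocation/pinning list on every submission.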
> > > >
> > > > The code is at a stage where it is already functional and has experimental
> > > > support for multiple platforms starting from Tiger Lake, with initial support
> > > > implemented in Mesa (for Iris and Anv, our OpenGL and Vulkan drivers), as well
> > > > as in NEO (for OpenCL and Level0). A Mesa MR has been posted [2] and the NEO
> > > > implementation will be released publicly early next year. We also have a suite
> > > > of IGTs for Xe that will appear on the IGT list shortly.
> > > >
> > > > It has been built with the assumption of supporting multiple architectures from
> > > > the get-go, right now with tests running on both x86 and ARM hosts. We
> > > > intend to continue working on it and improving it as part of the kernel
> > > > community upstream.
> > > >
> > > > The new Xe driver leverages a lot from i915 and work on i915 continues as we
> > > > ready Xe for production throughout 2023.
> > > >
> > > > As for display, the intent is to share the display code with the i915 driver so
> > > > that there is maximum reuse there. Currently this is being done by compiling the
> > > > display code twice, but alternatives to that are under consideration and we want
> > > > to have more discussion on what the best final solution will look like over the
> > > > next few months. Right now, work is ongoing to refactor the display codebase
> > > > to remove, as much as possible, any unnecessary dependencies on i915-specific
> > > > data structures there.
> > > >
> > > > We currently have 2 submission backends, execlists and GuC. The execlists
> > > > backend is meant mostly for testing and is not fully functional, while the GuC
> > > > backend is fully functional. As with GuC submission in the i915, in Xe the GuC
> > > > firmware is required and should be placed in /lib/firmware/xe.
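[Aside: the usual way to keep two backends honest is a small per-backend vtable that the rest of the driver calls through. The sketch below only illustrates that layering idea; xe_exec_queue, xe_sm_ops and the two ops tables are invented names, not the actual Xe interfaces.]

	#include <linux/types.h>

	struct xe_exec_queue;	/* one user-visible queue with its own ring (hypothetical) */

	/* Hypothetical per-backend operations; the core only ever calls these. */
	struct xe_sm_ops {
		int  (*init)(struct xe_exec_queue *q);			/* set up ring/context */
		int  (*submit)(struct xe_exec_queue *q, u64 batch_addr);	/* queue a batch       */
		void (*kill)(struct xe_exec_queue *q);			/* stop and clean up   */
	};

	/* One table per backend, selected once at probe time. */
	extern const struct xe_sm_ops guc_sm_ops;	/* firmware (GuC) scheduling */
	extern const struct xe_sm_ops execlists_sm_ops;	/* software (execlists) path */

Keeping every backend-specific detail behind a table like this is what makes it realistic to gate or delete one backend later without disturbing the core driver.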
> > >
> > > What is the plan going forward for the execlists backend? I think it would
> > > be preferable not to upstream something semi-functional, and so carry
> > > technical debt in the brand new code base, from the very start. If it is for
> > > Tigerlake, which is the starting platform for Xe, could Tigerlake be made
> > > GuC-only, for instance?
> > >
> >
> > A little background here. In the original PoC written by Jason and Dave,
> > the execlist backend was the only one present and it was in semi-working
> > state. As soon as a few others and I started working on Xe, we went
> > all in on the GuC backend and left the execlists backend basically in
> > the state it was in. We kept it in place for two reasons:
> >
> > 1. Having 2 backends from the start ensured we layered our code
> > correctly. The layering was a complete disaster in the i915, so we really
> > wanted to avoid repeating that.
> > 2. The thought was it might be needed for early product bring up one
> > day.
> >
> > As I think about this a bit more, we will likely just delete the execlists
> > backend before merging this upstream and perhaps carry 1 large patch
> > internally with this implementation that we can use as needed. Final
> > decision TBD though.
>
> but after some time that might regress on the "let's keep 2 backends so we
> layer the code correctly" goal. Leaving the additional backend behind
> CONFIG_BROKEN or XE_EXPERIMENTAL, or something like that, not
> enabled by distros but enabled in CI, would be a good idea IMO.
>
> Carrying a large patch out of tree would make things harder for new
> platforms. A perfect backend split would make it possible, but like I
> said, we are likely not to have it if we delete the second backend.
>
Good points here, Lucas. One thing that we absolutely have wrong is
falling back to execlists if the GuC firmware is missing. We definitely should
not be doing that, as it creates confusion.
Yeah, we certainly shouldn't be falling back on it silently. That's a recipe for disaster. If it stays, it should be behind a config option that's clearly labeled as broken or not intended for production use. If someone is a zero-firmware purist and wants to enable it and accept the brokenness, that's their choice.
I'm not especially attached to the execlist back-end so I'm not going to insist on anything here RE keeping it.
There is more to me starting with execlists than avoiding GuC, though. One of the reasons I did it was to prove that the same core Xe scheduling model [3] doesn't depend on firmware. As long as your hardware has some ability to juggle independent per-context rings, you can get the same separation and it makes everything cleaner. If this is the direction things are headed (and I really think it is; I need to blog about it), being able to do the Xe model on more primitive hardware which lacks competent firmware-based submission is important. I wanted to prototype that to show that it could be done.
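To make that a bit more concrete, something like the sketch below is all I mean by "the same model without firmware"; the names are entirely made up and this is not the actual Xe code, just the shape of the idea: each queue owns its own ring, and the only backend-specific piece is the hook that gets that ring executing.

	#include <linux/types.h>

	struct xe_sketch_ring {
		void *vaddr;		/* CPU mapping of the per-queue ring buffer */
		u32 head, tail;		/* ring state owned by this queue alone     */
	};

	struct xe_sketch_queue {
		struct xe_sketch_ring ring;			/* private ring, never shared */
		int (*run)(struct xe_sketch_queue *q);		/* backend-specific "go" hook */
	};

	/* Hypothetical backend helpers, declared only so the sketch is complete. */
	int sketch_guc_schedule_enable(struct xe_sketch_queue *q);	/* ask firmware to run it */
	int sketch_elsp_submit(struct xe_sketch_queue *q);		/* software ELSP write    */

	static int guc_run(struct xe_sketch_queue *q)
	{
		return sketch_guc_schedule_enable(q);	/* firmware juggles the rings */
	}

	static int execlists_run(struct xe_sketch_queue *q)
	{
		return sketch_elsp_submit(q);		/* a software scheduler juggles them */
	}

Because the per-queue separation lives above the backend, the queue lifetime, fencing, and scheduling model stay the same whether the juggling is done by the GuC or by software.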
I also kinda wanted to prove that execlists didn't have to be horrible like in i915. You know, for funzies....
--Jason
I kinda like the idea of hiding it behind a config option + module
parameter, so you really, really need to try to be able to use the
backend, plus having it in the code keeps us disciplined in our
layering. At some point we will likely add another supported backend, and at
that point we may decide to delete this one.
Matt
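[For what it's worth, a minimal sketch of that kind of gating could look like the following. CONFIG_DRM_XE_EXECLISTS, the force_execlists parameter and the xe_*_submission_init() helpers are invented names for illustration, not options or functions from this series.]

	#include <linux/kconfig.h>
	#include <linux/module.h>
	#include <linux/moduleparam.h>

	struct xe_device;	/* hypothetical device structure */

	int xe_guc_submission_init(struct xe_device *xe);	/* hypothetical helpers */
	int xe_execlists_submission_init(struct xe_device *xe);

	static bool force_execlists;
	module_param(force_execlists, bool, 0400);
	MODULE_PARM_DESC(force_execlists,
			 "Use the non-production execlists backend instead of GuC (default: false)");

	static int xe_select_submission_backend(struct xe_device *xe)
	{
		/* Only usable when both the Kconfig option and the modparam opt in. */
		if (IS_ENABLED(CONFIG_DRM_XE_EXECLISTS) && force_execlists)
			return xe_execlists_submission_init(xe);

		/*
		 * No silent fallback: if the GuC firmware is missing, this fails
		 * the probe instead of quietly switching backends.
		 */
		return xe_guc_submission_init(xe);
	}

Distros would simply leave the config option off, while CI can enable it to keep the layering exercised, which is essentially Lucas's suggestion above.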
> Lucas De Marchi
>
> >
> > Matt
> >
> > > Regards,
> > >
> > > Tvrtko