On Fri, May 14, 2021 at 11:36:37AM -0500, Jason Ekstrand wrote:
> On Fri, May 14, 2021 at 6:12 AM Tvrtko Ursulin
> <tvrtko.ursulin@xxxxxxxxxxxxxxx> wrote:
> >
> > On 06/05/2021 20:13, Matthew Brost wrote:
> > > Basic GuC submission support. This is the first bullet point in the
> > > upstreaming plan covered in the following RFC [1].
> > >
> > > At a very high level the GuC is a piece of firmware which sits between
> > > the i915 and the GPU. It offloads some of the scheduling of contexts
> > > from the i915 and programs the GPU to submit contexts. The i915
> > > communicates with the GuC and the GuC communicates with the GPU.
> > >
> > > GuC submission will be disabled by default on all current upstream
> > > platforms behind a module parameter - enable_guc. A value of 3 will
> > > enable submission and HuC loading via the GuC. GuC submission should
> > > work on all gen11+ platforms assuming the GuC firmware is present.
> >
> > Some thoughts mostly relating to future platforms where GuC will be the
> > only option, and to some extent platforms where it will be possible to
> > turn it on for one reason or another.
> >
> > Debuggability - in the context of having an upstream way/tool for
> > capturing and viewing GuC logs usable for attaching to bug reports.
> >
> > Currently i915 can provide logs, traces via tracepoints and trace
> > printk, and GPU error capture state, which often provide a sufficient
> > trail of evidence to debug issues.
> >
> > We need to make sure GuC is not a black box in this respect. By
> > this I mean it does not hide a large portion of the execution flows from
> > upstream observability.
>
> I agree here. If GuC suddenly makes submission issues massively
> harder to debug then that's a regression vs. execlists. I don't know
> what the solution there is but I think the concern is valid.
>

Replied to Tvrtko with detailed answers. The TL;DR is that I agree with
basically everything he said, we have plans to address it all, and
everything must be addressed before the GuC can be turned on by default.

Matt

> > This could mean a tool in IGT to access/capture GuC logs and update bug
> > filing instructions.
> >
> > Leading from here is probably the need for the GuC firmware team to
> > cross the internal-upstream boundary and deal with such bug reports on
> > upstream trackers. Upstream GuC is unlikely to work if we don't have
> > such a plan and commitment.
>
> I mostly agree here as well. I'm not sure it'll actually happen but
> I'd like anyone who writes code which impacts Linux to be active in
> upstream bug trackers.
>
> > Also leading from here is the need for GPU error capture to be on par
> > from day one, which I believe is still not there in the firmware.
>
> This one has me genuinely concerned. I've heard rumors that we don't
> have competent error captures with GuC yet. From the Mesa PoV, this
> is a non-starter. We can't be asked to develop graphics drivers with
> no error capture.
>
> The good news is that, based on my understanding, it shouldn't be
> terrible to support. We just need the GuC to grab all the registers
> for us and shove them in a buffer somewhere before it resets the GPU
> and all that data is lost. I would hope the Windows people have
> already done that and we just need to hook it up. If not, there may
> be some GuC engineering required here.
>
> > Another, although unrelated, missing feature on my wish list is firmware
> > support for wiring up accurate engine busyness stats to i915 PMU. I
> > believe this is also being worked on but I don't know when the
> > expected delivery is.
> >
> > If we are tracking a TODO list of items somewhere I think these ones
> > should definitely be considered.
>
> Yup, let's get it all in the ToDo and not flip GuC on by default in
> the wild until it's all checked off.
>
> --Jason