Re: [PATCH 2/3] iris: Create a composite context for both compute and render pipelines

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> · Tue, 26 Mar 2019 17:15:28 +0000

Quoting Kenneth Graunke (2019-03-26 17:01:57)
> On Tuesday, March 26, 2019 12:16:20 AM PDT Chris Wilson wrote:
> > Quoting Kenneth Graunke (2019-03-26 05:52:10)
> > > On Monday, March 25, 2019 3:58:59 AM PDT Chris Wilson wrote:
> > > > iris currently uses two distinct GEM contexts to have distinct logical
> > > > HW contexts for the compute and render pipelines. However, using two
> > > > distinct GEM contexts implies that they are distinct timelines, yet as
> > > > they are a single GL context that implies they belong to a single
> > > > timeline from the client perspective. Currently, fences are occasionally
> > > > inserted to order the two timelines. Using 2 GEM contexts also implies
> > > > that we keep 2 ppGTT for identical buffer state. If we can create a
> > > > single GEM context, with the right capabilities, we can have a single
> > > > VM, a single timeline, but 2 logical HW contexts for the 2 pipelines.
> > > > 
> > > > This is allowed through the new context interface that allows VM to be
> > > > shared, timelines to be specified, and for the logical contexts to be
> > > > constructed as the user desires.
> > > > 
> > > > Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
> > > > Cc: Kenneth Graunke <kenneth@xxxxxxxxxxxxx>
> > > > ---
> > > >  src/gallium/drivers/iris/iris_batch.c   | 16 ++-----
> > > >  src/gallium/drivers/iris/iris_batch.h   |  5 +--
> > > >  src/gallium/drivers/iris/iris_context.c | 56 ++++++++++++++++++++++++-
> > > >  3 files changed, 60 insertions(+), 17 deletions(-)
> > > 
> > > Hi Chris,
> > > 
> > > I don't think that I want the single timeline option.  It seems like
> > > we've been moving away from implicit sync for a long time, and the
> > > explicit sync code we have is pretty straightforward and seems to do
> > > the trick.  Jason and I also chatted briefly, and we don't necessarily
> > > want a strict submission order between render/compute.
> > 
> > I disagree if you think this means more implicit sync. It is setting up
> > the GEM context to an exact match of the GL context, by _explicit_
> > control of the timeline. Then the fences you do export from inside the
> > GL context do not need to be faked to be a composite of the pair of
> > contexts. You still have explicit fences, and you have explicit control
> > over the definition of their timeline.
> 
> With regard to multiple GL contexts, yes, everything remains explicit.
> But having 2-3 separate timelines within a GL context allows us to
> reorder work behind GL's back, which is all the rage these days for
> performance.  Tilers do it all the time.  Position-only bucketing may
> require it.  I'd really like to start treating render and compute as
> distinct asynchronous queues.  At the very least, experimenting with
> that and not tying my hands to a particular behavior.

That's a reasonable argument. If you want to try and keep the GL
semantics intact while playing with ordering underneath, have fun!

The only problem I foresee is if there is any observable through which
the pipelines can determine their ordering / concurrency (sampling a
common buffer or clock); that might be construed as a violation.

> There may be some use for single timeline, though.  Attaching images as
> compute shader inputs may require CCS/HiZ resolves, which have to happen
> on the RCS.  Right now, I do those on IRIS_BATCH_RENDER, which means that
> it backs up behind any queued render work.  Ideally, I'd do those on a
> third context, which could be tied to the compute timeline, so the
> resolves and the compute job can both execute ahead of queued rendering,
> but still back to back.

I have an inkling that timelines should be first class for userspace to
control exactly. But I have not seen anything close to a use case to
justify that (yet). And by the time a use case should arise, we will
probably be onto the next shiny. That's the problem with cloudy crystal
balls.

> > > Separating the VMA from the context state image seems like absolutely
> > > the right thing to do - as you said, they're separate in hardware,
> > > and there is no real reason to tie them together.  I would be in favor of new
> > > uABI for that.
> > > 
> > > I don't think there will be much overhead reduction from sharing the
> > > VMA here though.  It's very plausible that the compositor might want
> > > to run between render and compute batches, at which point we end up
> > > doing page directory loads anyway.  I have also heard rumors about bit
> > > 47 becoming magical at some point which may prohibit us from sharing...
> > 
> > Yeah, but that doesn't actually affect the context setup, just how you
> > decide to use it in the end. And by that point, you'll be forced into using
> > this new uABI anyway or something entirely different :-p
> 
> Looking into this a bit more, I think we're actually OK.  I thought I
> might need to have distinct addresses for render and compute - at which
> point nearly every address would differ in terms of bit 47 - but it
> looks like the correct answer is "just never use that bit".  *shrug*

Yup.
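
For illustration only, the "just never use that bit" rule can be sketched
as a check applied to a GPU virtual address range before it is handed
out. The names ADDR_BIT_47 and range_avoids_bit47 are hypothetical and do
not come from iris or i915; this is a minimal sketch, assuming a 48-bit
address space where keeping every allocation below 1 << 47 is enough:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name: the address bit the thread suggests never using. */
#define ADDR_BIT_47 (UINT64_C(1) << 47)

/*
 * Returns true if [start, start + size) lies entirely below bit 47, so
 * that no address in the range has that bit set (assuming a 48-bit
 * address space). Empty ranges are treated as a caller bug here.
 */
static bool range_avoids_bit47(uint64_t start, uint64_t size)
{
    assert(size > 0);

    if (start >= ADDR_BIT_47)
        return false;

    /* Compare against the remaining room to avoid overflowing start + size. */
    return size <= ADDR_BIT_47 - start;
}
```

A range ending exactly at the 1 << 47 boundary passes; one that crosses
it or starts at it is rejected.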

> > > Context cloning seems OK, but I'm always pretty hesitant to add new
> > > uABI unless it's strictly necessary.  In this case, we can do the same
> > > thing with a little bit of userspace code, so I'm not sure it's worth
> > > adding that...
> > 
> > Actually you cannot do the same without some of the new uABI either,
> > since previously you did not have all the parameters exposed.
> 
> What isn't exposed?  We set up everything the first time, why can't we
> do it again?

When going through the list of things I couldn't re-establish, the ppgtt
was top of that list. Hence I suddenly got motivated to provide a means
for context-recovery to be able to keep as much state as possible and
decide how much to reset.

The alternative is to be able to reset the old context. That would need
to be a synchronous operation -- I need to flush all the old requests
and cancel them before returning. In effect, I might as well just create
a new context pointer, copy across all the state and insert it into the
idr as the old id. Given that the answer is cloning, and cloning is
useful in general, that seems a good API to carry forward.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx