Re: [RFC PATCH 0/6] Common preempt fences and semantics

Matthew Brost <matthew.brost@xxxxxxxxx> · Fri, 15 Nov 2024 11:38:55 -0800

On Thu, Nov 14, 2024 at 09:38:43AM +0100, Christian König wrote:
> Am 13.11.24 um 16:34 schrieb Matthew Brost:
> > > > > > > Now the ordering works inherently in dma-resv and the scheduler. e.g. No
> > > > > > > need to track the last completion fences explictly in preempt fences.
> > > > > > I really don't think that this is a good approach. Explicitly keeping the
> > > > > > last completion fence in the pre-emption fence is basically a must have as
> > > > > > far as I can see.
> > > > > > 
> > > > > > The approach you take here looks like a really ugly hack to me.
> > > > > > 
> > > > > Well, I have to disagree; it seems like a pretty solid, common design.
> > > What you basically do is to move the responsibility to signal fences in the
> > > right order from the provider of the fences to the consumer of it.
> > > 
> > Are there not ordering rules already built into dma-resv? This is just
> > extending those preempt fences.
> 
> Well, the usage flags are to distinct which fences should be queried for
> which use case.
> 
> The order the fence are used and returned is completely undetermined. E.g.
> currently you can get READ, WRITE fences all mixed together.
> 

I was thinking there was an order here but yea looking now that does
appear to be an oversight on my end.

> > I can maybe buy some of this argument, as if a random yahoo enables
> > signaling a preempt fence without properly going through dma-resv or the
> > scheduler, then yes, that could break things. But don't do that. In Xe,
> > we have exactly two places where these can be triggered: in the TTM BO
> > move vfunc (which the scheduler handles) and in the MMU invalidation
> > path (dma-resv).
> 
> Yeah, but the purpose of all this is that the dma-resv object is shareable
> between device drivers.
> 

Agree.

> That one device driver comes up with a new requirement is certainly
> possible, but then we need to make sure that all others can live with that
> as well.
> 

Also agree.

> Just saying that we limit our scope to just the requirements of one driver
> because other are never supposed to see this fence is a recipe for problems.
> 
> > I would expect all drivers and users of dma-resv and the scheduler with
> > preempt fences to use the proper interfaces to signal preempt fences,
> > rather than bypassing this thus ensuring the common rules for preempt
> > fences are adhered to.
> 
> Waiting for fences in any order is part of the design and a requirement by
> some consumers.
> 
> See nouveau_fence_sync() for an example of what is usually done, radeon has
> similar requirements.
> 
> But those approaches are here to for optimization purposes only and not for
> correctness.
> 
> That a driver says "My fences must be waited on first A, then B" is
> something completely new.
> 

Ok, fair enough.

> > > Since we have tons of consumers of that stuff this is not even remotely a
> > > defensive design.
> > I can consider other designs, but I do think larger community input may
> > be required, as you mentioned there are several consumers of this code.
> 
> Each fence is an independent object without dependencies on anything else.
> Imposing some order on consumers of dma_fences is clearly going against
> that.
> 
> > > > > Anyway, I think I have this more or less working. I want to run this by
> > > > > the Mesa team a bit to ensure I haven't missed anything, and will likely
> > > > > post something shortly after.
> > > > > 
> > > > > We can discuss this more after I post and perhaps solicit other
> > > > > opinions, weighing the pros and cons of the approaches here. I do think
> > > > > they function roughly the same, so something commonly agreed upon would
> > > > > be good. Sharing a bit of code, if possible, is always a plus too.
> > > Well to make it clear that will never ever get a green light from my side as
> > > DMA-buf maintainer. What you suggest here is extremely fragile.
> > > 
> > It is unfortunate that this is your position, and I do feel it is a bit
> > premature to suggest as much. I didn’t know being a maintainer was akin
> > to being a dictator. As I’ve said multiple times, I feel this warrants a
> > bit more discussion with more stakeholders. If this is unacceptable,
> > sure, I can change it.
> 
> I'm sorry when you feel like that, it's really not my intention to dictate
> anything. I have probably over-reacted.
> 
> It's just to me suggesting this solution is so far of that I can't really
> understand how you came up with that.
> 
> I perfectly understand that you must have some reason for it, I just don't
> see it.
> 
> Maybe we should concentrate on understanding those reasoning first instead
> of discussing a possible solution.
> 
> > > Why not simply wait for the pending completion fences as dependency for
> > > signaling preemption fences?
> > > 
> > > That should work for all drivers and is trivial to implement as far as I can
> > > see.
> > Again, this is hard to understand without a clear picture of what AMD is
> > doing. I pointed out a dma-fencing problem in the code on the list, and
> > the response was, "Well, we have some magic ordering rules that make it
> > safe." That might be true, but if you annotated your code, lockdep would
> > complain. Anything that cannot be annotated is a non-starter for me.
> > This makes me nervous that, in fact, it is not as trivial as you
> > suggest, nor is the design as sound as you believe.
> 
> I'm pretty sure that the code is not even remotely bug free. The design and
> code has been under development for the last 16 month or so and is now
> published bit by bit.
> 
> We completely missed both during internal review as well as testing that
> lockdep should complain about it and I'm actually not sure why it doesn't.
> 
> The full code should land in amd-staging-drm-next in the next few
> days/weeks, I can give you a detailed date later today.
>

Please let me know when this happens. I will take a pass at reviewing.
We have had preemption fences in Xe for LR compute for 2+ years now and
can help spot if you missed something. It took us a while to get these
right.

> > Anyways, I'll still likely post a common solution with annotations. If
> > it is rejected, so be it, and I will rework it.
> 
> Well that sounds great. But let us discuss what this solution looks like
> instead of jumping ahead and implementing something.
> 

Sure. FWIW I changed my code to wait on last fence and it quite easy
to do so and it functioned the same. So at my reasoning that new slots
vs waiting on last was correct. While there some properties of new slots
I like, I'll just drop the new slots idea as it seems contentious.

Matt

> Regards,
> Christian.
> 
> > 
> > In spirit of open development here is my code in a public branch:
> > 
> > Kernel:https://gitlab.freedesktop.org/mbrost/xe-kernel-driver-umd-submission/-/tree/v2-11-13-24?ref_type=heads
> > IGT:https://gitlab.freedesktop.org/mbrost/igt-gpu-tools-umd-submission/-/tree/umd_submission.v2?ref_type=heads
> > 
> > Matt
> > 
> > > Regards,
> > > Christian.
> > > 
> > > > > Matt
> > > > > 
> > > > > > Regards,
> > > > > > Christian.
> > > > > >