RE: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition

"Zeng, Oak" <oak.zeng@xxxxxxxxx> · Tue, 28 Jun 2022 13:53:07 +0000

Thanks,
Oak

> -----Original Message-----
> From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx>
> Sent: June 28, 2022 4:58 AM
> To: Zeng, Oak <oak.zeng@xxxxxxxxx>; Landwerlin, Lionel G
> <lionel.g.landwerlin@xxxxxxxxx>; Vishwanathapura, Niranjana
> <niranjana.vishwanathapura@xxxxxxxxx>
> Cc: Zanoni, Paulo R <paulo.r.zanoni@xxxxxxxxx>; intel-
> gfx@xxxxxxxxxxxxxxxxxxxxx; dri-devel@xxxxxxxxxxxxxxxxxxxxx; Hellstrom,
> Thomas <thomas.hellstrom@xxxxxxxxx>; Wilson, Chris P
> <chris.p.wilson@xxxxxxxxx>; Vetter, Daniel <daniel.vetter@xxxxxxxxx>;
> christian.koenig@xxxxxxx; Auld, Matthew <matthew.auld@xxxxxxxxx>
> Subject: Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition
> 
> 
> On 27/06/2022 19:58, Zeng, Oak wrote:
> >
> >
> > Thanks,
> > Oak
> >
> >> -----Original Message-----
> >> From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx>
> >> Sent: June 27, 2022 4:30 AM
> >> To: Zeng, Oak <oak.zeng@xxxxxxxxx>; Landwerlin, Lionel G
> >> <lionel.g.landwerlin@xxxxxxxxx>; Vishwanathapura, Niranjana
> >> <niranjana.vishwanathapura@xxxxxxxxx>
> >> Cc: Zanoni, Paulo R <paulo.r.zanoni@xxxxxxxxx>; intel-
> >> gfx@xxxxxxxxxxxxxxxxxxxxx; dri-devel@xxxxxxxxxxxxxxxxxxxxx;
> >> Hellstrom, Thomas <thomas.hellstrom@xxxxxxxxx>; Wilson, Chris P
> >> <chris.p.wilson@xxxxxxxxx>; Vetter, Daniel <daniel.vetter@xxxxxxxxx>;
> >> christian.koenig@xxxxxxx; Auld, Matthew <matthew.auld@xxxxxxxxx>
> >> Subject: Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi
> >> definition
> >>
> >>
> >> On 24/06/2022 21:23, Zeng, Oak wrote:
> >>> Let's compare "tlb invalidate at vm unbind" vs "tlb invalidate at
> >>> backing
> >> storage":
> >>>
> >>> Correctness:
> >>> consider this sequence of:
> >>> 1. unbind va1 from pa1,
> >>> 2. then bind va1 to pa2. //user space has the freedom to do this as
> >>> it manages virtual address space 3. Submit shader code using va1, 4.
> >>> Then retire pa1.
> >>>
> >>> If you don't perform tlb invalidate at step #1, in step #3, shader
> >>> will use
> >> stale entries in tlb and pa1 will be used for the shader. User want to use
> pa2.
> >> So I don't think invalidate tlb at step #4 make correctness.
> >>
> >> Define step 3. Is it a new execbuf? If so then there will be a TLB flush
> there.
> >> Unless the plan is to stop doing that with eb3 but I haven't picked
> >> up on that anywhere so far.
> >
> > In Niranjana's latest patch series, he removed the TLB flushing from
> vm_unbind. He also said explicitly TLB invalidation will be performed at job
> submission and backing storage releasing time, which is the existing behavior
> of the current i915 driver.
> >
> > I think if we invalidate TLB on each vm_unbind, then we don't need to
> invalidate at submission and backing storage releasing. It doesn't make a lot
> of sense to me to perform a tlb invalidation at execbuf time. Maybe it is a
> behavior for the old implicit binding programming model. For vm_bind and
> eb3, we separate the binding and job submission into two APIs. It is more
> natural the TLB invalidation be coupled with the vm bind/unbind, not job
> submission. So in my opinion we should remove tlb invalidation from
> submission and backing storage releasing and add it to vm unbind. This is
> method is cleaner to me.
> 
> You can propose this model (not flushing in eb3) but I have my doubts.
> Consider the pointlessness of flushing on N unbinds for 99% of clients which
> are not infinite compute batch. And consider how you make the behaviour
> consistent on all platforms (selective vs global tlb flush).

When I thought about eb3, compute workload and ulls were also in the picture. Under ulls, user mode keep submitting job without calling execbuf (it uses a semaphore to notify HW of the new batch). The execbuf + backing release flush has a correctness issue as I pointed out. Now we decided eb3 is only for mesa, not for compute, we don't have this correctness problem for now. We can close this conversation for now and revive it when we move to Xe and vm bind for compute.

Regards,
Oak

> 
> Also note that this discussion is orthogonal to unbind vs backing store release.
> 
> > Regarding performance, we don't have data. In my opinion, we should
> make things work in a most straight forward way as the first step. Then
> consider performance improvement if necessary. Consider some delayed tlb
> invalidation at submission and backing release time without performance
> data support wasn't a good decision.
> 
> It is quite straightforward though. ;) It aligns with the eb2 model and
> argument can be made backing store release is (much) less frequent than
> unbind (consider softpin where client could trigger a lot of pointless flushes).
> 
> Regards,
> 
> Tvrtko