RE: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition

"Zeng, Oak" <oak.zeng@xxxxxxxxx> · Thu, 23 Jun 2022 21:05:47 +0000

Regards,
Oak

> -----Original Message-----
> From: Intel-gfx <intel-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Tvrtko
> Ursulin
> Sent: June 23, 2022 7:06 AM
> To: Landwerlin, Lionel G <lionel.g.landwerlin@xxxxxxxxx>; Vishwanathapura,
> Niranjana <niranjana.vishwanathapura@xxxxxxxxx>
> Cc: Zanoni, Paulo R <paulo.r.zanoni@xxxxxxxxx>; intel-gfx@xxxxxxxxxxxxxxxxxxxxx;
> dri-devel@xxxxxxxxxxxxxxxxxxxxx; Hellstrom, Thomas <thomas.hellstrom@xxxxxxxxx>;
> Wilson, Chris P <chris.p.wilson@xxxxxxxxx>; Vetter, Daniel
> <daniel.vetter@xxxxxxxxx>; christian.koenig@xxxxxxx; Auld, Matthew
> <matthew.auld@xxxxxxxxx>
> Subject: Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition
> 
> 
> On 23/06/2022 09:57, Lionel Landwerlin wrote:
> > On 23/06/2022 11:27, Tvrtko Ursulin wrote:
> >>>
> >>> After a vm_unbind, UMD can re-bind to the same VA range against an
> >>> active VM.
> >>> Though I am not sure, for the Mesa use case, whether that new mapping
> >>> is required for the running GPU job or for the next submission.
> >>> But by ensuring the TLB flush upon unbind, KMD can ensure
> >>> correctness.
> >>
> >> Isn't that their problem? If they re-bind for submitting _new_ work
> >> then they get the flush as part of batch buffer pre-amble.
> >
> > In the non sparse case, if a VA range is unbound, it is invalid to use
> > that range for anything until it has been rebound by something else.
> >
> > We'll take the fence provided by vm_bind and put it as a wait fence on
> > the next execbuffer.
> >
> > It might be safer in case of memory over fetching?
> >
> >
> > TLB flush will have to happen at some point right?
> >
> > What's the alternative to do it in unbind?
> 
> Currently TLB flush happens from the ring before every BB_START and also
> when i915 returns the backing store pages to the system.

Can you explain more about why a TLB flush is needed when i915 retires the backing storage? I never figured that out when I looked at the code. As I understand it, the TLB caches the GPU page tables, which map a VA to a PA. So it is straightforward to me that we perform a TLB flush when we change the page table (either at vm_bind time or at unbind time; unbind time is better for performance reasons).

But flushing the TLB when we retire backing storage is rather puzzling to me. I don't see how backing storage is connected to the page table. Let's say the user unbinds va1 from pa1, then binds va1 to pa2, then retires pa1, and then submits shader code using va1. If we don't flush the TLB after unbinding va1, the new shader code, which is supposed to use pa2, will still use pa1 because of the stale TLB entries, right? The point is that TLB entries are tagged with the virtual address, not the physical address. So after we unbind va1 from pa1, whether or not we retire pa1, va1 can be bound to another physical address, pa2.
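
To make the scenario concrete, here is a minimal toy sketch. It is not i915 code: the class, the method names and the dict-based TLB are all hypothetical, and it assumes no other flush (such as the one before BB_START) happens between the unbind and the next access.

```python
# Toy model only: hypothetical names, not i915 code.
class ToyGpuVm:
    def __init__(self):
        self.page_table = {}  # va -> pa, the authoritative mapping
        self.tlb = {}         # va -> pa, cached translations tagged by va

    def bind(self, va, pa):
        self.page_table[va] = pa

    def unbind(self, va, flush_tlb):
        del self.page_table[va]
        if flush_tlb:
            # Drop any cached translation for this va.
            self.tlb.pop(va, None)

    def translate(self, va):
        # A TLB hit returns the cached entry without walking the page table.
        if va not in self.tlb:
            self.tlb[va] = self.page_table[va]
        return self.tlb[va]


def rebind_scenario(flush_on_unbind):
    vm = ToyGpuVm()
    vm.bind("va1", "pa1")
    vm.translate("va1")  # earlier work warms the TLB with va1 -> pa1
    vm.unbind("va1", flush_tlb=flush_on_unbind)
    vm.bind("va1", "pa2")
    # pa1 could be retired here; the TLB is keyed by va, so the stale
    # entry is unaffected by what happens to the physical pages.
    return vm.translate("va1")  # next shader access through va1


print(rebind_scenario(flush_on_unbind=False))
print(rebind_scenario(flush_on_unbind=True))
```

In this model, without the flush the access through va1 resolves via the stale entry to pa1, and with the flush it resolves to pa2.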

Thanks,
Oak 

> 
> For the former, I haven't seen any mention of plans to stop doing it for
> execbuf3? Anyway, as long as this is kept, a sequence of
> bind[1..N]+execbuf is safe and correctly sees all the preceding binds.
> Hence, about the alternative to doing it in unbind: first, I think, let's
> state the problem it is trying to solve.
> 
> For instance, is it just for the compute "append work to the running
> batch" use case? I honestly don't remember how that was supposed to work,
> so maybe the TLB flush on bind was supposed to deal with that scenario?
> 
> Or do you see a problem even for Mesa with the current model?
> 
> Regards,
> 
> Tvrtko