On 27/06/2022 19:58, Zeng, Oak wrote:
Thanks,
Oak
-----Original Message-----
From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx>
Sent: June 27, 2022 4:30 AM
To: Zeng, Oak <oak.zeng@xxxxxxxxx>; Landwerlin, Lionel G
<lionel.g.landwerlin@xxxxxxxxx>; Vishwanathapura, Niranjana
<niranjana.vishwanathapura@xxxxxxxxx>
Cc: Zanoni, Paulo R <paulo.r.zanoni@xxxxxxxxx>; intel-
gfx@xxxxxxxxxxxxxxxxxxxxx; dri-devel@xxxxxxxxxxxxxxxxxxxxx; Hellstrom,
Thomas <thomas.hellstrom@xxxxxxxxx>; Wilson, Chris P
<chris.p.wilson@xxxxxxxxx>; Vetter, Daniel <daniel.vetter@xxxxxxxxx>;
christian.koenig@xxxxxxx; Auld, Matthew <matthew.auld@xxxxxxxxx>
Subject: Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition
On 24/06/2022 21:23, Zeng, Oak wrote:
Let's compare "tlb invalidate at vm unbind" vs "tlb invalidate at backing
storage":
Correctness:
consider this sequence of:
1. unbind va1 from pa1,
2. then bind va1 to pa2. //user space has the freedom to do this as it
manages virtual address space 3. Submit shader code using va1, 4. Then
retire pa1.
If you don't perform tlb invalidate at step #1, in step #3, shader will use
stale entries in tlb and pa1 will be used for the shader. User want to use pa2.
So I don't think invalidate tlb at step #4 make correctness.
Define step 3. Is it a new execbuf? If so then there will be a TLB flush there.
Unless the plan is to stop doing that with eb3 but I haven't picked up on that
anywhere so far.
In Niranjana's latest patch series, he removed the TLB flushing from vm_unbind. He also said explicitly TLB invalidation will be performed at job submission and backing storage releasing time, which is the existing behavior of the current i915 driver.
I think if we invalidate TLB on each vm_unbind, then we don't need to invalidate at submission and backing storage releasing. It doesn't make a lot of sense to me to perform a tlb invalidation at execbuf time. Maybe it is a behavior for the old implicit binding programming model. For vm_bind and eb3, we separate the binding and job submission into two APIs. It is more natural the TLB invalidation be coupled with the vm bind/unbind, not job submission. So in my opinion we should remove tlb invalidation from submission and backing storage releasing and add it to vm unbind. This is method is cleaner to me.
You can propose this model (not flushing in eb3) but I have my doubts.
Consider the pointlessness of flushing on N unbinds for 99% of clients
which are not infinite compute batch. And consider how you make the
behaviour consistent on all platforms (selective vs global tlb flush).
Also note that this discussion is orthogonal to unbind vs backing store
release.
Regarding performance, we don't have data. In my opinion, we should make things work in a most straight forward way as the first step. Then consider performance improvement if necessary. Consider some delayed tlb invalidation at submission and backing release time without performance data support wasn't a good decision.
It is quite straightforward though. ;) It aligns with the eb2 model and
argument can be made backing store release is (much) less frequent than
unbind (consider softpin where client could trigger a lot of pointless
flushes).
Regards,
Tvrtko