On Fri, Jun 3, 2022 at 7:42 PM Christian König <christian.koenig@xxxxxxx> wrote:
>
> Am 03.06.22 um 15:23 schrieb Bas Nieuwenhuizen:
> > On Fri, Jun 3, 2022 at 2:49 PM Christian König <christian.koenig@xxxxxxx> wrote:
> >> Am 03.06.22 um 14:39 schrieb Bas Nieuwenhuizen:
> >>> On Fri, Jun 3, 2022 at 2:08 PM Christian König <christian.koenig@xxxxxxx> wrote:
> >>>> Am 03.06.22 um 13:07 schrieb Bas Nieuwenhuizen:
> >>>>> On Fri, Jun 3, 2022 at 12:16 PM Christian König
> >>>>> <christian.koenig@xxxxxxx> wrote:
> >>>>>> Am 03.06.22 um 12:08 schrieb Bas Nieuwenhuizen:
> >>>>>>> [SNIP]
> >>>>>>>>> I do have to fix some stuff indeed, especially for the GEM close, but
> >>>>>>>>> with that we should be able to keep the same basic approach?
> >>>>>>>> Nope, not even remotely.
> >>>>>>>>
> >>>>>>>> What we need is the following:
> >>>>>>>> 1. Rolling out my drm_exec patch set, so that we can lock buffers as needed.
> >>>>>>>> 2. When we get a VM operation we not only lock the VM page tables, but
> >>>>>>>> also all buffers we potentially need to unmap.
> >>>>>>>> 3. Nuking the freed list in the amdgpu_vm structure by updating freed
> >>>>>>>> areas directly when they are unmapped.
> >>>>>>>> 4. Tracking those updates inside the bo_va structure for the BO+VM
> >>>>>>>> combination.
> >>>>>>>> 5. When the bo_va structure is destroyed because of closing the handle,
> >>>>>>>> move the last clear operation over to the VM as implicit sync.
> >>>>>>>>
> >>>>>>> Hi Christian, isn't that a different problem though (one that we're also
> >>>>>>> trying to solve, but in your series)?
> >>>>>>>
> >>>>>>> What this patch tries to achieve is to allow
> >>>>>>>
> >>>>>>> (t+0) CS submission setting BOOKKEEP fences (i.e. no implicit sync)
> >>>>>>> (t+1) a VM operation on a BO/VM accessed by the CS
> >>>>>>>
> >>>>>>> to run concurrently. What it *doesn't* try to allow is
> >>>>>>>
> >>>>>>> (t+0) a VM operation on a BO/VM accessed by the CS
> >>>>>>> (t+1) CS submission setting BOOKKEEP fences (i.e. no implicit sync)
> >>>>>>>
> >>>>>>> to run concurrently. When you write
> >>>>>>>
> >>>>>>>> Only when all this is done can we then resolve the dependency that the
> >>>>>>>> CS currently must wait for any clear operation on the VM.
> >>>>>>> isn't that all about the second problem?
> >>>>>> No, it's the same.
> >>>>>>
> >>>>>> See, what we do in the VM code is to artificially insert a bubble so that
> >>>>>> all VM clear operations wait for all CS operations, and then use the
> >>>>>> clear fence to indicate when the backing store of the BO can be freed.
> >>>>> Isn't that remedied with something like the code below? At least the
> >>>>> gem_close case should be handled with this, and the move case was
> >>>>> already handled by the copy operation.
> >>>> That is one necessary puzzle piece, yes. But you need more than that.
> >>>>
> >>>> In particular, the explicit unmap operation needs to be converted into an
> >>>> implicit unmap to get the TLB flush right.
> >>> This doesn't change anything about the TLB flush though, since all
> >>> unmap -> later-job dependencies are still implicit.
> >>>
> >>> So the worst that could happen (if, e.g., userspace gets the
> >>> waits/dependencies wrong) is
> >>>
> >>> 1) non-implicit CS gets submitted that touches a BO
> >>> 2) VM unmap on that BO happens
> >>> 2.5) the CS from 1 is still active due to missing dependencies
> >>> 2.6) but any CS submission after 2 will trigger a TLB flush
> >> Yeah, but that's exactly the bubble we try to avoid. Isn't it?
> > For this series, not really. To clarify, there are two sides to
> > getting GPU bubbles and no overlap:
> >
> > (1) VM operations implicitly wait for earlier CS submissions
> > (2) CS submissions implicitly wait for earlier VM operations
> >
> > Together, these combine to ensure that you get a (potentially small)
> > bubble any time VM work happens.
> >
> > Your series (and further ideas) tackles (2), and is a worthwhile thing
> > to do. However, while writing the userspace for this I noticed this
> > isn't enough to get rid of all our GPU bubbles. In particular, when
> > doing a non-sparse map of a new BO, that map tends to need to be waited
> > on for the next CS anyway for API semantics. Because VM operations
> > happen on a single timeline, this high-priority map can end up being
> > blocked by earlier sparse maps, so the bubble in that case still exists.
> >
> > So in this series I try to tackle (1) instead. Since GPU work
> > typically lags behind CPU submissions and VM operations aren't that
> > slow, we can typically execute VM operations early enough that any
> > implicit syncs from (2) are less of an issue, or no issue at all.
>
> Ok, once more, since you don't seem to understand what I want to say: it
> isn't possible to fix #1 before you have fixed #2.
>
> The VM unmap operation here is a barrier which divides the CS operations
> into a before and an after. This is intentional design.

Why is that barrier needed? There are two barriers I do understand and
think we can deal with:

1) The VM unmap is a barrier between a prior CS and a later memory free.
2) The TLB flush needs to happen between a VM unmap and a later CS.

But why do we need the VM unmap to be a strict barrier between a prior CS
and a later CS?

> To get rid of this barrier you must first fix the part where CS
> submissions wait for the VM operation to complete, i.e. the necessity of
> the barrier.
>
> I'm working on this for a couple of years now and I'm really running out
> of ideas for how to explain this restriction.
>
> Regards,
> Christian.
>
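As a minimal sketch of the gem_close handling referenced above (Christian's
point 5, and the kind of code Bas alludes to with "the code below"): on
handle close, attach the fence of the last page-table update for the mapping
to the BO's reservation object, so that freeing the backing store waits for
the unmap while later CS submissions are not implicitly synced to it. The
dma_resv calls below are the real (v5.19-era) API; struct my_vm and
my_vm_unmap_and_get_fence() are hypothetical stand-ins, not actual amdgpu
functions.

#include <linux/dma-fence.h>
#include <linux/dma-resv.h>
#include <linux/err.h>
#include <drm/drm_gem.h>

struct my_vm;   /* hypothetical per-process VM, standing in for the driver VM */

/* Hypothetical helper: queue the PTE clear for obj's mapping in vm and
 * return the fence of that page-table update. */
struct dma_fence *my_vm_unmap_and_get_fence(struct my_vm *vm,
                                            struct drm_gem_object *obj);

/* Called on GEM handle close, with obj->resv locked by the caller. */
static int example_gem_close_unmap(struct drm_gem_object *obj, struct my_vm *vm)
{
        struct dma_fence *fence;
        int r;

        /* Make sure a fence slot is available before committing the unmap. */
        r = dma_resv_reserve_fences(obj->resv, 1);
        if (r)
                return r;

        fence = my_vm_unmap_and_get_fence(vm, obj);
        if (IS_ERR(fence))
                return PTR_ERR(fence);

        /*
         * BOOKKEEP usage: later CS submissions do not implicitly sync to
         * this fence, but freeing or moving the backing store (which waits
         * for all fences on the reservation object) still waits for the
         * unmap to finish.
         */
        dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_BOOKKEEP);
        dma_fence_put(fence);

        return 0;
}

With something along these lines, the CS itself no longer has to wait for
the clear operation; only the eventual free of the BO's backing store does,
which is the direction point 5 above describes.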