Re: [RFC PATCH 3/5] drm/amdgpu: Allow explicit sync for VM ops.

Hi Christian,

Friendly ping on the comments here. Are you okay with me re-spinning
the series with a fixed patch 1 or do we need further discussion on
the approach here?

Thanks,
Bas

On Mon, Jun 6, 2022 at 1:00 PM Bas Nieuwenhuizen
<bas@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Mon, Jun 6, 2022 at 12:35 PM Christian König
> <christian.koenig@xxxxxxx> wrote:
> >
> > Am 06.06.22 um 12:30 schrieb Bas Nieuwenhuizen:
> > > On Mon, Jun 6, 2022 at 12:15 PM Christian König
> > > <christian.koenig@xxxxxxx> wrote:
> > >>
> > >>
> > >> Am 03.06.22 um 21:11 schrieb Bas Nieuwenhuizen:
> > >>> On Fri, Jun 3, 2022 at 8:41 PM Christian König <christian.koenig@xxxxxxx> wrote:
> > >>>> Am 03.06.22 um 19:50 schrieb Bas Nieuwenhuizen:
> > >>>>> [SNIP]
> > >>>>>>>> Yeah, but that's exactly the bubble we try to avoid. Isn't it?
> > >>>>>>> For this series, not really. To clarify, there are two sides to
> > >>>>>>> getting GPU bubbles and losing overlap:
> > >>>>>>>
> > >>>>>>> (1) VM operations implicitly wait for earlier CS submissions
> > >>>>>>> (2) CS submissions implicitly wait for earlier VM operations
> > >>>>>>>
> > >>>>>>> Together, these combine to ensure that you get a (potentially small)
> > >>>>>>> bubble any time VM work happens.
> > >>>>>>>
> > >>>>>>> Your series (and further ideas) tackles (2), and is a worthwhile thing
> > >>>>>>> to do. However, while writing the userspace for this I noticed this
> > >>>>>>> isn't enough to get rid of all our GPU bubbles. In particular when
> > >>>>>>> doing a non-sparse map of a new BO, that tends to need to be waited on
> > >>>>>>> for the next CS anyway for API semantics. Due to VM operations
> > >>>>>>> happening on a single timeline that means this high priority map can
> > >>>>>>> end up being blocked by earlier sparse maps and hence the bubble in
> > >>>>>>> that case still exists.
> > >>>>>>>
> > >>>>>>> So in this series I try to tackle (1) instead. Since GPU work
> > >>>>>>> typically lags behind CPU submissions and VM operations aren't that
> > >>>>>>> slow, we can typically execute VM operations early enough that any
> > >>>>>>> implicit syncs from (2) are less of an issue or no issue at all.
> > >>>>>> Ok, once more since you don't seem to understand what I want to say: It
> > >>>>>> isn't possible to fix #1 before you have fixed #2.
> > >>>>>>
> > >>>>>> The VM unmap operation here is a barrier which divides the CS operations
> > >>>>>> in a before and after. This is intentional design.
> > >>>>> Why is that barrier needed? These are the two barriers I got and
> > >>>>> understood, and that I think we can deal with:
> > >>>>>
> > >>>>> 1) the VM unmap is a barrier between prior CS and later memory free.
> > >>>>> 2) The TLB flush needs to happen between a VM unmap and a later CS.
> > >>>>>
> > >>>>> But why do we need the VM unmap to be a strict barrier between prior
> > >>>>> CS and later CS?
> > >>>> Exactly because of the two reasons you mentioned.
> > >>> This is the part I'm not seeing. I get that removing #2 is a
> > >>> nightmare, which is why I did something that doesn't violate that
> > >>> constraint.
> > >>>
> > >>> Like if an explicit CS that was running before the VM operation runs
> > >>> until after the VM operation (and hence possibly until after the TLB
> > >>> flush, or otherwise isn't covered by the TLB flush due to lack of
> > >>> async TLB flush support), that is not an issue. It might see the
> > >>> state from
> > >>> before the unmap, or after the unmap, or some intermediate state and
> > >>> all of those would be okay.
> > >>>
> > >>> We still get the constraint that the TLB flush happens between the VM
> > >>> unmap and later CS and hence the unmap is certainly visible to them.
> > >> So you basically just want to set the sync mode in
> > >> amdgpu_vm_update_range() to AMDGPU_SYNC_EXPLICIT even when it is an unmap?
> > > Yes, with the caveat that I want to do that only for
> > > DMA_RESV_USAGE_BOOKKEEP or higher (i.e. if we submit a CS with
> > > implicit sync we get the old implicit behavior, if we submit a CS with
> > > explicit sync we get the new explicit behavior). The rest of the
> > > series is basically just for enabling explicit sync submissions.
> >
> > That part won't work at all and would cause additional synchronization
> > problems.
> >
> > First of all, for implicitly synced CS we should use READ, not
> > BOOKKEEP, because BOOKKEEP would incorrectly be ignored by OpenGL
> > importers. I've fixed things so this no longer causes memory
> > corruption, but it is still nice to avoid.
>
> Yes, what I'm saying is that an implicit sync CS submission should add
> READ fences to the dma_resv and an explicit sync CS submission should
> add BOOKKEEP fences.
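> 
> Rough kernel-side sketch of what I mean (hypothetical helper name, not
> the actual patch; the explicit_sync flag would come from the submission
> flags):
> 
> #include <linux/dma-resv.h>
> 
> /* Pick the dma_resv usage for the CS fence based on whether the
>  * submission opted into explicit sync. Assumes fence slots were
>  * reserved beforehand with dma_resv_reserve_fences().
>  */
> static void amdgpu_cs_add_resv_fence(struct dma_resv *resv,
>                                      struct dma_fence *fence,
>                                      bool explicit_sync)
> {
>         /* READ keeps the fence visible to implicit sync importers
>          * (e.g. OpenGL), BOOKKEEP hides it from everything except
>          * explicit waits and kernel bookkeeping.
>          */
>         dma_resv_add_fence(resv, fence,
>                            explicit_sync ? DMA_RESV_USAGE_BOOKKEEP
>                                          : DMA_RESV_USAGE_READ);
> }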
>
> >
> > BOOKKEEP can only be used by VM updates themselves, so that they
> > don't interfere with CS.
>
> That is exactly why we would go with BOOKKEEP for explicit sync CS
> submissions, no? An explicit submission shouldn't interfere with any
> other CS submissions. That includes being totally ignored by GL
> importers (if we want to have synchronization there between an
> explicit submission and GL, userspace is expected to use Jason's
> dmabuf fence import/export IOCTLs).
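> 
> For reference, a userspace-side sketch of that interop path (assuming
> the export IOCTL lands roughly in its last posted form):
> 
> #include <linux/dma-buf.h>
> #include <sys/ioctl.h>
> 
> /* Pull the current fences of a dma-buf into a sync_file so an
>  * explicit sync submission can wait on GL work touching the buffer.
>  */
> static int export_dmabuf_fences(int dmabuf_fd)
> {
>         struct dma_buf_export_sync_file args = {
>                 .flags = DMA_BUF_SYNC_RW,
>                 .fd = -1,
>         };
> 
>         if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_EXPORT_SYNC_FILE, &args) < 0)
>                 return -1;
> 
>         return args.fd; /* sync_file fd to wait on explicitly */
> }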
>
> >
> > Then the second problem is that the VM IOCTL has absolutely no idea what
> > the CS IOCTL would be doing. That's why we have added the EXPLICIT sync
> > flag on the BO.
>
> It doesn't need to? We just use a different sync_mode for BOOKKEEP
> fences vs others:
> https://patchwork.freedesktop.org/patch/487887/?series=104578&rev=2
>
> (the nice thing about doing it this way is that it is independent of
> the IOCTL, i.e. also works for the delayed mapping changes we trigger
> on CS submit)
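> 
> The core of it is roughly this (simplified sketch, not the literal
> patch):
> 
> #include <linux/dma-resv.h>
> #include "amdgpu.h" /* for amdgpu_sync */
> 
> /* When filling the sync object for a VM update, skip BOOKKEEP fences
>  * (explicitly synced CS), while fences at KERNEL/WRITE/READ usage
>  * (implicitly synced CS and kernel moves) are still waited on.
>  */
> static int amdgpu_vm_resv_sync(struct amdgpu_sync *sync,
>                                struct dma_resv *resv)
> {
>         struct dma_resv_iter cursor;
>         struct dma_fence *fence;
>         int r;
> 
>         dma_resv_for_each_fence(&cursor, resv,
>                                 DMA_RESV_USAGE_BOOKKEEP, fence) {
>                 /* fences added purely for bookkeeping are skipped */
>                 if (dma_resv_iter_usage(&cursor) >= DMA_RESV_USAGE_BOOKKEEP)
>                         continue;
> 
>                 r = amdgpu_sync_fence(sync, fence);
>                 if (r)
>                         return r;
>         }
>         return 0;
> }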
>
> >
> > Regards,
> > Christian.
> >
> > >
> > >> That should be doable, but then you don't need any of the other changes.
> > >>
> > >> Regards,
> > >> Christian.
> > >>
> > >>>> #1 is rather easy to fix: you just need to copy all dma_fences from
> > >>>> the page table dma_resv object over to the BO's dma_resv object in
> > >>>> the GEM close handler. E.g. exactly what you suggested with the
> > >>>> dma_resv_copy function.
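> > >>>>
> > >>>> Something like this, roughly (hypothetical helper, locking and the
> > >>>> surrounding GEM close plumbing elided):
> > >>>>
> > >>>> #include "amdgpu.h"
> > >>>>
> > >>>> /* Mirror the page table fences into the BO so that freeing the
> > >>>>  * backing memory later waits for all pending VM updates. Note
> > >>>>  * that dma_resv_copy_fences() replaces the fences currently in
> > >>>>  * dst.
> > >>>>  */
> > >>>> static int amdgpu_gem_sync_vm_to_bo(struct amdgpu_vm *vm,
> > >>>>                                     struct amdgpu_bo *bo)
> > >>>> {
> > >>>>         struct dma_resv *vm_resv = vm->root.bo->tbo.base.resv;
> > >>>>
> > >>>>         return dma_resv_copy_fences(bo->tbo.base.resv, vm_resv);
> > >>>> }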
> > >>>>
> > >>>> #2 is a nightmare.
> > >>>>
> > >>>> We can't move the TLB flush to the end of the unmap operation because
> > >>>> async TLB flushes are either a bit complicated (double flushes etc.)
> > >>>> or don't even work at all because of hw bugs. So to have a reliable
> > >>>> TLB flush we must make sure that nothing else is ongoing, and that
> > >>>> means a CS->VM->CS barrier.
> > >>>>
> > >>>> We try very hard to circumvent that already on maps by (for example)
> > >>>> using a completely new VMID for CS after the VM map operation.
> > >>>>
> > >>>> But for the unmap operation we would need some kind of special
> > >>>> dma_fence implementation which would not only wait for all existing
> > >>>> dma_fences but also for the ones added until the unmap operation is
> > >>>> completed. Otherwise the operation we do at #1 would simply not catch
> > >>>> all dma_fences which have access to the memory.
> > >>>>
> > >>>> That's certainly doable, but I think just using the drm_exec stuff I
> > >>>> already came up with is easier.
> > >>>>
> > >>>> When we can grab locks for all the BOs involved amdgpu_vm_clear_freed()
> > >>>> goes away and we can keep track of the unmap operations in the bo_va
> > >>>> structure.
> > >>>>
> > >>>> With that done you can do the explicit sync you noted in the bo_va
> > >>>> structure, and implicit sync when the bo_va structure goes away.
> > >>>>
> > >>>> Then the only reason I can see why we would need a CS->VM dependency is
> > >>>> implicit synchronization, and that's what we are trying to avoid here in
> > >>>> the first place.
> > >>>>
> > >>>> Regards,
> > >>>> Christian.
> > >>>>
> > >>>>>> To get rid of this barrier you must first fix the part where CS
> > >>>>>> submissions wait for the VM operation to complete, i.e. the necessity of
> > >>>>>> the barrier.
> > >>>>>>
> > >>>>>> I've been working on this for a couple of years now and I'm really
> > >>>>>> running out of ideas for how to explain this restriction.
> > >>>>>>
> > >>>>>> Regards,
> > >>>>>> Christian.
> > >>>>>>
> >



