Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF

Oded Gabbay <oded.gabbay@xxxxxxxxx> · Wed, 23 Jun 2021 12:14:59 +0300



On Wed, Jun 23, 2021 at 11:57 AM Christian König
<ckoenig.leichtzumerken@xxxxxxxxx> wrote:
>
> Am 22.06.21 um 18:05 schrieb Jason Gunthorpe:
> > On Tue, Jun 22, 2021 at 05:48:10PM +0200, Christian König wrote:
> >> Am 22.06.21 um 17:40 schrieb Jason Gunthorpe:
> >>> On Tue, Jun 22, 2021 at 05:29:01PM +0200, Christian König wrote:
> >>>> [SNIP]
> >>>> No absolutely not. NVidia GPUs work exactly the same way.
> >>>>
> >>>> And you have tons of similar cases in embedded and SoC systems where
> >>>> intermediate memory between devices isn't directly addressable with the CPU.
> >>> None of that is PCI P2P.
> >>>
> >>> It is all some specialty direct transfer.
> >>>
> >>> You can't reasonably call dma_map_resource() on non CPU mapped memory
> >>> for instance, what address would you pass?
> >>>
> >>> Do not confuse "I am doing transfers between two HW blocks" with PCI
> >>> Peer to Peer DMA transfers - the latter is a very narrow subcase.
> >>>
> >>>> No, just using the dma_map_resource() interface.
> >>> Ik, but yes that does "work". Logan's series is better.
> >> No it isn't. It makes devices depend on allocating struct pages for their
> >> BARs which is not necessary nor desired.
> > Which dramatically reduces the cost of establishing DMA mappings, a
> > loop of dma_map_resource() is very expensive.
>
> Yeah, but that is perfectly ok. Our BAR allocations are either in chunks
> of at least 2MiB or only a single 4KiB page.
>
> Oded might run into more performance problems, but those DMA-buf
> mappings are usually set up only once.
>
> >> How do you prevent direct I/O on those pages for example?
> > GUP fails.
>
> At least that is calming.
>
> >> Allocating a struct pages has their use case, for example for exposing VRAM
> >> as memory for HMM. But that is something very specific and should not limit
> >> PCIe P2P DMA in general.
> > Sure, but that is an ideal we are far from obtaining, and nobody wants
> > to work on it prefering to do hacky hacky like this.
> >
> > If you believe in this then remove the scatter list from dmabuf, add a
> > new set of dma_map* APIs to work on physical addresses and all the
> > other stuff needed.
>
> Yeah, that's what I totally agree on. And I actually hoped that the new
> P2P work for PCIe would go into that direction, but that didn't
> materialized.
>
> But allocating struct pages for PCIe BARs which are essentially
> registers and not memory is much more hacky than the dma_resource_map()
> approach.
>
> To re-iterate why I think that having struct pages for those BARs is a
> bad idea: Our doorbells on AMD GPUs are write and read pointers for ring
> buffers.
>
> When you write to the BAR you essentially tell the firmware that you
> have either filled the ring buffer or read a bunch of it. This in turn
> then triggers an interrupt in the hardware/firmware which was eventually
> asleep.
>
> By using PCIe P2P we want to avoid the round trip to the CPU when one
> device has filled the ring buffer and another device must be woken up to
> process it.
>
> Think of it as MSI-X in reverse and allocating struct pages for those
> BARs just to work around the shortcomings of the DMA API makes no sense
> at all to me.
We would also like to do that *in the future*.
In Gaudi it will never be supported (due to security limitations) but
I definitely see it happening in future ASICs.

Oded

>
>
> We also do have the VRAM BAR, and for HMM we do allocate struct pages
> for the address range exposed there. But this is a different use case.
>
> Regards,
> Christian.
>
> >
> > Otherwise, we have what we have and drivers don't get to opt out. This
> > is why the stuff in AMDGPU was NAK'd.
> >
> > Jason
>