On Tue, Sep 08, 2020 at 08:43:59AM +0300, Laurent Pinchart wrote:
> Hi Daniel,
> 
> On Tue, Sep 01, 2020 at 09:32:22AM +0200, Daniel Vetter wrote:
> > On Sun, Aug 16, 2020 at 02:22:46PM -0300, Ezequiel Garcia wrote:
> > > This heap is basically a wrapper around DMA-API dma_alloc_attrs,
> > > which will allocate memory suitable for the given device.
> > > 
> > > The implementation is mostly a port of the Contiguous Videobuf2
> > > memory allocator (see videobuf2/videobuf2-dma-contig.c)
> > > over to the DMA-BUF Heap interface.
> > > 
> > > The intention of this allocator is to provide applications
> > > with a more system-agnostic API: the only thing the application
> > > needs to know is which device to get the buffer for.
> > > 
> > > Whether the buffer is backed by CMA, IOMMU or a DMA Pool
> > > is unknown to the application.
> > > 
> > > I'm not really expecting this patch to be correct or even
> > > a good idea, but just submitting it to start a discussion on DMA-BUF
> > > heap discovery and negotiation.
> > > 
> > > Given Plumbers is just a couple weeks from now, I've submitted
> > > a BoF proposal to discuss this, as perhaps it would make
> > > sense to discuss this live?
> > > 
> > > Not-signed-off-by: Ezequiel Garcia <ezequiel@xxxxxxxxxxxxx>
> > 
> > I think on the uapi/constraint solving side there's been already tons of
> > discussions while I enjoyed vacations, so different concern from me
> > entirely on the implementation side:
> > 
> > In the past the only thing we had in upstream was subsystem/driver
> > specific allocators, and dma-buf for sharing. With dma-buf heaps we kinda
> > get a central allocator, which duplicates large chunks of all these
> > allocators. And since it's a central allocator by design, the reason for
> > having per-subsystem allocators is kinda gone.
> > 
> > I think there's two approaches here:
> > - we convert e.g. drm allocator helpers to internally use the right heap
> >   implementation.
> We however don't always have a need for a dmabuf, as not all buffers are
> meant to be shared (and we often can't tell beforehand, at allocation
> time, if a given buffer will be shared or not). While the overhead of
> allocating a dmabuf should be fine, assigning a file descriptor to each
> buffer would be prohibitively expensive.
> 
> We would need to decouple the dma heaps from file descriptors. I think
> that's doable, but it changes the philosophy of dma heaps and needs
> careful consideration. In particular, one may wonder what that would
> bring us, compared to the DMA mapping API for instance.

Oh I don't mean that we use the dma-buf heaps interfaces, but the
backend. Underneath it's all a struct dma_buf, that's where I figured we
should intercept the code sharing. Definitely not file descriptor
handles.

> > That would also give us some fairly direct way to expose
> > these constraints in sysfs so a userspace allocator knows which dma-buf
> > heap to pick for shared stuff.
> 
> sysfs won't work I'm afraid, as constraints may depend on the format for
> instance. We need subsystem-specific APIs to expose constraints.

Yeah sysfs would be the default, and maybe a list of usable heaps, with
the subsystem API then telling you which one to use when.
-Daniel

> > - we require that any heap is just a different uapi for an existing driver
> >   allocator, e.g. by having a dma-buf wrapper for all gem drivers.
> > 
> > Otherwise I think what we end up with is a pile of dma-buf heaps for
> > android's blob gpu driver world, and not used anywhere else. Not something
> > even remotely interesting for upstream :-)
> > 
> > tldr; I'd like to see how dma-buf heaps closely integrate with all the
> > existing buffer management code we have. Both kernel (and through some
> > allocator library effort) in userspace.
> > > > > --- > > > drivers/dma-buf/heaps/Kconfig | 9 + > > > drivers/dma-buf/heaps/Makefile | 1 + > > > drivers/dma-buf/heaps/device_heap.c | 268 ++++++++++++++++++++++++++++ > > > include/linux/device.h | 5 + > > > include/linux/dma-heap.h | 6 + > > > 5 files changed, 289 insertions(+) > > > create mode 100644 drivers/dma-buf/heaps/device_heap.c > > > > > > diff --git a/drivers/dma-buf/heaps/Kconfig b/drivers/dma-buf/heaps/Kconfig > > > index a5eef06c4226..2bb3604184bd 100644 > > > --- a/drivers/dma-buf/heaps/Kconfig > > > +++ b/drivers/dma-buf/heaps/Kconfig > > > @@ -12,3 +12,12 @@ config DMABUF_HEAPS_CMA > > > Choose this option to enable dma-buf CMA heap. This heap is backed > > > by the Contiguous Memory Allocator (CMA). If your system has these > > > regions, you should say Y here. > > > + > > > +config DMABUF_HEAPS_DEVICES > > > + bool "DMA-BUF Device DMA Heap (Experimental)" > > > + depends on DMABUF_HEAPS > > > + help > > > + Choose this option to enable dma-buf per-device heap. This heap is backed > > > + by the DMA-API and it's an Experimental feature, meant mostly for testing > > > + and experimentation. > > > + Just say N here. > > > diff --git a/drivers/dma-buf/heaps/Makefile b/drivers/dma-buf/heaps/Makefile > > > index 6e54cdec3da0..c691d85b3044 100644 > > > --- a/drivers/dma-buf/heaps/Makefile > > > +++ b/drivers/dma-buf/heaps/Makefile > > > @@ -2,3 +2,4 @@ > > > obj-y += heap-helpers.o > > > obj-$(CONFIG_DMABUF_HEAPS_SYSTEM) += system_heap.o > > > obj-$(CONFIG_DMABUF_HEAPS_CMA) += cma_heap.o > > > +obj-$(CONFIG_DMABUF_HEAPS_DEVICES) += device_heap.o > > > diff --git a/drivers/dma-buf/heaps/device_heap.c b/drivers/dma-buf/heaps/device_heap.c > > > new file mode 100644 > > > index 000000000000..1803dc622dd8 > > > --- /dev/null > > > +++ b/drivers/dma-buf/heaps/device_heap.c > > > @@ -0,0 +1,268 @@ > > > +// SPDX-License-Identifier: GPL-2.0 > > > +/* > > > + * DMABUF Device DMA heap exporter > > > + * > > > + * Copyright (C) 2020, Collabora Ltd. 
> > > + *
> > > + * Based on:
> > > + * videobuf2-dma-contig.c - DMA contig memory allocator for videobuf2
> > > + * Copyright (C) 2010 Samsung Electronics
> > > + */
> > > +
> > > +#include <linux/device.h>
> > > +#include <linux/dma-buf.h>
> > > +#include <linux/dma-heap.h>
> > > +#include <linux/dma-mapping.h>
> > > +#include <linux/scatterlist.h>
> > > +#include <linux/slab.h>
> > > +#include <linux/module.h>
> > > +
> > > +struct dev_dmabuf_attachment {
> > > +	struct sg_table sgt;
> > > +	enum dma_data_direction dma_dir;
> > > +};
> > > +
> > > +struct dev_dmabuf {
> > > +	struct dma_heap *heap;
> > > +	struct dma_buf *dmabuf;
> > > +	struct device *dev;
> > > +	size_t size;
> > > +	void *vaddr;
> > > +	dma_addr_t dma_addr;
> > > +	unsigned long attrs;
> > > +
> > > +	struct sg_table sgt;
> > > +};
> > > +
> > > +static struct sg_table *dev_dmabuf_ops_map(struct dma_buf_attachment *db_attach,
> > > +					   enum dma_data_direction dma_dir)
> > > +{
> > > +	struct dev_dmabuf_attachment *attach = db_attach->priv;
> > > +	/* stealing dmabuf mutex to serialize map/unmap operations */
> > > +	struct mutex *lock = &db_attach->dmabuf->lock;
> > > +	struct sg_table *sgt;
> > > +
> > > +	mutex_lock(lock);
> > > +
> > > +	sgt = &attach->sgt;
> > > +	/* return previously mapped sg table */
> > > +	if (attach->dma_dir == dma_dir) {
> > > +		mutex_unlock(lock);
> > > +		return sgt;
> > > +	}
> > > +
> > > +	/* release any previous cache */
> > > +	if (attach->dma_dir != DMA_NONE) {
> > > +		dma_unmap_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents,
> > > +				   attach->dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
> > > +		attach->dma_dir = DMA_NONE;
> > > +	}
> > > +
> > > +	/*
> > > +	 * mapping to the client with new direction, no cache sync
> > > +	 * required see comment in .dmabuf_ops_detach()
> > > +	 */
> > > +	sgt->nents = dma_map_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents,
> > > +				      dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
> > > +	if (!sgt->nents) {
> > > +		dev_err(db_attach->dev, "failed to map scatterlist\n");
> > > +		mutex_unlock(lock);
> > > +		return ERR_PTR(-EIO);
> > > +	}
> > > +
> > > +	attach->dma_dir = dma_dir;
> > > +
> > > +	mutex_unlock(lock);
> > > +
> > > +	return sgt;
> > > +}
> > > +
> > > +static void dev_dmabuf_ops_unmap(struct dma_buf_attachment *db_attach,
> > > +				 struct sg_table *sgt,
> > > +				 enum dma_data_direction dma_dir)
> > > +{
> > > +	/* nothing to be done here */
> > > +}
> > > +
> > > +static int dev_dmabuf_ops_attach(struct dma_buf *dmabuf,
> > > +				 struct dma_buf_attachment *dbuf_attach)
> > > +{
> > > +	struct dev_dmabuf_attachment *attach;
> > > +	unsigned int i;
> > > +	struct scatterlist *rd, *wr;
> > > +	struct sg_table *sgt;
> > > +	struct dev_dmabuf *buf = dmabuf->priv;
> > > +	int ret;
> > > +
> > > +	attach = kzalloc(sizeof(*attach), GFP_KERNEL);
> > > +	if (!attach)
> > > +		return -ENOMEM;
> > > +	sgt = &attach->sgt;
> > > +
> > > +	/*
> > > +	 * Copy the buf->sgt scatter list to the attachment, as we can't
> > > +	 * map the same scatter list to multiple attachments at the same time.
> > > +	 */
> > > +	ret = sg_alloc_table(sgt, buf->sgt.orig_nents, GFP_KERNEL);
> > > +	if (ret) {
> > > +		kfree(attach);
> > > +		return -ENOMEM;
> > > +	}
> > > +
> > > +	rd = buf->sgt.sgl;
> > > +	wr = sgt->sgl;
> > > +	for (i = 0; i < sgt->orig_nents; ++i) {
> > > +		sg_set_page(wr, sg_page(rd), rd->length, rd->offset);
> > > +		rd = sg_next(rd);
> > > +		wr = sg_next(wr);
> > > +	}
> > > +
> > > +	attach->dma_dir = DMA_NONE;
> > > +	dbuf_attach->priv = attach;
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +static void dev_dmabuf_ops_detach(struct dma_buf *dmabuf,
> > > +				  struct dma_buf_attachment *db_attach)
> > > +{
> > > +	struct dev_dmabuf_attachment *attach = db_attach->priv;
> > > +	struct sg_table *sgt;
> > > +
> > > +	if (!attach)
> > > +		return;
> > > +	sgt = &attach->sgt;
> > > +
> > > +	/* release the scatterlist cache */
> > > +	if (attach->dma_dir != DMA_NONE)
> > > +		/*
> > > +		 * Cache sync can be skipped here, as the memory is
> > > +		 * allocated from device coherent memory, which means the
> > > +		 * memory locations do not require any explicit cache
> > > +		 * maintenance prior or after being used by the device.
> > > +		 *
> > > +		 * XXX: This needs a revisit.
> > > +		 */
> > > +		dma_unmap_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents,
> > > +				   attach->dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
> > > +	sg_free_table(sgt);
> > > +	kfree(attach);
> > > +	db_attach->priv = NULL;
> > > +}
> > > +
> > > +
> > > +static void *dev_dmabuf_ops_vmap(struct dma_buf *dmabuf)
> > > +{
> > > +	struct dev_dmabuf *buf = dmabuf->priv;
> > > +
> > > +	return buf->vaddr;
> > > +}
> > > +
> > > +static void dev_dmabuf_ops_release(struct dma_buf *dmabuf)
> > > +{
> > > +	struct dev_dmabuf *buf = dmabuf->priv;
> > > +
> > > +	sg_free_table(&buf->sgt);
> > > +	dma_free_attrs(buf->dev, buf->size, buf->vaddr,
> > > +		       buf->dma_addr, buf->attrs);
> > > +	put_device(buf->dev);
> > > +	kfree(buf);
> > > +}
> > > +
> > > +static int dev_dmabuf_ops_mmap(struct dma_buf *dmabuf,
> > > +			       struct vm_area_struct *vma)
> > > +{
> > > +	struct dev_dmabuf *buf = dmabuf->priv;
> > > +	int ret;
> > > +
> > > +	ret = dma_mmap_attrs(buf->dev, vma, buf->vaddr,
> > > +			     buf->dma_addr, buf->size,
> > > +			     buf->attrs);
> > > +	if (ret) {
> > > +		dev_err(buf->dev, "remapping memory failed, error: %d\n", ret);
> > > +		return ret;
> > > +	}
> > > +	vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +static const struct dma_buf_ops dev_dmabuf_ops = {
> > > +	.attach = dev_dmabuf_ops_attach,
> > > +	.detach = dev_dmabuf_ops_detach,
> > > +	.map_dma_buf = dev_dmabuf_ops_map,
> > > +	.unmap_dma_buf = dev_dmabuf_ops_unmap,
> > > +	.vmap = dev_dmabuf_ops_vmap,
> > > +	.mmap = dev_dmabuf_ops_mmap,
> > > +	.release = dev_dmabuf_ops_release,
> > > +};
> > > +
> > > +static int dev_heap_allocate(struct dma_heap *heap,
> > > +			     unsigned long size,
> > > +			     unsigned long fd_flags,
> > > +			     unsigned long heap_flags)
> > > +{
> > > +	struct device *dev = dma_heap_get_drvdata(heap);
> > > +	struct dev_dmabuf *buf;
> > > +	struct dma_buf_export_info exp_info = {};
> > > +	unsigned long attrs = 0;
> > > +	int ret = -ENOMEM;
> > > +
> > > +	buf = kzalloc(sizeof(*buf), GFP_KERNEL);
> > > +	if (!buf)
> > > +		return -ENOMEM;
> > > +
> > > +	buf->vaddr = dma_alloc_attrs(dev, size, &buf->dma_addr,
> > > +				     GFP_KERNEL, attrs);
> > > +	/* Prevent the device from being released while the buffer is used */
> > > +	buf->dev = get_device(dev);
> > > +	buf->heap = heap;
> > > +	buf->size = size;
> > > +	buf->attrs = attrs;
> > > +
> > > +	/* XXX: This call is documented as unsafe. See dma_get_sgtable_attrs(). */
> > > +	ret = dma_get_sgtable_attrs(buf->dev, &buf->sgt,
> > > +				    buf->vaddr, buf->dma_addr,
> > > +				    buf->size, buf->attrs);
> > > +	if (ret < 0) {
> > > +		dev_err(buf->dev, "failed to get scatterlist from DMA API\n");
> > > +		return ret;
> > > +	}
> > > +
> > > +	exp_info.exp_name = dev_name(dev);
> > > +	exp_info.owner = THIS_MODULE;
> > > +	exp_info.ops = &dev_dmabuf_ops;
> > > +	exp_info.size = size;
> > > +	exp_info.flags = fd_flags;
> > > +	exp_info.priv = buf;
> > > +
> > > +	buf->dmabuf = dma_buf_export(&exp_info);
> > > +	if (IS_ERR(buf->dmabuf)) {
> > > +		dev_err(buf->dev, "failed to export dmabuf\n");
> > > +		return PTR_ERR(buf->dmabuf);
> > > +	}
> > > +
> > > +	ret = dma_buf_fd(buf->dmabuf, fd_flags);
> > > +	if (ret < 0) {
> > > +		dev_err(buf->dev, "failed to get dmabuf fd: %d\n", ret);
> > > +		return ret;
> > > +	}
> > > +
> > > +	return ret;
> > > +}
> > > +
> > > +static const struct dma_heap_ops dev_heap_ops = {
> > > +	.allocate = dev_heap_allocate,
> > > +};
> > > +
> > > +void dev_dma_heap_add(struct device *dev)
> > > +{
> > > +	struct dma_heap_export_info exp_info;
> > > +
> > > +	exp_info.name = dev_name(dev);
> > > +	exp_info.ops = &dev_heap_ops;
> > > +	exp_info.priv = dev;
> > > +
> > > +	dev->heap = dma_heap_add(&exp_info);
> > > +}
> > > +EXPORT_SYMBOL(dev_dma_heap_add);
> > > diff --git a/include/linux/device.h b/include/linux/device.h
> > > index ca18da4768e3..1fae95d55ea1 100644
> > > --- a/include/linux/device.h
> > > +++ b/include/linux/device.h
> > > @@ -45,6 +45,7 @@ struct iommu_ops;
> > >  struct iommu_group;
> > >  struct dev_pin_info;
> > >  struct dev_iommu;
> > > +struct dma_heap;
> > >  
> > >  /**
> > >   * struct subsys_interface - interfaces to device functions
> > > @@ -597,6 +598,10 @@ struct device {
> > >  	struct iommu_group	*iommu_group;
> > >  	struct dev_iommu	*iommu;
> > >  
> > > +#ifdef CONFIG_DMABUF_HEAPS_DEVICES
> > > +	struct dma_heap *heap;
> > > +#endif
> > > +
> > >  	bool			offline_disabled:1;
> > >  	bool			offline:1;
> > >  	bool			of_node_reused:1;
> > > diff --git a/include/linux/dma-heap.h b/include/linux/dma-heap.h
> > > index 454e354d1ffb..dcf7cca2f487 100644
> > > --- a/include/linux/dma-heap.h
> > > +++ b/include/linux/dma-heap.h
> > > @@ -56,4 +56,10 @@ void *dma_heap_get_drvdata(struct dma_heap *heap);
> > >   */
> > >  struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info);
> > >  
> > > +#ifdef CONFIG_DMABUF_HEAPS_DEVICES
> > > +void dev_dma_heap_add(struct device *dev);
> > > +#else
> > > +static inline void dev_dma_heap_add(struct device *dev) {}
> > > +#endif
> > > +
> > >  #endif /* _DMA_HEAPS_H */

> -- 
> Regards,
> 
> Laurent Pinchart

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch