On Wed, May 8, 2019 at 2:23 PM Koenig, Christian <Christian.Koenig@xxxxxxx> wrote:
>
> Am 08.05.19 um 14:15 schrieb Daniel Vetter:
> > [CAUTION: External Email]
> >
> > On Wed, May 8, 2019 at 2:00 PM Christian König
> > <ckoenig.leichtzumerken@xxxxxxxxx> wrote:
> >> Am 08.05.19 um 12:11 schrieb Daniel Vetter:
> >>> On Wed, May 08, 2019 at 11:15:33AM +0200, Daniel Vetter wrote:
> >>>> On Tue, May 07, 2019 at 10:13:30AM +0200, Christian König wrote:
> >>>>> To allow a smooth transition from pinning buffer objects to dynamic
> >>>>> invalidation we first start to cache the sg_table for an attachment.
> >>>>>
> >>>>> Signed-off-by: Christian König <christian.koenig@xxxxxxx>
> >>>>> ---
> >>>>>  drivers/dma-buf/dma-buf.c | 24 ++++++++++++++++++++++++
> >>>>>  include/linux/dma-buf.h   | 14 ++++++++++++++
> >>>>>  2 files changed, 38 insertions(+)
> >>>>>
> >>>>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> >>>>> index 7c858020d14b..775e13f54083 100644
> >>>>> --- a/drivers/dma-buf/dma-buf.c
> >>>>> +++ b/drivers/dma-buf/dma-buf.c
> >>>>> @@ -573,6 +573,20 @@ struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
> >>>>>         list_add(&attach->node, &dmabuf->attachments);
> >>>>>
> >>>>>         mutex_unlock(&dmabuf->lock);
> >>>>> +
> >>>>> +       if (!dma_buf_is_dynamic(dmabuf)) {
> >>>>> +               struct sg_table *sgt;
> >>>>> +
> >>>>> +               sgt = dmabuf->ops->map_dma_buf(attach, DMA_BIDIRECTIONAL);
> >>>>> +               if (!sgt)
> >>>>> +                       sgt = ERR_PTR(-ENOMEM);
> >>>>> +               if (IS_ERR(sgt)) {
> >>>>> +                       dma_buf_detach(dmabuf, attach);
> >>>>> +                       return ERR_CAST(sgt);
> >>>>> +               }
> >>>>> +               attach->sgt = sgt;
> >>>>> +       }
> >>>>> +
> >>>>>         return attach;
> >>>>>
> >>>>>  err_attach:
> >>>>> @@ -595,6 +609,10 @@ void dma_buf_detach(struct dma_buf *dmabuf, struct dma_buf_attachment *attach)
> >>>>>         if (WARN_ON(!dmabuf || !attach))
> >>>>>                 return;
> >>>>>
> >>>>> +       if (attach->sgt)
> >>>>> +               dmabuf->ops->unmap_dma_buf(attach, attach->sgt,
> >>>>> +                                          DMA_BIDIRECTIONAL);
> >>>>> +
> >>>>>         mutex_lock(&dmabuf->lock);
> >>>>>         list_del(&attach->node);
> >>>>>         if (dmabuf->ops->detach)
> >>>>> @@ -630,6 +648,9 @@ struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *attach,
> >>>>>         if (WARN_ON(!attach || !attach->dmabuf))
> >>>>>                 return ERR_PTR(-EINVAL);
> >>>>>
> >>>>> +       if (attach->sgt)
> >>>>> +               return attach->sgt;
> >>>>> +
> >>>>>         sg_table = attach->dmabuf->ops->map_dma_buf(attach, direction);
> >>>>>         if (!sg_table)
> >>>>>                 sg_table = ERR_PTR(-ENOMEM);
> >>>>> @@ -657,6 +678,9 @@ void dma_buf_unmap_attachment(struct dma_buf_attachment *attach,
> >>>>>         if (WARN_ON(!attach || !attach->dmabuf || !sg_table))
> >>>>>                 return;
> >>>>>
> >>>>> +       if (attach->sgt == sg_table)
> >>>>> +               return;
> >>>>> +
> >>>>>         attach->dmabuf->ops->unmap_dma_buf(attach, sg_table,
> >>>>>                                            direction);
> >>>>>  }
> >>>>> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> >>>>> index 58725f890b5b..52031fdc75bb 100644
> >>>>> --- a/include/linux/dma-buf.h
> >>>>> +++ b/include/linux/dma-buf.h
> >>>>> @@ -322,6 +322,7 @@ struct dma_buf_attachment {
> >>>>>         struct dma_buf *dmabuf;
> >>>>>         struct device *dev;
> >>>>>         struct list_head node;
> >>>>> +       struct sg_table *sgt;
> >>>>>         void *priv;
> >>>>>  };
> >>>>>
> >>>>> @@ -373,6 +374,19 @@ static inline void get_dma_buf(struct dma_buf *dmabuf)
> >>>>>         get_file(dmabuf->file);
> >>>>>  }
> >>>>>
> >>>>> +/**
> >>>>> + * dma_buf_is_dynamic - check if a DMA-buf uses dynamic mappings.
> >>>>> + * @dmabuf: the DMA-buf to check
> >>>>> + *
> >>>>> + * Returns true if a DMA-buf exporter wants to create dynamic sg table mappings
> >>>>> + * for each attachment. False if only a single static sg table should be used.
> >>>>> + */
> >>>>> +static inline bool dma_buf_is_dynamic(struct dma_buf *dmabuf)
> >>>>> +{
> >>>>> +       /* Always use a static mapping for now */
> >>>>> +       return false;
> >>>> Hm, I still expect that later on we'll want this to be decided by the
> >>>> attachment: it's only dynamic if both the exporter and the importer
> >>>> support dynamic dma-buf management, otherwise we need to pin.
> >>>>
> >>>> But anyway, I feel like we need to go over the entire thing anyway once
> >>>> more when p2p has landed. On this:
> >>>>
> >>>> Reviewed-by: Daniel Vetter <daniel.vetter@xxxxxxxx>
> >>> A correction I only just realized: I need to retract that r-b on this
> >>> and the drm sgt cache removal patch. You now hardcode the direction to
> >>> DMA_BIDIRECTIONAL, while drm_prime only kept the cache for a given
> >>> direction.
> >>>
> >>> Now x86 drivers always set DMA_BIDIRECTIONAL, but arm soc drivers (and
> >>> also the v4l videobuf layer) try to guess whether it should be
> >>> DMA_TO_DEVICE or DMA_FROM_DEVICE. I have no idea what the implications
> >>> are; also, all the cache coherency on dma-bufs is kinda ill-defined.
> >>>
> >>> And we can't throw the sgt cache away at map time if it doesn't fit,
> >>> like drm_prime does, because that reintroduces the reservation object
> >>> lock, defeating the entire purpose of this. Also, we can't just assume
> >>> drm_prime works for everyone, since the special cases all roll their own
> >>> dma-buf import. I have also not checked what exactly exporters do. No
> >>> idea what to do here now.
> >>>
> >>> /me cries
> >> Well, it is kind of an abuse to use the DMA direction here in the first
> >> place, and I actually considered removing it completely.
> >>
> >> The key problem is that the DMA direction defines things from the
> >> viewpoint of the CPU, e.g. DMA_TO_DEVICE means the data has been
> >> accessed by the CPU and we are now pushing it to a device.
> >>
> >> But DMA-buf is essentially about device to device transfers, so
> >> DMA_TO_DEVICE as well as DMA_FROM_DEVICE are completely ambiguous
> >> because you don't know the point of view to start with.
> >>
> >> In addition to that, the caching implementation in drm_prime.c didn't
> >> handle it fully either:
> >>> -       /*
> >>> -        * two mappings with different directions for the same attachment are
> >>> -        * not allowed
> >>> -        */
> >>> -       if (WARN_ON(prime_attach->dir != DMA_NONE))
> >>> -               return ERR_PTR(-EBUSY);
> >> Do the ARM guys and V4L guys actually do anything with the direction
> >> except for passing it on to dma_map_sg_attrs()?
> > I don't think so, but that alone is a big thing, because on arm that
> > actually changes what happens wrt flushing and stuff. On x86 the dma
> > subsystem assumes everything is always coherent, and we have a pile of
> > hacks in drivers to make sure the dma access is coherent enough (since
> > on a gpu this is controlled on a per-transaction basis usually, fairly
> > often even under userspace's control).
> >
> > I have tried to better understand what a better dma api interface
> > would look like for gpus and dma-buf, but every time that comes up the
> > dma maintainers tell me I'm a fool and that this _must_ be abstracted
> > away by the dma api, and I give up again.
> > Hence the crying and no idea what to do here :-/
>
> Well, count me among the people who think that the DMA-API on Linux is
> not a good fit for GPUs.
>
> E.g. I've tried to explain multiple times now that we would rather have
> a failed mapping than bounce buffers, but that doesn't seem to sink in.
>
> > In reality it probably doesn't matter, as long as no one tries to run
> > a gpu driver with dynamic dma-buf export on arm, but I've seen some
> > people trying to get amdgpu at least going in that combo. So not a
> > long-term solution either. And we can't just not cache, because that
> > breaks the world wrt locking ordering I think, or at least a bunch of
> > drm drivers (and not all drm drivers use the drm_prime helpers, so we
> > can't just do this for us and not cache for anyone else and hope that
> > works out).
>
> Well, the amdgpu userspace stack absolutely needs coherent mappings,
> otherwise it won't work. So I'm not worried about amdgpu.
>
> But this doesn't mean that we can ignore it for ARM.
>
> > One option I haven't looked into much, but might work, is that we call
> > the dma api coherency functions if the requested attachment_map/unmap
> > direction doesn't match. But I honestly don't know whether that's
> > allowed and would result in the same thing in the dma-api backends as
> > doing a mapping with the desired direction right away. Also, it would
> > terribly break the layering, so we'd need to add a map_sync callback
> > (there have been patches, but they never landed) and roll them out to
> > all exporters.
>
> Well no, not really. I think I will just go down the route of making sgt
> caching as optional as possible.
>
> E.g. only create the cache mapping during attach if we actually have the
> locking problem.

Well, that's the "review and update everyone, and oh welp is there a lot
of them" approach, which we tried to avoid ...

> So back to coding for now,

So if you want to go this way, what about resurrecting the
dma_buf_attachment_info rework (I really think that's cleaner than
shitloads of arguments; maybe keep the current dma_buf_attach() as-is and
do the info argument thing only for dma_buf_attach_ext() or similar), and
have a ->need_caching flag in there? Then we can set that for drm_prime
and i915 at least.

Just an idea to avoid changing the world ...
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel
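For illustration, a rough sketch of what the dma_buf_attach_ext() /
dma_buf_attachment_info idea floated above could look like, reusing the
caching logic from the patch at the top of the thread. The struct layout,
the exact meaning of need_caching and the function body are assumptions
made up for this sketch (and it presumes dma_buf_attach() itself no longer
caches unconditionally); none of this is an existing API.

#include <linux/dma-buf.h>
#include <linux/dma-direction.h>
#include <linux/err.h>

/*
 * Hypothetical extended attach parameters, instead of growing the
 * dma_buf_attach() argument list further.
 */
struct dma_buf_attachment_info {
        struct dma_buf *dmabuf;   /* buffer to attach to */
        struct device *dev;       /* importing device */
        bool need_caching;        /* importer can't map under its locks later,
                                   * so cache the sg_table at attach time */
};

struct dma_buf_attachment *
dma_buf_attach_ext(const struct dma_buf_attachment_info *info)
{
        struct dma_buf_attachment *attach;
        struct sg_table *sgt;

        /* Reuse the existing attach path for the bookkeeping. */
        attach = dma_buf_attach(info->dmabuf, info->dev);
        if (IS_ERR(attach))
                return attach;

        /*
         * Only importers which cannot take the reservation lock at map time
         * (e.g. drm_prime, i915) ask for the cached mapping; everyone else
         * keeps mapping on demand in dma_buf_map_attachment().
         */
        if (info->need_caching && !dma_buf_is_dynamic(info->dmabuf)) {
                sgt = info->dmabuf->ops->map_dma_buf(attach, DMA_BIDIRECTIONAL);
                if (!sgt)
                        sgt = ERR_PTR(-ENOMEM);
                if (IS_ERR(sgt)) {
                        dma_buf_detach(info->dmabuf, attach);
                        return ERR_CAST(sgt);
                }
                attach->sgt = sgt;
        }

        return attach;
}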