On Wed, May 22, 2019 at 7:28 PM Christian König
<ckoenig.leichtzumerken@xxxxxxxxx> wrote:
>
> Am 22.05.19 um 18:17 schrieb Sumit Semwal:
> > Hi Christian,
> >
> > On Sat, 27 Apr 2019 at 05:31, Liam Mark <lmark@xxxxxxxxxxxxxx> wrote:
> >> On Tue, 16 Apr 2019, Christian König wrote:
> >>
> >>> To allow a smooth transition from pinning buffer objects to dynamic
> >>> invalidation, we first start to cache the sg_table for an attachment
> >>> unless the driver explicitly says not to do so.
> >>>
> >>> ---
> >>>  drivers/dma-buf/dma-buf.c | 24 ++++++++++++++++++++++++
> >>>  include/linux/dma-buf.h   | 11 +++++++++++
> >>>  2 files changed, 35 insertions(+)
> >>>
> >>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> >>> index 7c858020d14b..65161a82d4d5 100644
> >>> --- a/drivers/dma-buf/dma-buf.c
> >>> +++ b/drivers/dma-buf/dma-buf.c
> >>> @@ -573,6 +573,20 @@ struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
> >>>  	list_add(&attach->node, &dmabuf->attachments);
> >>>
> >>>  	mutex_unlock(&dmabuf->lock);
> >>> +
> >>> +	if (!dmabuf->ops->dynamic_sgt_mapping) {
> >>> +		struct sg_table *sgt;
> >>> +
> >>> +		sgt = dmabuf->ops->map_dma_buf(attach, DMA_BIDIRECTIONAL);
> >>> +		if (!sgt)
> >>> +			sgt = ERR_PTR(-ENOMEM);
> >>> +		if (IS_ERR(sgt)) {
> >>> +			dma_buf_detach(dmabuf, attach);
> >>> +			return ERR_CAST(sgt);
> >>> +		}
> >>> +		attach->sgt = sgt;
> >>> +	}
> >>> +
> >>>  	return attach;
> >>>
> >>>  err_attach:
> >>> @@ -595,6 +609,10 @@ void dma_buf_detach(struct dma_buf *dmabuf, struct dma_buf_attachment *attach)
> >>>  	if (WARN_ON(!dmabuf || !attach))
> >>>  		return;
> >>>
> >>> +	if (attach->sgt)
> >>> +		dmabuf->ops->unmap_dma_buf(attach, attach->sgt,
> >>> +					   DMA_BIDIRECTIONAL);
> >>> +
> >>>  	mutex_lock(&dmabuf->lock);
> >>>  	list_del(&attach->node);
> >>>  	if (dmabuf->ops->detach)
> >>> @@ -630,6 +648,9 @@ struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *attach,
> >>>  	if (WARN_ON(!attach || !attach->dmabuf))
> >>>  		return ERR_PTR(-EINVAL);
> >>>
> >>> +	if (attach->sgt)
> >>> +		return attach->sgt;
> >>> +
> >> I am concerned by this change making caching of the sg_table the default
> >> behavior, as it will result in the exporter's map_dma_buf/unmap_dma_buf
> >> calls no longer being called in
> >> dma_buf_map_attachment/dma_buf_unmap_attachment.
> > Probably this concern from Liam got lost between versions of your
> > patches; could we please request a reply to these points here?
>
> Sorry, I indeed never got this mail, but this is actually not an issue
> because Daniel had similar concerns and we didn't make this the default
> in the final version.
>
> >> This seems concerning to me as it appears to ignore the cache maintenance
> >> aspect of the map_dma_buf/unmap_dma_buf calls.
> >> For example, won't this potentially cause issues for clients of ION?
> >>
> >> If we had the following:
> >> - #1 dma_buf_attach coherent_device
> >> - #2 dma_buf_attach non_coherent_device
> >> - #3 dma_buf_map_attachment non_coherent_device
> >> - #4 non_coherent_device writes to buffer
> >> - #5 dma_buf_unmap_attachment non_coherent_device
> >> - #6 dma_buf_map_attachment coherent_device
> >> - #7 coherent_device reads buffer
> >> - #8 dma_buf_unmap_attachment coherent_device
> >>
> >> There wouldn't be any CMO at step #5 anymore (specifically no invalidate),
> >> so now at step #7 the coherent_device could read a stale cache line.
> >>
> >> Also, now by default dma_buf_unmap_attachment no longer removes the
> >> mappings from the IOMMU, so dma_buf_unmap_attachment is not doing
> >> what I would expect, and clients are losing the potential sandboxing
> >> benefits of removing the mappings.
> >> Shouldn't this caching behavior be something that clients opt into
> >> instead of being the default?
>
> Well, it seems you are making incorrect assumptions about the cache
> maintenance of DMA-buf here.
>
> At least for all DRM devices I'm aware of, mapping/unmapping an
> attachment does *NOT* have any cache maintenance implications.
>
> E.g. the use case you describe above would certainly fail with amdgpu,
> radeon, nouveau and i915, because mapping a DMA-buf doesn't stop the
> exporter from reading/writing to that buffer (just the opposite,
> actually).
>
> All of them assume perfectly coherent access to the underlying memory.
> As far as I know there are no documented cache maintenance requirements
> for DMA-buf.

I think it is documented. It's just that on x86 we ignore that, because
the dma-api pretends there's never a need for cache flushing on x86 and
that everything snoops the cpu caches. Which hasn't been true since over
20 years ago, when AGP happened. The actual rules for x86 dma-buf are
very much ad-hoc (and we occasionally reapply some duct-tape when
cacheline noise shows up somewhere).

I've just filed this away as another instance of the dma-api not fitting
gpus, and given recent discussions I think that won't improve anytime
soon. So we're stuck with essentially undefined dma-buf behaviour.
-Daniel

> The IOMMU concern, on the other hand, is certainly valid, and I
> perfectly agree that keeping the mapping time as short as possible is
> desirable.
>
> Regards,
> Christian.
>
> >> Liam
> >>
> >> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> >> a Linux Foundation Collaborative Project
> >>
> > Best,
> > Sumit.

> _______________________________________________
> dri-devel mailing list
> dri-devel@xxxxxxxxxxxxxxxxxxxxx
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
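
[Editor's note: to make Liam's eight-step scenario above concrete, here
is a minimal importer-side sketch of that call sequence with the patch
applied. The function and device names (stale_cache_scenario,
coherent_device, non_coherent_device) are hypothetical, not from the
thread; the comments mark where the cached sg_table short-circuits the
exporter's callbacks, so the invalidate Liam expects at step #5 never
runs.]

#include <linux/dma-buf.h>
#include <linux/err.h>

static int stale_cache_scenario(struct dma_buf *buf,
				struct device *coherent_device,
				struct device *non_coherent_device)
{
	struct dma_buf_attachment *att_coh, *att_noncoh;
	struct sg_table *sgt;
	int ret = 0;

	/* #1 and #2: with the patch, map_dma_buf() already runs here
	 * (DMA_BIDIRECTIONAL) and the result is cached in attach->sgt. */
	att_coh = dma_buf_attach(buf, coherent_device);
	if (IS_ERR(att_coh))
		return PTR_ERR(att_coh);
	att_noncoh = dma_buf_attach(buf, non_coherent_device);
	if (IS_ERR(att_noncoh)) {
		ret = PTR_ERR(att_noncoh);
		goto out_detach_coh;
	}

	/* #3: returns the cached sg_table without calling the exporter */
	sgt = dma_buf_map_attachment(att_noncoh, DMA_BIDIRECTIONAL);
	if (IS_ERR(sgt)) {
		ret = PTR_ERR(sgt);
		goto out_detach_noncoh;
	}

	/* #4: non_coherent_device DMA-writes to the buffer here */

	/* #5: with attach->sgt cached this no longer reaches the
	 * exporter's unmap_dma_buf(), so no CMO/invalidate happens */
	dma_buf_unmap_attachment(att_noncoh, sgt, DMA_BIDIRECTIONAL);

	/* #6: again just returns the sg_table cached at attach time */
	sgt = dma_buf_map_attachment(att_coh, DMA_BIDIRECTIONAL);
	if (IS_ERR(sgt)) {
		ret = PTR_ERR(sgt);
		goto out_detach_noncoh;
	}

	/* #7: coherent_device reads -- may hit a stale cache line */

	/* #8 */
	dma_buf_unmap_attachment(att_coh, sgt, DMA_BIDIRECTIONAL);

out_detach_noncoh:
	dma_buf_detach(buf, att_noncoh);
out_detach_coh:
	dma_buf_detach(buf, att_coh);
	return ret;
}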
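
[Editor's note: for reference, a sketch of how an exporter would opt out
of the caching under this patch. The include/linux/dma-buf.h hunk is not
quoted above, so treating dynamic_sgt_mapping as a bool in struct
dma_buf_ops is an assumption here, and all my_* names are placeholders.]

#include <linux/dma-buf.h>

/* Placeholder callbacks: a real exporter would build and dma_map an
 * sg_table for attach->dev in map_dma_buf(), and undo that in
 * unmap_dma_buf(). */
static struct sg_table *my_map_dma_buf(struct dma_buf_attachment *attach,
				       enum dma_data_direction dir)
{
	return NULL; /* placeholder; would return the mapped sg_table */
}

static void my_unmap_dma_buf(struct dma_buf_attachment *attach,
			     struct sg_table *sgt,
			     enum dma_data_direction dir)
{
	/* would dma_unmap and free the sg_table */
}

static void my_release(struct dma_buf *dmabuf)
{
	/* would free the backing storage */
}

/*
 * With dynamic_sgt_mapping set, dma_buf_attach() skips the eager
 * map_dma_buf() call and attach->sgt stays NULL, so every
 * dma_buf_map_attachment()/dma_buf_unmap_attachment() reaches the
 * exporter's callbacks again (restoring per-map cache maintenance
 * and IOMMU unmapping for exporters that need it).
 */
static const struct dma_buf_ops my_dynamic_ops = {
	.dynamic_sgt_mapping	= true,
	.map_dma_buf		= my_map_dma_buf,
	.unmap_dma_buf		= my_unmap_dma_buf,
	.release		= my_release,
	/* other mandatory ops (mmap, etc.) omitted for brevity */
};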