On Tue, Sep 17, 2019 at 4:47 PM Koenig, Christian <Christian.Koenig@xxxxxxx> wrote:
>
> Am 17.09.19 um 15:45 schrieb Daniel Vetter:
> > On Tue, Sep 17, 2019 at 01:24:10PM +0000, Koenig, Christian wrote:
> >> Am 17.09.19 um 15:13 schrieb Daniel Vetter:
> >>> On Tue, Sep 17, 2019 at 12:40:51PM +0000, Koenig, Christian wrote:
> >>>> Am 17.09.19 um 14:31 schrieb Daniel Vetter:
> >>>>> On Mon, Sep 16, 2019 at 02:23:13PM +0200, Christian König wrote:
> >>>>>> Ping? Any further comment on this or can't we merge at least the
> >>>>>> locking change?
> >>>>> I was at plumbers ...
> >>>>>> Christian.
> >>>>>>
> >>>>>> Am 11.09.19 um 12:53 schrieb Christian König:
> >>>>>>> Am 03.09.19 um 10:05 schrieb Daniel Vetter:
> >>>>>>>> On Thu, Aug 29, 2019 at 04:29:14PM +0200, Christian König wrote:
> >>>>>>>>> This patch is a stripped down version of the locking changes
> >>>>>>>>> necessary to support dynamic DMA-buf handling.
> >>>>>>>>>
> >>>>>>>>> For compatibility we cache the DMA-buf mapping as soon as
> >>>>>>>>> exporter/importer disagree on the dynamic handling.
> >>>>>>>>>
> >>>>>>>>> Signed-off-by: Christian König <christian.koenig@xxxxxxx>
> >>>>>>>>> ---
> >>>>>>>>>  drivers/dma-buf/dma-buf.c | 90 ++++++++++++++++++++++++++++++++++++---
> >>>>>>>>>  include/linux/dma-buf.h   | 51 +++++++++++++++++++++-
> >>>>>>>>>  2 files changed, 133 insertions(+), 8 deletions(-)
> >>>>>>>>>
> >>>>>>>>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> >>>>>>>>> index 433d91d710e4..65052d52602b 100644
> >>>>>>>>> --- a/drivers/dma-buf/dma-buf.c
> >>>>>>>>> +++ b/drivers/dma-buf/dma-buf.c
> >>>>>>>>> @@ -525,6 +525,10 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
> >>>>>>>>>          return ERR_PTR(-EINVAL);
> >>>>>>>>>      }
> >>>>>>>>> +    if (WARN_ON(exp_info->ops->cache_sgt_mapping &&
> >>>>>>>>> +                exp_info->ops->dynamic_mapping))
> >>>>>>>>> +        return ERR_PTR(-EINVAL);
> >>>>>>>>> +
> >>>>>>>>>      if (!try_module_get(exp_info->owner))
> >>>>>>>>>          return ERR_PTR(-ENOENT);
> >>>>>>>>> @@ -645,10 +649,11 @@ void dma_buf_put(struct dma_buf *dmabuf)
> >>>>>>>>>  EXPORT_SYMBOL_GPL(dma_buf_put);
> >>>>>>>>>  /**
> >>>>>>>>> - * dma_buf_attach - Add the device to dma_buf's attachments list; optionally,
> >>>>>>>>> + * dma_buf_dynamic_attach - Add the device to dma_buf's attachments list; optionally,
> >>>>>>>>>   * calls attach() of dma_buf_ops to allow device-specific attach functionality
> >>>>>>>>> - * @dmabuf:  [in]  buffer to attach device to.
> >>>>>>>>> - * @dev:     [in]  device to be attached.
> >>>>>>>>> + * @dmabuf:           [in]  buffer to attach device to.
> >>>>>>>>> + * @dev:              [in]  device to be attached.
> >>>>>>>>> + * @dynamic_mapping:  [in]  calling convention for map/unmap
> >>>>>>>>>   *
> >>>>>>>>>   * Returns struct dma_buf_attachment pointer for this attachment. Attachments
> >>>>>>>>>   * must be cleaned up by calling dma_buf_detach().
> >>>>>>>>> @@ -662,8 +667,9 @@ EXPORT_SYMBOL_GPL(dma_buf_put);
> >>>>>>>>>   * accessible to @dev, and cannot be moved to a more suitable place. This is
> >>>>>>>>>   * indicated with the error code -EBUSY.
> >>>>>>>>>   */
> >>>>>>>>> -struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
> >>>>>>>>> -                                          struct device *dev)
> >>>>>>>>> +struct dma_buf_attachment *
> >>>>>>>>> +dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct device *dev,
> >>>>>>>>> +                       bool dynamic_mapping)
> >>>>>>>>>  {
> >>>>>>>>>      struct dma_buf_attachment *attach;
> >>>>>>>>>      int ret;
> >>>>>>>>> @@ -677,6 +683,7 @@ struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
> >>>>>>>>>      attach->dev = dev;
> >>>>>>>>>      attach->dmabuf = dmabuf;
> >>>>>>>>> +    attach->dynamic_mapping = dynamic_mapping;
> >>>>>>>>>      mutex_lock(&dmabuf->lock);
> >>>>>>>>> @@ -685,16 +692,64 @@ struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
> >>>>>>>>>          if (ret)
> >>>>>>>>>              goto err_attach;
> >>>>>>>>>      }
> >>>>>>>>> +    dma_resv_lock(dmabuf->resv, NULL);
> >>>>>>>>>      list_add(&attach->node, &dmabuf->attachments);
> >>>>>>>>> +    dma_resv_unlock(dmabuf->resv);
> >>>>>>>>>      mutex_unlock(&dmabuf->lock);
> >>>>>>>>> +    /* When either the importer or the exporter can't handle dynamic
> >>>>>>>>> +     * mappings we cache the mapping here to avoid issues with the
> >>>>>>>>> +     * reservation object lock.
> >>>>>>>>> +     */
> >>>>>>>>> +    if (dma_buf_attachment_is_dynamic(attach) !=
> >>>>>>>>> +        dma_buf_is_dynamic(dmabuf)) {
> >>>>>>>>> +        struct sg_table *sgt;
> >>>>>>>>> +
> >>>>>>>>> +        if (dma_buf_is_dynamic(attach->dmabuf))
> >>>>>>>>> +            dma_resv_lock(attach->dmabuf->resv, NULL);
> >>>>>>>>> +
> >>>>>>>>> +        sgt = dmabuf->ops->map_dma_buf(attach, DMA_BIDIRECTIONAL);
> >>>>>>>> Now we're back to enforcing DMA_BIDI, which works nicely around the
> >>>>>>>> locking pain, but apparently upsets the arm-soc folks who want to
> >>>>>>>> control this better.
> >>>>>>> Take another look at dma_buf_map_attachment(), we still try to get the
> >>>>>>> caching there for ARM.
> >>>>>>>
> >>>>>>> What we do here is to bidirectionally map the buffer to avoid the
> >>>>>>> locking hydra when importer and exporter disagree on locking.
> >>>>>>>
> >>>>>>> So the ARM folks can easily avoid that by switching to dynamic locking
> >>>>>>> for both.
> >>>>> So you still break the contract between importer and exporter, except not
> >>>>> for anything that's run in intel-gfx-ci, so all is good?
> >>>> No, the contract between importer and exporter stays exactly the same as
> >>>> it is currently, as long as you don't switch to dynamic dma-buf handling.
> >>>>
> >>>> There is no functional change for the ARM folks here. The only change
> >>>> which takes effect is between i915 and amdgpu, and that is perfectly
> >>>> covered by intel-gfx-ci.
> >>> There's people who want to run amdgpu on ARM?
> >> Sure there are, we even recently fixed some bugs for this.
> >>
> >> But as far as I know there is no one currently who is affected by this
> >> change on ARM with amdgpu.
> > But don't you break them with this now?
>
> No, we see the bidirectional attachment as compatible with the other ones.
>
> > amdgpu will soon set the dynamic flag on exports, which forces the caching
> > at create time (to avoid the locking fun), which will then result in an
> > EBUSY at map_attachment time because we have a cached mapping, but it's
> > the wrong type.
>
> See the check in dma_buf_map_attachment():
>
>     if (attach->dir != direction && attach->dir != DMA_BIDIRECTIONAL)
>         return ERR_PTR(-EBUSY);

Hm, I misread this. So yeah, that should work, +/- the issue that we might
not flush enough.
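[For reference, the cached-mapping path in dma_buf_map_attachment() that the
quoted check comes from looks roughly like the sketch below. It is a
paraphrase rather than the exact hunk from the series; attach->sgt and
attach->dir stand for the cached sg_table and its direction that the series
is assumed to add to struct dma_buf_attachment:

    if (attach->sgt) {
        /* A cached mapping may only be reused when the requested direction
         * is compatible. A DMA_BIDIRECTIONAL cache matches every request,
         * which is why caching bidirectionally at attach time avoids the
         * -EBUSY case discussed above.
         */
        if (attach->dir != direction &&
            attach->dir != DMA_BIDIRECTIONAL)
            return ERR_PTR(-EBUSY);

        return attach->sgt;
    }
]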
But I guess that can be fixed whenever, it's not like dma-api semantics
are a great fit for us. Maybe a fixme comment would be useful here ...

I'll look at this tomorrow or so, because atm brain is slow, I'm down
with the usual post-conference cold it seems :-/
-Daniel

> Regards,
> Christian.
>
> >
> >>> Also, x86 doesn't have cache
> >>> flushing in the dma-api, so naturally this isn't any issue for us (we
> >>> still have cache flushing in actual hw, but that's a different topic). So
> >>> "works on x86" isn't really a great way to justify what we do here I
> >>> think.
> >> Well it is the exact same caching we previously had as well, so there is
> >> absolutely no functional change here except that we now explicitly note
> >> that amdgpu always needs bidirectional mappings.
> >>
> >> I agree that we should get rid of this caching as soon as possible, but
> >> we should not fix things which were broken before.
> >>
> >> On the other hand adding dma_sg_sync_for_cpu/device sounds like
> >> something we could easily add separately to the caching if you think
> >> that this will help.
> > The current code maybe lacks some cache flushes, but we already require a
> > fixed direction per attachment. So I guess not a real problem, probably.
> >
> > But with your patches I think we now fail with EBUSY. Not exactly nice ...
> > -Daniel
> >
> >> Christian.
> >>
> >>> -Daniel
> >>>
> >>>> Regards,
> >>>> Christian.
> >>>>
> >>>>> The other issue with "we solve this with caching the mapping": currently
> >>>>> map/unmap flush (at least on arm, at least in cases where it matters). If
> >>>>> you just return the cached sg, then we don't do the flushing anymore,
> >>>>> which might break importers/exporters in exactly the same way as just
> >>>>> giving them the wrong mapping. There's zero difference between a BIDI,
> >>>>> TO_CPU, or TO_DEVICE mapping; the only place where this matters is
> >>>>> cache flushing.
> >>>>>
> >>>>> So here's something that could actually work:
> >>>>> - We cache the mapping.
> >>>>> - We cache a bidirectional mapping.
> >>>>> - We put the right dma_sg_sync_for_cpu/device calls in place for map/unmap
> >>>>>   to give current importers/exporters the same behaviour they're used to
> >>>>>   now.
> >>>>>
> >>>>> And yes, the caching we've lifted might have broken something somewhere
> >>>>> already. But generally you only hear about that a long time after, because
> >>>>> arm vendors roll forward once every few years. Or something like that.
> >>>>> -Daniel
> >>>>>
> >>>>>>> Regards,
> >>>>>>> Christian.
> >>>>>>>
> >>>>>>>> That's why your previous version moved the caching into
> >>>>>>>> map/unmap_sg, which resurrected the locking hydra.
> >>>>>>>>
> >>>>>>>> I think we're going a bit in circles here, and I don't have a good idea
> >>>>>>>> either :-/
> >>>>>>>> -Daniel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
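[As a footnote to the dma_sg_sync_for_cpu/device suggestion above: the
generic DMA API helpers in question are dma_sync_sg_for_cpu() and
dma_sync_sg_for_device(), and wiring them around a cached DMA_BIDIRECTIONAL
mapping could look roughly like the sketch below. This is only an
illustration of the idea, not code from the series; attach->sgt and
attach->dir again stand for the assumed cached sg_table and its direction:

    /* In dma_buf_map_attachment(): reuse the cached bidirectional mapping,
     * but flush as if it had just been mapped for the requested direction.
     */
    if (attach->sgt) {
        dma_sync_sg_for_device(attach->dev, attach->sgt->sgl,
                               attach->sgt->nents, direction);
        return attach->sgt;
    }

    /* In dma_buf_unmap_attachment(): keep the cached mapping around, but
     * hand ownership back to the CPU so importers keep the flushing
     * behaviour they rely on today.
     */
    if (attach->sgt == sg_table) {
        dma_sync_sg_for_cpu(attach->dev, sg_table->sgl, sg_table->nents,
                            direction);
        return;
    }
]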