Hi :-) On Tue, Jan 15, 2019 at 12:40:16PM -0600, Andrew F. Davis wrote: > On 1/15/19 12:38 PM, Andrew F. Davis wrote: > > On 1/15/19 11:45 AM, Liam Mark wrote: > >> On Tue, 15 Jan 2019, Andrew F. Davis wrote: > >> > >>> On 1/14/19 11:13 AM, Liam Mark wrote: > >>>> On Fri, 11 Jan 2019, Andrew F. Davis wrote: > >>>> > >>>>> Buffers may not be mapped from the CPU so skip cache maintenance here. > >>>>> Accesses from the CPU to a cached heap should be bracketed with > >>>>> {begin,end}_cpu_access calls so maintenance should not be needed anyway. > >>>>> > >>>>> Signed-off-by: Andrew F. Davis <afd@xxxxxx> > >>>>> --- > >>>>> drivers/staging/android/ion/ion.c | 7 ++++--- > >>>>> 1 file changed, 4 insertions(+), 3 deletions(-) > >>>>> > >>>>> diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c > >>>>> index 14e48f6eb734..09cb5a8e2b09 100644 > >>>>> --- a/drivers/staging/android/ion/ion.c > >>>>> +++ b/drivers/staging/android/ion/ion.c > >>>>> @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment, > >>>>> > >>>>> table = a->table; > >>>>> > >>>>> - if (!dma_map_sg(attachment->dev, table->sgl, table->nents, > >>>>> - direction)) > >>>>> + if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents, > >>>>> + direction, DMA_ATTR_SKIP_CPU_SYNC)) > >>>> > >>>> Unfortunately I don't think you can do this for a couple reasons. > >>>> You can't rely on {begin,end}_cpu_access calls to do cache maintenance. > >>>> If the calls to {begin,end}_cpu_access were made before the call to > >>>> dma_buf_attach then there won't have been a device attached so the calls > >>>> to {begin,end}_cpu_access won't have done any cache maintenance. > >>>> > >>> > >>> That should be okay though, if you have no attachments (or all > >>> attachments are IO-coherent) then there is no need for cache > >>> maintenance. Unless you mean a sequence where a non-io-coherent device > >>> is attached later after data has already been written. Does that > >>> sequence need supporting? > >> > >> Yes, but also I think there are cases where CPU access can happen before > >> in Android, but I will focus on later for now. > >> > >>> DMA-BUF doesn't have to allocate the backing > >>> memory until map_dma_buf() time, and that should only happen after all > >>> the devices have attached so it can know where to put the buffer. So we > >>> shouldn't expect any CPU access to buffers before all the devices are > >>> attached and mapped, right? > >>> > >> > >> Here is an example where CPU access can happen later in Android. > >> > >> Camera device records video -> software post processing -> video device > >> (who does compression of raw data) and writes to a file > >> > >> In this example assume the buffer is cached and the devices are not > >> IO-coherent (quite common). > >> > > > > This is the start of the problem, having cached mappings of memory that > > is also being accessed non-coherently is going to cause issues one way > > or another. On top of the speculative cache fills that have to be > > constantly fought back against with CMOs like below; some coherent > > interconnects behave badly when you mix coherent and non-coherent access > > (snoop filters get messed up). > > > > The solution is to either always have the addresses marked non-coherent > > (like device memory, no-map carveouts), or if you really want to use > > regular system memory allocated at runtime, then all cached mappings of > > it need to be dropped, even the kernel logical address (area as painful > > as that would be). Ouch :-( I wasn't aware about these potential interconnect issues. How "real" is that? It seems that we aren't really hitting that today on real devices. > > > >> ION buffer is allocated. > >> > >> //Camera device records video > >> dma_buf_attach > >> dma_map_attachment (buffer needs to be cleaned) > > > > Why does the buffer need to be cleaned here? I just got through reading > > the thread linked by Laura in the other reply. I do like +Brian's > > Actually +Brian this time :) > > > suggestion of tracking if the buffer has had CPU access since the last > > time and only flushing the cache if it has. As unmapped heaps never get > > CPU mapped this would never be the case for unmapped heaps, it solves my > > problem. > > > >> [camera device writes to buffer] > >> dma_buf_unmap_attachment (buffer needs to be invalidated) > > > > It doesn't know there will be any further CPU access, it could get freed > > after this for all we know, the invalidate can be saved until the CPU > > requests access again. We don't have any API to allow the invalidate to happen on CPU access if all devices already detached. We need a struct device pointer to give to the DMA API, otherwise on arm64 there'll be no invalidate. I had a chat with a few people internally after the previous discussion with Liam. One suggestion was to use DMA_ATTR_SKIP_CPU_SYNC in unmap_dma_buf, but only if there's at least one other device attached (guarantees that we can do an invalidate in the future if begin_cpu_access is called). If the last device detaches, do a sync then. Conversely, in map_dma_buf, we would track if there was any CPU access and use/skip the sync appropriately. I did start poking the code to check out how that would look, but then Christmas happened and I'm still catching back up. > > > >> dma_buf_detach (device cannot stay attached because it is being sent down > >> the pipeline and Camera doesn't know the end of the use case) > >> > > > > This seems like a broken use-case, I understand the desire to keep > > everything as modular as possible and separate the steps, but at this > > point no one owns this buffers backing memory, not the CPU or any > > device. I would go as far as to say DMA-BUF should be free now to > > de-allocate the backing storage if it wants, that way it could get ready > > for the next attachment, which may change the required backing memory > > completely. > > > > All devices should attach before the first mapping, and only let go > > after the task is complete, otherwise this buffers data needs copied off > > to a different location or the CPU needs to take ownership in-between. > > Yeah.. that's certainly the theory. Are there any DMA-BUF implementations which actually do that? I hear it quoted a lot, because that's what the docs say - but if the reality doesn't match it, maybe we should change the docs. > >> //buffer is send down the pipeline > >> > >> // Usersapce software post processing occurs > >> mmap buffer > > > > Perhaps the invalidate should happen here in mmap. > > > >> DMA_BUF_IOCTL_SYNC IOCT with flags DMA_BUF_SYNC_START // No CMO since no > >> devices attached to buffer > > > > And that should be okay, mmap does the sync, and if no devices are > > attached nothing could have changed the underlying memory in the > > mean-time, DMA_BUF_SYNC_START can safely be a no-op as they are. Yeah, that's true - so long as you did an invalidate in unmap_dma_buf. Liam was saying that it's too painful for them to do that every time a device unmaps - when in many cases (device->device, no CPU) it's not needed. > > > >> [CPU reads/writes to the buffer] > >> DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_END // No CMO since no > >> devices attached to buffer > >> munmap buffer > >> > >> //buffer is send down the pipeline > >> // Buffer is send to video device (who does compression of raw data) and > >> writes to a file > >> dma_buf_attach > >> dma_map_attachment (buffer needs to be cleaned) > >> [video device writes to buffer] > >> dma_buf_unmap_attachment > >> dma_buf_detach (device cannot stay attached because it is being sent down > >> the pipeline and Video doesn't know the end of the use case) > >> > >> > >> > >>>> Also ION no longer provides DMA ready memory, so if you are not doing CPU > >>>> access then there is no requirement (that I am aware of) for you to call > >>>> {begin,end}_cpu_access before passing the buffer to the device and if this > >>>> buffer is cached and your device is not IO-coherent then the cache maintenance > >>>> in ion_map_dma_buf and ion_unmap_dma_buf is required. > >>>> > >>> > >>> If I am not doing any CPU access then why do I need CPU cache > >>> maintenance on the buffer? > >>> > >> > >> Because ION no longer provides DMA ready memory. > >> Take the above example. > >> > >> ION allocates memory from buddy allocator and requests zeroing. > >> Zeros are written to the cache. > >> > >> You pass the buffer to the camera device which is not IO-coherent. > >> The camera devices writes directly to the buffer in DDR. > >> Since you didn't clean the buffer a dirty cache line (one of the zeros) is > >> evicted from the cache, this zero overwrites data the camera device has > >> written which corrupts your data. > >> > > > > The zeroing *is* a CPU access, therefor it should handle the needed CMO > > for CPU access at the time of zeroing. > > Actually that should be at the point of the first non-coherent device mapping the buffer right? No point in doing CMO if the future accesses are coherent. Cheers, -Brian > > Andrew > > > >> Liam > >> > >> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, > >> a Linux Foundation Collaborative Project > >> _______________________________________________ devel mailing list devel@xxxxxxxxxxxxxxxxxxxxxx http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel