On Tue, 22 Jan 2019, Andrew F. Davis wrote:

> On 1/21/19 4:12 PM, Liam Mark wrote:
> > On Mon, 21 Jan 2019, Christoph Hellwig wrote:
> >
> >> On Mon, Jan 21, 2019 at 11:44:10AM -0800, Liam Mark wrote:
> >>> The main use case is for allowing clients to pass in
> >>> DMA_ATTR_SKIP_CPU_SYNC in order to skip the default cache maintenance
> >>> which happens in dma_buf_map_attachment and dma_buf_unmap_attachment. In
> >>> ION the buffers aren't usually accessed from the CPU so this allows
> >>> clients to often avoid doing unnecessary cache maintenance.
> >>
> >> This can't work. The cpu can still easily speculate into this area.
> >
> > Can you provide more detail on your concern here.
> > The use case I am thinking about here is a cached buffer which is accessed
> > by a non IO-coherent device (quite a common use case for ION).
> >
> > Guessing on your concern:
> > The speculative access can be an issue if you are going to access the
> > buffer from the CPU after the device has written to it, however if you
> > know you aren't going to do any CPU access before the buffer is again
> > returned to the device then I don't think the speculative access is a
> > concern.
> >
> >> Moreover in general these operations should be cheap if the addresses
> >> aren't cached.
> >>
> >
> > I am thinking of use cases with cached buffers here, so CMO isn't cheap.
> >
>
> These buffers are cacheable, not cached, if you haven't written anything
> the data wont actually be in cache.

That's true

> And in the case of speculative cache
> filling the lines are marked clean. In either case the only cost is the
> little 7 instruction loop calling the clean/invalidate instruction (dc
> civac for ARMv8) for the cache-lines. Unless that is the cost you are
> trying to avoid?
>

This is the cost I am trying to avoid and this comes back to our previous
discussion.
We have a coherent system cache, so if you are doing this for every cache
line on a large buffer it adds up, between this work and the traffic going
to the bus.
For example I believe 1080P buffers are 8MB, and 4K buffers are even
larger.

I also still think you would want to solve this properly such that
invalidates aren't being done unnecessarily.

> In that case if you are mapping and unmapping so much that the little
> CMO here is hurting performance then I would argue your usage is broken
> and needs to be re-worked a bit.
>

I am not sure I would say it is broken. The large buffers (for example
1080P buffers) are mapped and unmapped on every frame. I don't think there
is any clean way to avoid that in a pipelining framework. You could ask
clients to keep the buffers dma mapped, but there isn't necessarily a good
time to tell them to unmap.

It would be unfortunate to not consider this something legitimate for
userspace to do in a pipelining use case.
Requiring devices to stay attached doesn't seem very clean to me as there
isn't necessarily a nice place to tell them when to detach.

Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
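
To put a rough number on the per-cache-line cost discussed above, here is a
minimal sketch. It assumes a 64-byte cache line (an assumption, the line
size is not stated in this thread) and uses the 8MB figure quoted for a
1080P buffer. The helper name cmo_line_count is hypothetical; it only
counts how many lines a clean/invalidate loop would have to walk and does
no cache maintenance itself.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of cache lines a clean/invalidate loop
 * would walk for a buffer of buf_size bytes, assuming the buffer
 * starts on a cache-line boundary.
 */
size_t cmo_line_count(size_t buf_size, size_t line_size)
{
	assert(line_size != 0);
	/* Round up so a partially covered last line is still counted. */
	return (buf_size + line_size - 1) / line_size;
}
```

Under those assumptions, cmo_line_count(8 * 1024 * 1024, 64) comes to
131072 lines per pass, and a map plus an unmap on every frame would walk
that range twice.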