2017-11-09 18:14 GMT+08:00 Arnd Bergmann <arnd@xxxxxxxx>:
> On Thu, Nov 9, 2017 at 8:12 AM, Greentime Hu <green.hu@xxxxxxxxx> wrote:
>> 2017-11-08 17:09 GMT+08:00 Arnd Bergmann <arnd@xxxxxxxx>:
>>> On Wed, Nov 8, 2017 at 6:55 AM, Greentime Hu <green.hu@xxxxxxxxx> wrote:
>>>
>>> You do the same cache operations for _to_cpu and _to_device, which
>>> usually works, but is more expensive than you need. It's better to
>>> take the ownership into account and only do what you need.
>>>
>> Like this?
>>
>> static void
>> nds32_dma_sync_single_for_cpu(struct device *dev, dma_addr_t handle,
>>                               size_t size, enum dma_data_direction dir)
>> {
>>         consistent_sync((void *)dma_to_virt(dev, handle), size,
>>                         DMA_FROM_DEVICE);
>> }
>>
>> static void
>> nds32_dma_sync_single_for_device(struct device *dev, dma_addr_t handle,
>>                                  size_t size, enum dma_data_direction dir)
>> {
>>         consistent_sync((void *)dma_to_virt(dev, handle), size,
>>                         DMA_TO_DEVICE);
>> }
>
> No, it's more complicated than that. You need to pass both the direction of the
> DMA transaction and the ownership to consistent_sync(), and then do the
> correct cache maintenance operation for each of the six combinations.
>
> Which operation that is depends on the microarchitecture to some degree,
> e.g. on machines that can load arbitrary cache lines during speculative
> execution, you have to invalidate the caches during both
> _for_device/FROM_DEVICE _for_cpu/FROM_DEVICE, while machines
> without speculative execution can skip the second invalidation, they
> only need to get rid of dirty cache lines before the DMA from device.
>
> Usually you don't have to do a writeback during _for_cpu, since there
> are no dirty cache lines after the _for_device operation.
>
> It's not entirely clear what the correct behavior is for buffers that
> are not cache line aligned, some architectures use wbinval instead
> of inval for the _for_device/_FROM_DEVICE operation, on
> any partial cache line, but you wouldn't want to do that on the
> _for_cpu/_FROM_DEVICE operation.

I get your point. I would prefer to keep it as it is for now, because
handling every combination would be somewhat complex. I will keep
studying the code to see what I can improve in the next version of the
patch.
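
For reference, the six direction/ownership combinations discussed above
could be laid out roughly as in the sketch below. This is only an
illustration under stated assumptions: the enum and function names are
hypothetical (they are not the nds32 code or the kernel DMA API), the
cpu_speculates flag stands in for the microarchitecture dependence, and
the choices for TO_DEVICE and for the bidirectional case follow the usual
pattern rather than anything spelled out in this thread. Partial cache
lines are not handled at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration only; not the kernel's enums. */
enum sketch_dir   { SKETCH_TO_DEVICE, SKETCH_FROM_DEVICE, SKETCH_BIDIRECTIONAL };
enum sketch_owner { SKETCH_FOR_DEVICE, SKETCH_FOR_CPU };
enum sketch_op    { SKETCH_NOP, SKETCH_WB, SKETCH_INV, SKETCH_WBINV };

/*
 * Pick a cache maintenance operation from the DMA direction and from who
 * is about to own the buffer. cpu_speculates is an assumed flag meaning
 * "the CPU may load arbitrary cache lines speculatively".
 */
static enum sketch_op
sketch_pick_cache_op(enum sketch_dir dir, enum sketch_owner owner,
                     bool cpu_speculates)
{
        if (owner == SKETCH_FOR_DEVICE) {
                switch (dir) {
                case SKETCH_TO_DEVICE:
                        /* Device will read: push dirty lines to memory. */
                        return SKETCH_WB;
                case SKETCH_FROM_DEVICE:
                        /* Device will write: get rid of dirty lines first. */
                        return SKETCH_INV;
                case SKETCH_BIDIRECTIONAL:
                        /* Assumption: do both to be safe. */
                        return SKETCH_WBINV;
                }
        } else {
                switch (dir) {
                case SKETCH_TO_DEVICE:
                        /* Nothing is dirty after _for_device: no writeback. */
                        return SKETCH_NOP;
                case SKETCH_FROM_DEVICE:
                case SKETCH_BIDIRECTIONAL:
                        /* Second invalidate only if lines may have been
                         * pulled in speculatively during the transfer. */
                        return cpu_speculates ? SKETCH_INV : SKETCH_NOP;
                }
        }
        assert(!"invalid direction or ownership");
        return SKETCH_NOP;
}
```

With something of this shape, the two sync callbacks would pass their own
dir argument plus a fixed ownership value, instead of hard-coding
DMA_FROM_DEVICE and DMA_TO_DEVICE.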