On Thu, Jul 6, 2017 at 10:31 PM, Tomasz Figa <tfiga@xxxxxxxxxxxx> wrote:
> On Thu, Jul 6, 2017 at 9:23 PM, Arnd Bergmann <arnd@xxxxxxxx> wrote:
>> On Thu, Jul 6, 2017 at 10:36 AM, Tomasz Figa <tfiga@xxxxxxxxxxxx> wrote:
>>> On Thu, Jul 6, 2017 at 5:34 PM, Tomasz Figa <tfiga@xxxxxxxxxxxx> wrote:
>>>> On Thu, Jul 6, 2017 at 5:26 PM, Arnd Bergmann <arnd@xxxxxxxx> wrote:
>>>>> On Thu, Jul 6, 2017 at 3:44 AM, Tomasz Figa <tfiga@xxxxxxxxxxxx> wrote:
>>
>>>>
>>>> I'd say that this is something that has been consistently tried to be
>>>> avoided by V4L2 and that's why it's so tightly integrated with DMA
>>>> mapping. IMHO re-implementing the code that's already there in
>>>> videobuf2 again in the driver, only because, for no good reason
>>>> mentioned as for now, having a loadable module providing DMA ops was
>>>> disliked.
>>>
>>> Sorry, I intended to mean:
>>>
>>> IMHO re-implementing the code that's already there in videobuf2 again
>>> in the driver, only because, for no good reason mentioned as for now,
>>> having a loadable module providing DMA ops was disliked, would make no
>>> sense.
>>
>> Why would we need to duplicate that code? I would expect that the videobuf2
>> core can simply call the regular dma_mapping interfaces, and you handle the
>> IOPTE generation at the point when the buffer is handed off from the core
>> code to the device driver. Am I missing something?
>
> Well, for example, the iommu-dma helpers already implement all the
> IOVA management, SG iterations, IOMMU API calls, sanity checks and so
> on. There is a significant amount of common code.
>
> On the other hand, if it's strictly about base/dma-mapping, we might
> not need it indeed. The driver could call iommu-dma helpers directly,
> without the need to provide its own DMA ops. One caveat, though, we
> are not able to obtain coherent (i.e. uncached) memory with this
> approach, which might have some performance effects and complicates
> the code, that would now need to flush caches even for some small
> internal buffers.

I think I should add a bit of explanation here:
 1) the device is non-coherent with CPU caches, even on x86,
 2) it looks like x86 does not have non-coherent DMA ops, (but it
    might be something that could be fixed)
 3) one technically could still use __get_vm_area() and map_vm_area(),
    which _are_ exported, to create an uncached mapping. I'll leave it
    to you to judge if it would be better than using the already
    available generic helpers.

Best regards,
Tomasz