On 24/06/2020 16.58, Thomas Ruf wrote:
>
>> On 24 June 2020 at 14:07 Peter Ujfalusi <peter.ujfalusi@xxxxxx> wrote:
>> On 24/06/2020 12.38, Vinod Koul wrote:
>>> On 24-06-20, 11:30, Thomas Ruf wrote:
>>>
>>>> To make it short - i have two questions:
>>>> - what are the chances to revive DMA_SG?
>>>
>>> 100%, if we have an in-kernel user
>>
>> Most DMAs can not handle differently provisioned sg_list for src and dst.
>> Even if they could handle non symmetric SG setup it requires entirely
>> different setup (two independent channels sending the data to each
>> other, one reads, the other writes?).
>
> Ok, i implemented that using zynqmp_dma on a Xilinx Zynq platform (obviously ;-) and it works nicely for us.

I see, if the HW does not support it then something along the lines of
what the atc_prep_dma_sg did can be implemented for most engines.

In essence: create a new set of sg_list which is symmetric.

> Don't think that it uses two channels from what I saw in their implementation.

I believe it was breaking it up like atc_prep_dma_sg did.

> Of course that was on kernel 4.19.x where DMA_SG was still available.
>
>>>> - what are the chances to get my driver for memcpy like transfers from
>>>> user space using DMA_SG upstream? ("dma-sg-proxy")
>>>
>>> pretty bleak IMHO.
>>
>> fwiw, I also get requests from time to time for DMA memcpy support from
>> user space from companies trying to move from bare-metal code to Linux.
>>
>> What could be plausible is a generic dmabuf-to-dmabuf copy driver (V4L2
>> can provide dma-buf, DRM can also).
>> If there is a DMA memcpy channel available, use that, otherwise use some
>> method to do the copy, user space should not care how it is done.
>
> Yes, i'm using it together with a v4l2 capture driver and also saw the dma-buf thing but did not find a way how to bring this together with "ordinary user memory".
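
To illustrate the "create a new set of sg_list which is symmetric" remark
above, here is a minimal user-space sketch, not kernel code. It assumes the
approach is to walk the src and dst lists in lockstep and cut a chunk at
every segment boundary of either list. struct seg, struct chunk and
split_symmetric() are hypothetical names made up for this example; a real
driver would walk struct scatterlist entries (sg_dma_address(), sg_dma_len())
and build one hardware descriptor per chunk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one scatterlist entry: address plus length. */
struct seg {
	unsigned long addr;
	size_t len;
};

/* One symmetric chunk: src and dst describe the same number of bytes. */
struct chunk {
	unsigned long src;
	unsigned long dst;
	size_t len;
};

/*
 * Walk both lists in lockstep and emit chunks of
 * min(bytes left in current src seg, bytes left in current dst seg).
 * Stops when either list is exhausted.
 * Returns the number of chunks written, or -1 if 'out' is too small.
 */
static int split_symmetric(const struct seg *src, size_t nsrc,
			   const struct seg *dst, size_t ndst,
			   struct chunk *out, size_t max_out)
{
	size_t si = 0, di = 0, soff = 0, doff = 0, n = 0;

	assert(out != NULL || max_out == 0);

	while (si < nsrc && di < ndst) {
		size_t sleft = src[si].len - soff;
		size_t dleft = dst[di].len - doff;
		size_t len = sleft < dleft ? sleft : dleft;

		if (len > 0) {
			if (n == max_out)
				return -1;
			out[n].src = src[si].addr + soff;
			out[n].dst = dst[di].addr + doff;
			out[n].len = len;
			n++;
		}

		soff += len;
		doff += len;
		/* Advance whichever list(s) just ran out of bytes. */
		if (soff == src[si].len) {
			si++;
			soff = 0;
		}
		if (doff == dst[di].len) {
			di++;
			doff = 0;
		}
	}
	return (int)n;
}
```

With src segments of 100 and 50 bytes and dst segments of 30 and 120 bytes,
this yields three chunks of 30, 70 and 50 bytes, each of which an engine
that only handles equal-length src/dst pairs could process.
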
One of the aims of dma-buf is to share buffers between drivers and user
space (among drivers and/or drivers and userspace), but I might be
missing something.

> For me the root of my problem seems to be that dma_alloc_coherent leads to uncached memory on ARM platforms.

It depends, but in most cases that is true.

> But maybe i am doing it all wrong ;-)
>
>> Where things are going to get a bit trickier is when the copy needs
>> to be triggered by other DMA channel (completion of a frame reception
>> triggering an interleaved sub-frame extraction copy).
>> You don't want to extract from a buffer which can be modified while the
>> other channel is writing to it.
>
> I think that would be no problem in case of our v4l2 capture driver doing both DMAs:
> Framebuffer DMA for streaming and Zynqmp DMA (using DMA_SG) to get it to "ordinary user memory".
> But as i wrote before i prefer to do the "logic and management" in userspace so the capture driver is just using the first DMA and the "dma-sg-proxy" driver is only used as a memcpy replacement.
> As said this is all working fine with kernel 4.19.x but now we are stuck :-(
>
>> In Linux the DMA is used for kernel and user space can only use it
>> implicitly via standard subsystems.
>> Misused DMA can be very dangerous and giving full access to program a
>> transfer can open a can of worms.
>
> Fully understand that!
> But i also hope you understand that we are developing a "closed system" and do not have a problem with that at all.
> We are also willing to bring that driver upstream for anyone doing the same but of course this should not affect security of any desktop or server systems.
> Maybe we just need the right place for that driver?!

What might be plausible is to introduce hw offloading support for memcpy
type of operations, similar to how for example crypto does it?

The issue with logic implemented in user space is that it is not portable
between systems with different DMAs.
It might be that on one DMA the setup takes longer than doing a CPU copy
of X bytes, while on another DMA the setup cost might be significantly
lower or higher.

Using CPU vs DMA for a copy in certain lengths and setups should not be
a concern of the user space.
Yes, you have a closed system with controlled parameters, but a generic
mem2mem_offload framework should be usable on other setups and the same
binary should be working on different DMAs where one is not efficient
for <512 bytes, the other shows benefits under 128 bytes.

> Not sure if staging would change your concerns.
>
> Thanks and best regards,
> Thomas
>

- Péter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
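
A rough sketch of the CPU-vs-DMA decision described above, to show why the
threshold belongs below the user space interface. Everything here is an
assumption for illustration: struct copy_engine, offload_copy() and
demo_dma_copy() are hypothetical names, not an existing kernel or library
API, and the "DMA" backend is faked with memcpy() so the example runs
anywhere. The 512 and 128 byte thresholds are just the example figures from
the text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one copy engine. */
struct copy_engine {
	size_t min_dma_len;	/* below this, setup cost outweighs the gain */
	int (*dma_copy)(void *dst, const void *src, size_t len); /* NULL if none */
};

enum copy_path { COPY_CPU, COPY_DMA };

/* Stand-in "DMA" backend for the demo: it only pretends to offload. */
static int demo_dma_copy(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
	return 0;
}

/*
 * Copy len bytes and report which path was taken. The caller never sees
 * the threshold; it is a property of the engine on this system.
 */
static enum copy_path offload_copy(const struct copy_engine *eng,
				   void *dst, const void *src, size_t len)
{
	assert(dst != NULL && src != NULL);

	if (eng && eng->dma_copy && len >= eng->min_dma_len &&
	    eng->dma_copy(dst, src, len) == 0)
		return COPY_DMA;

	memcpy(dst, src, len);	/* CPU fallback, also used if DMA fails */
	return COPY_CPU;
}
```

The same caller doing a 256 byte copy would end up on the CPU path with an
engine whose min_dma_len is 512 and on the DMA path with one whose
min_dma_len is 128, without any change to the calling code.
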