> On 30 June 2020 at 14:31 Peter Ujfalusi <peter.ujfalusi@xxxxxx> wrote:
>
> On 29/06/2020 18.18, Thomas Ruf wrote:
> >
> >> On 26 June 2020 at 12:29 Peter Ujfalusi <peter.ujfalusi@xxxxxx> wrote:
> >>
> >> On 24/06/2020 16.58, Thomas Ruf wrote:
> >>>
> >>>> On 24 June 2020 at 14:07 Peter Ujfalusi <peter.ujfalusi@xxxxxx> wrote:
> >>>> On 24/06/2020 12.38, Vinod Koul wrote:
> >>>>> On 24-06-20, 11:30, Thomas Ruf wrote:
> >>>>>
> >>>>>> To make it short - i have two questions:
> >>>>>> - what are the chances to revive DMA_SG?
> >>>>>
> >>>>> 100%, if we have an in-kernel user
> >>>>
> >>>> Most DMAs can not handle differently provisioned sg_list for src and dst.
> >>>> Even if they could handle non symmetric SG setup it requires entirely
> >>>> different setup (two independent channels sending the data to each
> >>>> other, one reads, the other writes?).
> >>>
> >>> Ok, i implemented that using zynqmp_dma on a Xilinx Zynq platform (obviously ;-) and it works nicely for us.
> >>
> >> I see, if the HW does not support it then something along the lines of
> >> what the atc_prep_dma_sg did can be implemented for most engines.
> >>
> >> In essence: create a new set of sg_list which is symmetric.
> >
> > Sorry, not sure if i understand you right?
> > You suggest that in case DMA_SG gets revived we should restrict the support to symmetric sg_lists?
>
> No, not at all. That would not make much sense.

Glad that this was just a misunderstanding.

> > Just had a glance at the deleted code and the *_prep_dma_sg of these drivers had code to support asymmetric lists and by that "unaligned" memory (relative to page start):
> > at_hdmac.c
> > dmaengine.c
> > dmatest.c
> > fsldma.c
> > mv_xor.c
> > nbpfaxi.c
> > ste_dma40.c
> > xgene-dma.c
> > xilinx/zynqmp_dma.c
> >
> > Why not just revive that and keep this nice functionality? ;-)
>
> What I'm saying is that the drivers (at least at_hdmac) in essence
> create aligned sg_lists out of the received non aligned ones.
> It does this w/o actually creating the sg_list itself, but that's just a
> small detail.
>
> In a longer run what might make sense is to have a helper function to
> convert two non symmetric sg_list into two symmetric ones so drivers
> will not have to re-implement the same code and they will only need to
> care about symmetric sg lists.

Sounds like a superb idea!

> Note, some DMAs can actually handle non symmetric src and dst lists, but
> I believe it is rare.

So i was a bit lucky that the zynqmp_dma is one of them.

> >> What might be plausible is to introduce hw offloading support for memcpy
> >> type of operations in a similar fashion how for example crypto does it?
> >
> > Sounds good to me, my proxy driver implementation could be a good start for that, too!
>
> It needs to find its place as well... I'm not sure where that would be.
> Simple block-copy offload, sg copy offload, interleaved offload (frame
> extraction) offload, dmabuf copy offload comes to mind as candidates.

And who would decide that...

> >> The issue with a user space implemented logic is that it is not portable
> >> between systems with different DMAs. It might be that on one DMA the
> >> setup takes longer than do a CPU copy of X bytes, on the other DMA it
> >> might be significantly less or higher.
> >
> > Fully agree with that!
> > I was also unsure how my approach will perform but in our case the latency was increased by ~20%, cpu load roughly stayed the same, of course this was the benchmark from user memory to user memory.
> > From uncached to user memory the DMA was around 15 times faster.
>
> It depends on the size of the transfer. Lots of small individual
> transfers might be worse via DMA due to the setup time, completion
> handling, etc.

Yes, exactly.

Thanks again for your great input!

best regards,
Thomas

PS: I am on vacation for the next two weeks and probably will not check this mailing list till 20.7. But will fetch later.
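
For illustration only, the helper idea discussed above (turning two non symmetric sg lists into symmetric ones) could look roughly like the userspace sketch below. This is not the deleted kernel code and not an existing dmaengine API: struct seg, struct chunk and sg_make_symmetric() are hypothetical names, and the segments are plain address/length pairs rather than struct scatterlist entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified segment; NOT the kernel's struct scatterlist. */
struct seg {
	uint64_t addr;
	size_t len;
};

/* One symmetric piece: src and dst ranges of identical length. */
struct chunk {
	uint64_t src;
	uint64_t dst;
	size_t len;
};

/*
 * Walk both lists in lockstep. Each emitted chunk is as long as the
 * smaller remainder of the current src and dst segments, so every chunk
 * can be handled like a plain memcpy-style transfer.
 *
 * Returns the number of chunks written, or -1 if 'out' is too small.
 * Stops as soon as either list is exhausted.
 */
static int sg_make_symmetric(const struct seg *src, size_t nsrc,
			     const struct seg *dst, size_t ndst,
			     struct chunk *out, size_t nout)
{
	size_t si = 0, di = 0;		/* current segment indices */
	size_t soff = 0, doff = 0;	/* offsets inside those segments */
	size_t n = 0;

	assert(src && dst && out);

	while (si < nsrc && di < ndst) {
		size_t sleft = src[si].len - soff;
		size_t dleft = dst[di].len - doff;
		size_t len = sleft < dleft ? sleft : dleft;

		if (len > 0) {
			if (n == nout)
				return -1;
			out[n].src = src[si].addr + soff;
			out[n].dst = dst[di].addr + doff;
			out[n].len = len;
			n++;
		}

		soff += len;
		doff += len;
		/* Advance whichever segment(s) were fully consumed. */
		if (soff == src[si].len) {
			si++;
			soff = 0;
		}
		if (doff == dst[di].len) {
			di++;
			doff = 0;
		}
	}

	return (int)n;
}
```

With src segments of 100 and 50 bytes and dst segments of 30 and 120 bytes this yields three chunks of 30, 70 and 50 bytes. A real helper would additionally have to deal with DMA mapping, alignment constraints of the engine and descriptor limits, none of which is covered here.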