Hi Jai,

(CC'ing Vinod, the maintainer of the DMA engine subsystem, for a
question below)

On Fri, Aug 18, 2023 at 03:55:06PM +0530, Jai Luthra wrote:
> On Aug 15, 2023 at 16:00:51 +0300, Tomi Valkeinen wrote:
> > On 11/08/2023 13:47, Jai Luthra wrote:
> > > From: Pratyush Yadav <p.yadav@xxxxxx>

[snip]

> > > +static int ti_csi2rx_start_streaming(struct vb2_queue *vq, unsigned int count)
> > > +{
> > > +	struct ti_csi2rx_dev *csi = vb2_get_drv_priv(vq);
> > > +	struct ti_csi2rx_dma *dma = &csi->dma;
> > > +	struct ti_csi2rx_buffer *buf;
> > > +	unsigned long flags;
> > > +	int ret = 0;
> > > +
> > > +	spin_lock_irqsave(&dma->lock, flags);
> > > +	if (list_empty(&dma->queue))
> > > +		ret = -EIO;
> > > +	spin_unlock_irqrestore(&dma->lock, flags);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	dma->drain.len = csi->v_fmt.fmt.pix.sizeimage;
> > > +	dma->drain.vaddr = dma_alloc_coherent(csi->dev, dma->drain.len,
> > > +					      &dma->drain.paddr, GFP_KERNEL);
> > > +	if (!dma->drain.vaddr)
> > > +		return -ENOMEM;
> >
> > This is still allocating a large buffer every time streaming is started (and
> > with streams support, a separate buffer for each stream?).
> >
> > Did you check if the TI DMA can do writes to a constant address? That would
> > be the best option, as then the whole buffer allocation problem goes away.
>
> I checked with Vignesh, the hardware can support a scenario where we
> flush out all the data without allocating a buffer, but I couldn't find
> a way to signal that via the current dmaengine framework APIs. Will look
> into it further as it will be important for multi-stream support.

That would be the best option. It's not immediately apparent to me if
the DMA engine API supports such a use case.
dmaengine_prep_interleaved_dma() gives you finer-grained control on the
source and destination increments, but I haven't seen a way to instruct
the DMA engine to direct writes to /dev/null (so to speak).
Vinod, is this something that is supported, or could be supported?

> > Alternatively, can you flush the buffers with multiple one line transfers?
> > The flushing shouldn't be performance critical, so even if that's slower
> > than a normal full-frame DMA, it shouldn't matter much. And if that can be
> > done, a single probe time line-buffer allocation should do the trick.
>
> There will be considerable overhead if we queue many DMA transactions
> (in the order of 1000s or even 100s), which might not be okay for the
> scenarios where we have to drain mid-stream. Will have to run some
> experiments to see if that is worth it.
>
> But one optimization we can for sure do is re-use a single drain buffer
> for all the streams. We will need to ensure to re-allocate the buffer
> for the "largest" framesize supported across the different streams at
> stream-on time.

If you implement .device_prep_interleaved_dma() in the DMA engine driver
you could write to a single line buffer, assuming that the hardware
would support that in a generic way.

> My guess is the endpoint is not buffering a full-frame's worth of data,
> I will also check if we can upper bound that size to something feasible.
>
> > Other than this drain buffer topic, I think this looks fine. So, I'm going
> > to give Rb, but I do encourage you to look more into optimizing this drain
> > buffer.
>
> Thank you!
>
> > Reviewed-by: Tomi Valkeinen <tomi.valkeinen@xxxxxxxxxxxxxxxx>

-- 
Regards,

Laurent Pinchart
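
To make the single line buffer idea concrete, here is a minimal
userspace sketch in C. It only models the memory access pattern: every
line of a frame is written to the same line-sized buffer, so the drain
memory shrinks from a full frame to one line. It does not use the
dmaengine API and says nothing about whether the TI DMA hardware can be
programmed this way. struct drain_xfer and drain_frame() are
hypothetical names introduced for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model of a drain transfer. This is NOT the kernel's
 * struct dma_interleaved_template, just a userspace stand-in.
 */
struct drain_xfer {
	const uint8_t *src;	/* stand-in for data leaving the CSI-2 RX */
	uint8_t *line_buf;	/* single line-sized drain buffer */
	size_t line_len;	/* bytes per line */
	size_t num_lines;	/* lines per frame */
};

/*
 * Consume a whole frame while only ever writing to line_buf.
 * The destination is rewound to the start of the buffer for each
 * line, so earlier lines are overwritten and discarded.
 * Returns the number of bytes consumed from the source.
 */
static size_t drain_frame(const struct drain_xfer *x)
{
	size_t consumed = 0;
	size_t i;

	assert(x && x->src && x->line_buf);

	for (i = 0; i < x->num_lines; i++) {
		memcpy(x->line_buf, x->src + i * x->line_len, x->line_len);
		consumed += x->line_len;
	}

	return consumed;
}
```

With this access pattern the buffer could be sized for the longest
supported line and allocated once, rather than sized for sizeimage at
every stream start. Whether a single descriptor can express it, or one
transfer per line is needed, is the open question above.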