Re: dmaengine and using tx descriptors multiple times

Vinod Koul <vinod.koul@xxxxxxxxx> · Mon, 16 Mar 2015 10:58:35 +0530



On Fri, Mar 13, 2015 at 10:23:50AM +0100, Robert Jarzmik wrote:
> Vinod Koul <vinod.koul@xxxxxxxxx> writes:
> 
> > On Mon, Mar 09, 2015 at 11:18:53PM +0100, Robert Jarzmik wrote:
> >> And here comes my question : how do I free the hardware descriptors for tx now ?
> >> As I chose to not free them upon tx completion, is there an API call to do it ?
> > The problem with this is that we are assuming the
> > DMA_PREP_CONTINUE/DMA_PREP_ACK etc is valid for slave. These were defined
> > for async_tx. I have added Dan on how they handled it there
> For Dan, in [2] a reminder for the usecase, in order to not be lost in the
> conversation.
> 
> > But, I think we should not use these for slave transactions.
> >
> > In your case what is the benefit you get from reusing the descriptors?
> In my case, ie. the pxa architecture conversion to dmaengine, I'm converting the
> camera capture interface.
> 
> These scatter-gather buffers are huge (several MB, depending on sensor
> definition). Moreover the video4linux2 API relies on the fact that you can queue
> up a buffer multiple times.
> 
> As these buffers are huge, the preparation is not negligible (a thousand
> hardware linked descriptors for 1 frame), and there are usually 6 frames in the
> ring. The pxa architecture (at least pxa27x) is quite old, and the cost to
> recalculate the descriptors chain is not negligible (2ms to 10ms). Moreover this
> time is needed to handle unqueued buffers : store them or compress them, etc ...
the dmaengine driver is not expected to do buffer mapping for slave cases,
so I am still not clear where the savings are coming from. You can keep the
buffer ready or prepare ahead of time and submit when required.

> As this is video capture, the real time aspect is important. Missing one frame
> here and there makes the video human unfriendly (as opposed to disk/io subsystem
> where this is not that important).
And preparing buffer ahead of time can help
> 
> Ah and to be more precise about what I'm thinking of [1], this is what I have in
> mind. I don't know if it's the way to go, but it will make the query more
> concrete. I also have a driver in preparation for pxa architecture with this,
> and I want to check if the requirements are met for me to upstream it.
> 
> Cheers.
> 
> Robert
> 
> [1] Patch to support conversation
Before this pls read up on this Documentation/crypto/async-tx-api.txt
Section 3.3

-- 
~Vinod

> ---<8---
> diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
> index b6997a0..b2a975c 100644
> --- a/include/linux/dmaengine.h
> +++ b/include/linux/dmaengine.h
> @@ -432,6 +432,8 @@ struct dmaengine_unmap_data {
>   * @chan: target channel for this operation
>   * @tx_submit: accept the descriptor, assign ordered cookie and mark the
>   * descriptor pending. To be pushed on .issue_pending() call
> + * @tx_release: optional method to release tx resources if the complete()
> + *	was programed to not release the descriptor
>   * @callback: routine to call after this operation is complete
>   * @callback_param: general parameter to pass to the callback routine
>   * ---async_tx api specific fields---
> @@ -445,6 +447,7 @@ struct dma_async_tx_descriptor {
>  	dma_addr_t phys;
>  	struct dma_chan *chan;
>  	dma_cookie_t (*tx_submit)(struct dma_async_tx_descriptor *tx);
> +	void (*tx_release)(struct dma_async_tx_descriptor *tx);
>  	dma_async_tx_callback callback;
>  	void *callback_param;
>  	struct dmaengine_unmap_data *unmap;
> @@ -796,6 +799,12 @@ static inline dma_cookie_t dmaengine_submit(struct dma_async_tx_descriptor *desc
>  	return desc->tx_submit(desc);
>  }
>  
> +static inline void dmaengine_tx_release(struct dma_async_tx_descriptor *desc)
> +{
> +	if (desc->tx_release)
> +		desc->tx_release(desc);
> +}
> +
>  static inline bool dmaengine_check_align(u8 align, size_t off1, size_t off2, size_t len)
>  {
>  	size_t mask;
> 
> [2] Usecase for Dan to understand the conversation
> > My usecase is as follows, from a driver point of view :
> >  - tx = dmaengine_prep_slave_sg(dma_chan, sg, sglen, DMA_PREP_CONTINUE);
> >  - dma_map_sg(dev, sg, sglen, DMA_DEV_TO_MEM)
> >  - dmaengine_submit(tx)
> >  - dma_async_issue_pending(dma_chan);
> >  - ... wait for tx->callback()
> >    => here, because of DMA_PREP_CONTINUE, the hardware descriptors are not
> >    freed, as what I need to reuse
> > 
> >  - dma_sync_sg_for_cpu(dev, sg, sglen, )
> >  - copy video buffer somewhere (or not if frame interleaving)
> > 
> >  - dmaengine_submit(tx)
> >  - dma_async_issue_pending(dma_chan)
> >    => the video buffer is captured again, so far so good
> >  - ... wait for tx->callback()
> > 

-- 
--
To unsubscribe from this list: send the line "unsubscribe dmaengine" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html