Re: [PATCH v2 4/4] media: stk1160: use dma_alloc_noncontiguous API

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Dafna,

On Tue, Jan 25, 2022 at 10:02:13AM +0200, Dafna Hirschfeld wrote:
> Replace the urb buffers allocation to use the
> noncontiguous API. This improves performance on ARM
> platform since the previous dma allocation was
> coherent dma allocation which disables the cache on
> ARM platforms. With the noncontiguous API the cache
> is not disabled.
> The noncontiguous API requires the driver to
> handle synchronization.
> This commit is similar to this one for the uvc driver:
> 
^^^^^^
This description still looks weird. How about:

"""
Replace the urb buffers allocation to use the noncontiguous API

This improves performance on ARM platform where DMA coherent allocations
produce uncached mappings. Note that the noncontiguous API
requires the driver to handle synchronization.

This commit is similar to this one for the uvc driver:

  https://lkml.org/lkml/2021/3/12/1506

Performance ...
""

Also, see below a little change:

> 
> Performance tests on rock-pi4 (Arm64) shows about 15x
> improvements:
> 
> == DMA NONCONTIGUOUS ==
> total durations: 20.63678480 sec
> urb processing durations: 0.286864889 sec
> uS/qty: 286864/2508 avg: 114.379 min: 0.583 max: 155.461 (uS)
> FPS: 24.92
> lost: 0 done: 500
> raw decode speed: 11.603 Gbits/s
> bytes 414831228.000
> bytes/urb: 165403
> 
> == DMA COHERENT ==
> total durations: 20.73551767 sec
> urb processing durations: 4.541559160 sec
> uS/qty: 4541559/2509 avg: 1810.107 min: 0.583 max: 2113.163 (uS)
> FPS: 24.90
> lost: 0 done: 500
> raw decode speed: 730.738 Mbits/s
> bytes 414785444.000
> bytes/urb: 165319
> 
> Performance tests on x86 laptop show no significant
> difference:
> 
> == DMA NONCONTIGUOUS ==
> total durations: 20.220590102 sec
> urb processing durations: 0.63021818 sec
> uS/qty: 63021/2512 avg: 25.088 min: 0.138 max: 146.750 (uS)
> FPS: 24.72
> lost: 0 done: 500
> raw decode speed: 52.751 Gbits/s
> bytes 415421032.000
> bytes/urb: 165374
> 
> == DMA COHERENT ==
> total durations: 20.220475614 sec
> urb processing durations: 0.64751972 sec
> uS/qty: 64751/2512 avg: 25.777 min: 0.168 max: 132.250 (uS)
> FPS: 24.72
> lost: 0 done: 500
> raw decode speed: 51.927 Gbits/s
> bytes 415422794.000
> bytes/urb: 165375
> 
> Signed-off-by: Dafna Hirschfeld <dafna.hirschfeld@xxxxxxxxxxxxx>
> ---
>  drivers/media/usb/stk1160/stk1160-v4l.c   |   4 +
>  drivers/media/usb/stk1160/stk1160-video.c | 114 +++++++++++++---------
>  drivers/media/usb/stk1160/stk1160.h       |  10 ++
>  3 files changed, 84 insertions(+), 44 deletions(-)
> 
> diff --git a/drivers/media/usb/stk1160/stk1160-v4l.c b/drivers/media/usb/stk1160/stk1160-v4l.c
> index ebf245d44005..a1f785a5ffd8 100644
> --- a/drivers/media/usb/stk1160/stk1160-v4l.c
> +++ b/drivers/media/usb/stk1160/stk1160-v4l.c
> @@ -232,6 +232,10 @@ static int stk1160_start_streaming(struct stk1160 *dev)
>  
>  	/* submit urbs and enables IRQ */
>  	for (i = 0; i < dev->isoc_ctl.num_bufs; i++) {
> +		struct stk1160_urb *stk_urb = &dev->isoc_ctl.urb_ctl[i];
> +
> +		dma_sync_sgtable_for_device(stk1160_get_dmadev(dev), stk_urb->sgt,
> +					    DMA_FROM_DEVICE);
>  		rc = usb_submit_urb(dev->isoc_ctl.urb_ctl[i].urb, GFP_KERNEL);
>  		if (rc) {
>  			stk1160_err("cannot submit urb[%d] (%d)\n", i, rc);
> diff --git a/drivers/media/usb/stk1160/stk1160-video.c b/drivers/media/usb/stk1160/stk1160-video.c
> index f3c0497a8539..6600afc565b8 100644
> --- a/drivers/media/usb/stk1160/stk1160-video.c
> +++ b/drivers/media/usb/stk1160/stk1160-video.c
> @@ -295,7 +295,9 @@ static void stk1160_process_isoc(struct stk1160 *dev, struct urb *urb)
>  static void stk1160_isoc_irq(struct urb *urb)
>  {
>  	int i, rc;
> -	struct stk1160 *dev = urb->context;
> +	struct stk1160_urb *stk_urb = urb->context;
> +	struct stk1160 *dev = stk_urb->dev;
> +	struct device *dma_dev = stk1160_get_dmadev(dev);
>  
>  	switch (urb->status) {
>  	case 0:
> @@ -310,6 +312,10 @@ static void stk1160_isoc_irq(struct urb *urb)
>  		return;
>  	}
>  
> +	invalidate_kernel_vmap_range(stk_urb->transfer_buffer,
> +				     urb->transfer_buffer_length);
> +	dma_sync_sgtable_for_cpu(dma_dev, stk_urb->sgt, DMA_FROM_DEVICE);
> +
>  	stk1160_process_isoc(dev, urb);
>  
>  	/* Reset urb buffers */
> @@ -318,6 +324,7 @@ static void stk1160_isoc_irq(struct urb *urb)
>  		urb->iso_frame_desc[i].actual_length = 0;
>  	}
>  
> +	dma_sync_sgtable_for_device(dma_dev, stk_urb->sgt, DMA_FROM_DEVICE);
>  	rc = usb_submit_urb(urb, GFP_ATOMIC);
>  	if (rc)
>  		stk1160_err("urb re-submit failed (%d)\n", rc);
> @@ -353,37 +360,34 @@ void stk1160_cancel_isoc(struct stk1160 *dev)
>  	stk1160_dbg("all urbs killed\n");
>  }
>  
> +static void stk_free_urb_buffer(struct stk1160 *dev, struct stk1160_urb *stk_urb)

Change this function to "stk1160_free_urb". With that,

Reviewed-by: Ezequiel Garcia <ezequiel@xxxxxxxxxxxxxxxxxxxx>

Thanks,
Ezequiel



[Index of Archives]     [Linux Input]     [Video for Linux]     [Gstreamer Embedded]     [Mplayer Users]     [Linux USB Devel]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [Yosemite Backpacking]

  Powered by Linux