Hi Christoph,

I have been testing your patchset on real arm64 hardware, and uvc
performs 20 times better using Kieran's test:

https://github.com/ribalda/linux/tree/uvc-noncontiguous

These are the results of running yavta --capture=1000:

dma_alloc_noncontiguous

frames:  999
packets: 999
empty:   0 (0 %)
errors:  0
invalid: 0
pts: 0 early, 0 initial, 999 ok
scr: 0 count ok, 0 diff ok
sof: 2048 <= sof <= 0, freq 0.000 kHz
bytes 78466000 : duration 33303
FPS: 29.99
URB: 418105/5000 uS/qty: 83.621 avg 98.783 std 17.396 min 1264.688 max (uS)
header: 100040/5000 uS/qty: 20.008 avg 19.458 std 2.969 min 454.167 max (uS)
latency: 347653/5000 uS/qty: 69.530 avg 98.937 std 9.114 min 1256.875 max (uS)
decode: 70452/5000 uS/qty: 14.090 avg 11.547 std 6.146 min 271.510 max (uS)
raw decode speed: 8.967 Gbits/s
raw URB handling speed: 1.501 Gbits/s
throughput: 18.848 Mbits/s
URB decode CPU usage 0.211500 %

usb_alloc_coherent

frames:  999
packets: 999
empty:   0 (0 %)
errors:  0
invalid: 0
pts: 0 early, 0 initial, 999 ok
scr: 0 count ok, 0 diff ok
sof: 2048 <= sof <= 0, freq 0.000 kHz
bytes 70501712 : duration 33319
FPS: 29.98
URB: 1854128/5000 uS/qty: 370.825 avg 417.133 std 14.539 min 2875.760 max (uS)
header: 98765/5000 uS/qty: 19.753 avg 30.714 std 1.042 min 573.463 max (uS)
latency: 453316/5000 uS/qty: 90.663 avg 114.987 std 4.065 min 860.795 max (uS)
decode: 1400811/5000 uS/qty: 280.162 avg 330.786 std 6.305 min 2758.202 max (uS)
raw decode speed: 402.866 Mbits/s
raw URB handling speed: 304.214 Mbits/s
throughput: 16.927 Mbits/s
URB decode CPU usage 4.204200 %

Best regards

On Tue, Nov 10, 2020 at 10:57 AM Christoph Hellwig <hch@xxxxxx> wrote:
>
> On Tue, Nov 10, 2020 at 06:50:32PM +0900, Tomasz Figa wrote:
> > In what terms it doesn't actually work? Last time I checked some
> > platforms actually defined CONFIG_DMA_NONCOHERENT, so those would
> > instead use the kmalloc() + dma_map() path. I don't have any
> > background on why that was added and whether it needs to be preserved,
> > though. Kieran, Laurent, do you have any insight?
>
> CONFIG_DMA_NONCOHERENT is set on sh and mips for platforms that may
> support non-coherent DMA at compile time (but at least for mips that
> doesn't actually means this gets used). Using that ifdef to decide
> on using usb_alloc_coherent vs letting the usb layer map the data
> seems at best odd, and if we are unlucky papering over a bug somewhere.

-- 
Ricardo Ribalda