On Sun, Mar 24, 2019 at 09:58:27AM +0100, kernel@xxxxxxxxxxxxxxxx wrote:
> On 22.03.2019, at 13:36, Lukas Wunner <lukas@xxxxxxxxx> wrote:
> > On Sun, Feb 24, 2019 at 04:23:11PM +0000, kernel@xxxxxxxxxxxxxxxx wrote:
> > > +/* define dma min number of bytes to use in dma mode with value validation */
> > > +static int dma_min_bytes_limit_set(const char *val,
> > > +				   const struct kernel_param *kp)
> > > +{
> > > +	unsigned int v;
> > > +
> > > +	if (kstrtouint(val, 10, &v))
> > > +		return -EINVAL;
> > > +	/* value needs to be a multiple of 4 */
> > > +	if (v % 4) {
> > > +		pr_err("dma_min_bytes_limit needs to be a multiple of 4\n");
> > > +		return -EINVAL;
> > > +	}
> >
> > Transfers don't need to be a multiple of 4 to be eligible for DMA,
> > so this check can be dropped.
>
> I definitely did not want to write a custom module argument parser
> but if i remember correctly there is one limitation on the transmission path
> where you would hit some inefficiencies in the DMA code when you run
> transfers that are not a multiple of 4 - especially for short transfers.

No, the *length* of a transfer in DMA mode doesn't need to be a
multiple of 4.  You just write the length to the DLEN register and
the chip counts that down to zero while clocking out bytes.  Once it
reaches zero, it stops clocking out bytes.

Because the FIFO is accessed with 32-bit width in DMA mode, you'll
leave a few extra bytes behind in the TX FIFO if DLEN is not a
multiple of 4, so you have to clear the TX FIFO before the next
transfer is commenced.  But the driver does all that.

The inefficiency you're referring to only occurs if you have a
transfer that spans multiple non-contiguous pages and in the first
page, it starts at an offset that's not a multiple of 4.  In that
case we transfer the first few bytes of the first page via
programmed I/O such that the offset in the first page becomes a
multiple of 4 and then we can switch to transferring by DMA.
We handle that just fine since 3bd7f6589f67.

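To make the two "multiple of 4" cases above concrete, here is a
minimal user-space sketch in C.  It is not the driver's code: the
helper names tx_fifo_leftover() and pio_prologue_len() are made up
for illustration, and it assumes a simplified model in which DMA
always fills the FIFO in whole 32-bit words and page boundaries
fall on multiples of 4.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of the driver): number of bytes left
 * behind in the TX FIFO after a DMA transfer of dlen bytes.
 * Assumption: DMA writes whole 32-bit words, i.e. the length rounded
 * up to a multiple of 4, while the chip clocks out exactly dlen bytes.
 */
static unsigned int tx_fifo_leftover(unsigned int dlen)
{
	return (4 - dlen % 4) % 4;
}

/*
 * Hypothetical helper (not part of the driver): number of bytes to
 * send by programmed I/O so that the position within the first page
 * becomes a multiple of 4 before switching to DMA.
 */
static unsigned int pio_prologue_len(size_t page_offset)
{
	return (unsigned int)((4 - page_offset % 4) % 4);
}
```

Under these assumptions, DLEN = 5 means two 32-bit words are written
and 3 bytes remain in the TX FIFO, which is why the FIFO has to be
cleared afterwards.  A buffer starting at page offset 1 needs 3 bytes
sent by programmed I/O before DMA can take over.  A length or offset
that is already a multiple of 4 yields 0 in both cases.
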
Note, most clients transfer bytes from a kmalloc'ed allocation and
those are always contiguous in memory, so the above is completely
irrelevant for them.  It's only relevant for vmalloc'ed allocations,
which are probably rare in SPI client drivers.

Thanks,

Lukas