On Sun, Mar 24, 2019 at 12:23:39PM +0100, kernel@xxxxxxxxxxxxxxxx wrote:
> On 24.03.2019, at 11:15, Lukas Wunner <lukas@xxxxxxxxx> wrote:
> > The problem is that making the DMA minimum length configurable
> > by itself impacts performance because a memory read is necessary
> > to retrieve the limit, instead of a hardcoded immediate in the
> > machine code. Ultimately this feature is only of interest to
> > developers optimizing the code, not really to end users.
>
> The host path is IMO not so hot that one additional memory read is
> that expensive.

As it stands, spi-bcm2835.c is fairly CPU-intensive. We should try
to reduce the load, not add more.

> > > This may have negative impact when transferring lots of bytes to
> > > some mcus without SPI buffers at the fastest possible clock speed,
> > > where it helps when there is a gap after each byte.
> >
> > Hm, wouldn't a slower SPI clock speed achieve the same?
>
> Yes, it would, but then you would need to make the SPI clock
> cycle possibly 3-4 times as long to communicate with an atmega MCU
> in slave mode, which is essentially wasting possible transfer rates.

I don't quite follow, what's the downside of a slower transfer rate
with 8 cycles/byte if the effective speed is the same as a higher
transfer rate with 9 cycles/byte?

> > Poll mode could function the same and precalculate the time it
> > takes for the TX FIFO to empty or the RX FIFO to become filled,
> > and usleep_range() as long to yield the CPU to other tasks.
>
> How would you speed up poll mode really? It is polling and
> consuming CPU cycles anyway!

No, usleep_range() either yields the CPU to another task or allows
the CPU to be put in a low power state.

> In my experience minimizing interrupts should be the main goal
> because this adds long "stalls" at high SPI clock speeds.
> I do not have the exact numbers now, but there is a latency that
> typically produces gaps in the order of 2-10us (depending on
> clock frequencies)

Okay.
Then the goal should probably be to reduce the cost of DMA and use
DMA for anything that requires more than 1 or 2 interrupts.

> If you want to give optimizing things a try (e.g. pseudo DMA
> mode), then please go ahead and post patches and I will try
> to give it a try when triggering those.

Could you tell me a little more about your use cases so that I can
keep them in mind when working on optimizations? What are the SPI
slaves you're using, how many bytes are you typically transferring,
do you use TX-only / RX-only transfers, have you enabled
CONFIG_PREEMPT_RT_FULL?

In our case it's mainly the ks8851 Ethernet chip and the hi3110
CAN chip that we're concerned with. The ks8851 usually transfers
either very small buffers (around 5 bytes) to access registers,
or relatively large buffers (around 1000 bytes) with packet data.
The latter is always either RX-only or TX-only. The hi3110 also
typically uses small buffers, smaller than the 64 byte FIFO length.

The problem is that the number of transfers, in particular with the
ks8851, is staggering. We're talking gigabytes transferred per day
here. So we really feel the pain if cost per transfer goes up.

Thanks,

Lukas
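
For illustration, the arithmetic behind the 8 vs. 9 cycles/byte
question and the precalculated sleep for poll mode could be sketched
roughly as below. This is plain user-space C, not driver code. The
function names are hypothetical, and treating the gap after each byte
as one extra clock cycle is an assumption taken from the figures in
this mail.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Time in nanoseconds to clock `len` bytes over SPI at `hz`,
 * rounded up. `cycles_per_byte` is 8 for back-to-back bytes and
 * 9 if one idle clock cycle follows each byte (assumption).
 */
static uint64_t xfer_time_ns(uint32_t len, uint32_t hz,
                             uint32_t cycles_per_byte)
{
    assert(hz > 0); /* guard against division by zero */

    uint64_t cycles = (uint64_t)len * cycles_per_byte;

    return (cycles * 1000000000ull + hz - 1) / hz;
}

/* Effective payload rate in bytes per second (integer division). */
static uint32_t effective_bytes_per_sec(uint32_t hz,
                                        uint32_t cycles_per_byte)
{
    assert(cycles_per_byte > 0);

    return hz / cycles_per_byte;
}
```

With these helpers, 8 MHz at 8 cycles/byte and 9 MHz at 9 cycles/byte
both come out at 1000000 bytes/s, and a 64 byte FIFO takes 64000 ns
to drain in either case. A poll loop could presumably convert such a
figure to microseconds and hand it to usleep_range() as a window
instead of spinning for the whole duration.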