Re: [PATCH 3/3] spi: bcm2835: add module parameter to configure minimum length for dma

Lukas Wunner <lukas@xxxxxxxxx> · Mon, 25 Mar 2019 06:30:12 +0100

On Sun, Mar 24, 2019 at 12:23:39PM +0100, kernel@xxxxxxxxxxxxxxxx wrote:
> On 24.03.2019, at 11:15, Lukas Wunner <lukas@xxxxxxxxx> wrote:
> > The problem is that making the DMA minimum length configurable
> > by itself impacts performance because a memory read is necessary
> > to retrieve the limit, instead of a hardcoded immediate in the
> > machine code.  Ultimately this feature is only of interest to
> > developers optimizing the code, not really to end users.
> 
> The host path is IMO not so hot that one additional memory read is
> that expensive.

As it stands, spi-bcm2835.c is fairly CPU-intensive.  We should try to
reduce the load, not add more.

> > > This may have negative impact when transferring lots of bytes to
> > > some mcus without SPI buffers at the fastest possible clock speed,
> > > where it helps when there is a gap after each byte.
> > 
> > Hm, wouldn't a slower SPI clock speed achieve the same?
> 
> Yes, it would, but then you would need to make the SPI clock
> cycle possibly 3-4 times as long to communicate with an atmega MCU
> in slave mode, which is essentially wasting possible transfer rates.

I don't quite follow: what's the downside of a slower transfer rate
with 8 cycles/byte if the effective speed is the same as a higher
transfer rate with 9 cycles/byte?
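
To put numbers on that, here is a minimal sketch in plain C.  The helper
name and the clock rates are made up for illustration; only the 8 vs. 9
cycles/byte figures come from the discussion above.

```c
#include <stdint.h>

/*
 * Effective payload rate in bytes per second for a given SPI clock,
 * where cycles_per_byte is 8 data bits plus any idle gap cycles.
 * Hypothetical helper, not part of spi-bcm2835.c.
 * cycles_per_byte must be non-zero.
 */
uint32_t effective_bytes_per_sec(uint32_t clk_hz, uint32_t cycles_per_byte)
{
	return clk_hz / cycles_per_byte;
}
```

With these example figures, a 9 MHz clock at 9 cycles/byte and an 8 MHz
clock at 8 cycles/byte both come out at 1000000 bytes per second.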

> > Poll mode could function the same and precalculate the time it
> > takes for the TX FIFO to empty or the RX FIFO to become filled,
> > and usleep_range() as long to yield the CPU to other tasks.
> 
> How would you speed up poll mode really? It is polling and
> consuming CPU cycles anyway!

No, usleep_range() either yields the CPU to another task or allows
the CPU to be put in a low power state.
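
A rough sketch of the precalculation I have in mind, written as plain C
so it can be tried outside the kernel.  The function name and the
cycles_per_byte parameter are assumptions for illustration, not existing
driver code.

```c
#include <stdint.h>

/*
 * Microseconds needed to clock len bytes at clk_hz, assuming
 * cycles_per_byte clock cycles are spent on each byte.
 * The result is rounded up so that we never wake too early.
 * Hypothetical helper; clk_hz must be non-zero.
 */
uint32_t xfer_time_us(uint32_t len, uint32_t clk_hz, uint32_t cycles_per_byte)
{
	uint64_t cycles = (uint64_t)len * cycles_per_byte;

	return (uint32_t)((cycles * 1000000ULL + clk_hz - 1) / clk_hz);
}
```

In the driver, the result could then be handed to usleep_range() as the
lower bound, with some slack added for the upper bound.  How much slack
is appropriate is something I'd have to measure.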

> In my experience minimizing interrupts should be the main goal
> because this adds long "stalls" at high SPI clock speeds.
> I do not have the exact numbers now, but there is a latency that
> typically produces gaps in the order of 2-10us (depending on
> clock frequencies)

Okay.  Then the goal should probably be to reduce the cost of DMA
and use DMA for anything that requires more than 1 or 2 interrupts.

> If you want to give optimizing things a try (e.g. pseudo DMA
> mode), then please go ahead and post patches and I will try
> to give it a try when triggering those.

Could you tell me a little more about your use cases so that I can
keep them in mind when working on optimizations?  What are the SPI
slaves you're using, how many bytes are you typically transferring,
do you use TX-only / RX-only transfers, have you enabled
CONFIG_PREEMPT_RT_FULL?

In our case it's mainly the ks8851 Ethernet chip and the hi3110 CAN
chip that we're concerned with.  The ks8851 usually transfers either
very small buffers (around 5 bytes) to access registers, or relatively
large buffers (around 1000 bytes) with packet data.  The latter is
always either RX-only or TX-only.  The hi3110 also typically uses small
buffers, smaller than the 64 byte FIFO length.  The problem is that
the number of transfers in particular with the ks8851 is staggering.
We're talking gigabytes transferred per day here.  So we really feel
the pain if cost per transfer goes up.

Thanks,

Lukas