Re: Linux SPI Slave fails to respond under high load on BeagleBone Black

Glenn Schmottlach <gschmottlach@xxxxxxxxx> · Fri, 13 Nov 2020 15:34:48 -0500

On Fri, Nov 13, 2020 at 10:08 AM Vignesh Raghavendra <vigneshr@xxxxxx> wrote:

>
> Does the transfer resume if you manually updated WCNT field to a
> value > 32 using devmem2 when slave appears to be "stuck"?
>

Unfortunately, no, the slave does not become unstuck. After the write I
can see that the WCNT field typically drops back to zero, but the
transfer does not resume.
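For anyone repeating the devmem2 experiment, here is a minimal sketch of the read-modify-write I mean. It assumes WCNT occupies the upper 16 bits of MCSPI_XFERLEVEL, which is what the driver's "wcnt << 16" in its FIFO setup suggests to me; please check the field layout against the TRM. The macro and helper names are mine, not from the driver.

```c
#include <stdint.h>

/*
 * Assumption: WCNT is bits 31:16 of MCSPI_XFERLEVEL and the FIFO
 * almost-full/almost-empty levels live in the lower 16 bits.
 * These names are hypothetical, not taken from spi-omap2-mcspi.c.
 */
#define XFERLEVEL_WCNT_SHIFT 16
#define XFERLEVEL_WCNT_MASK  0xFFFF0000u

/* Extract the current word count from a raw register value. */
static inline uint32_t xferlevel_get_wcnt(uint32_t reg)
{
	return (reg & XFERLEVEL_WCNT_MASK) >> XFERLEVEL_WCNT_SHIFT;
}

/* Return the register value with WCNT replaced and the FIFO levels kept. */
static inline uint32_t xferlevel_set_wcnt(uint32_t reg, uint16_t wcnt)
{
	return (reg & ~XFERLEVEL_WCNT_MASK) |
	       ((uint32_t)wcnt << XFERLEVEL_WCNT_SHIFT);
}
```

The value read back with devmem2 goes in as "reg", and the result is what gets written back, so the FIFO level bits are not clobbered by the poke.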

>
>
> Could you see if below diff helps? This delays enabling of channel
> until TX DMA is queued so that WCNT does not decrement
>
>
> diff --git a/drivers/spi/spi-omap2-mcspi.c b/drivers/spi/spi-omap2-mcspi.c
> index d4c9510af393..bf8c6526bcd7 100644
> --- a/drivers/spi/spi-omap2-mcspi.c
> +++ b/drivers/spi/spi-omap2-mcspi.c
> @@ -426,6 +426,8 @@ static void omap2_mcspi_tx_dma(struct spi_device *spi,
>         }
>         dma_async_issue_pending(mcspi_dma->dma_tx);
>         omap2_mcspi_set_dma_req(spi, 0, 1);
> +       if (spi_controller_is_slave(master))
> +               omap2_mcspi_set_enable(spi, 1);
>  }
>
>  static unsigned
> @@ -1194,7 +1196,9 @@ static int omap2_mcspi_transfer_one(struct spi_master *master,
>                     master->can_dma(master, spi, t))
>                         omap2_mcspi_set_fifo(spi, t, 1);
>
> -               omap2_mcspi_set_enable(spi, 1);
> +               /* For slave TX, enable after DMA is queued */
> +               if (!spi_controller_is_slave(master) || !t->tx_buf)
> +                       omap2_mcspi_set_enable(spi, 1);
>
>                 /* RX_ONLY mode needs dummy data in TX reg */
>                 if (t->tx_buf == NULL)

I made this change and initially thought it improved things, but as the
clock speed is increased (> 10000 Hz) it reverts to the prior behavior.
I can almost see the SPI slave stutter along until it finally stops
responding to the SPI master. At this point WCNT == 0 and poking in a
value > 32 has no effect. In my testing I can definitely see a dramatic
decline in performance (that is, the SPI slave becomes more likely to
get "stuck") as the clock rate increases.

For my use case, the SPI slave generates telemetry and thus discards all
incoming data from the MOSI pin. Likewise, my SPI master is only
interested in the MISO data. I hacked up the spi-pipe application to
provide no TX buffer for the SPI master (thus clocking out zeros to the
SPI slave). On the SPI slave side, the spi-pipe program does not provide
an RX buffer, since the clock is merely being used to clock "out"
telemetry data to the SPI master. Making this change offers some
improvement, but it usually doesn't last long.
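To make the one-directional setup concrete, here is a rough sketch of how the two transfers can be described through spidev. This is an illustration only, not my actual spi-pipe change; the helper names are made up, while struct spi_ioc_transfer and its fields come from <linux/spi/spidev.h>.

```c
#include <stdint.h>
#include <string.h>
#include <linux/spi/spidev.h>

/*
 * Master side: receive only. Leaving tx_buf at 0 (NULL) means no TX
 * data is supplied, so zeros are shifted out on MOSI.
 * Helper name is hypothetical.
 */
static inline struct spi_ioc_transfer rx_only_xfer(void *rx, uint32_t len,
						   uint32_t speed_hz)
{
	struct spi_ioc_transfer x;

	memset(&x, 0, sizeof(x));
	x.rx_buf = (uintptr_t)rx;
	x.len = len;
	x.speed_hz = speed_hz;
	return x;
}

/*
 * Slave side: transmit only. Leaving rx_buf at 0 (NULL) discards
 * whatever arrives on MOSI. Helper name is hypothetical.
 */
static inline struct spi_ioc_transfer tx_only_xfer(const void *tx,
						   uint32_t len)
{
	struct spi_ioc_transfer x;

	memset(&x, 0, sizeof(x));
	x.tx_buf = (uintptr_t)tx;
	x.len = len;
	return x;
}
```

Either struct is then submitted with ioctl(fd, SPI_IOC_MESSAGE(1), &x) on the opened spidev node.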

There certainly seem to be one or more race conditions. Very rarely a
test will run indefinitely, but in general that is not repeatable. It
seems that the SPI slave must be able to submit a TX buffer atomically
in order for this to work. Judging by your patch, it must be difficult
to thread in successive TX buffers, since the DMA must be scheduled and
WCNT set appropriately before data can be clocked out of the slave. Of
course, all of this has to happen while the SPI master clocks data
to/from the slave at arbitrary times. Perhaps if I had a better
understanding of the normal program flow I could see this more clearly.

Do you have any additional suggestions I could investigate? This
problem is easy to recreate with two development boards (BeagleBone
Blacks in my case). Have you ever encountered it in your testing?

Thanks,

Glenn