On Tue, Nov 10, 2020 at 2:06 AM Vignesh Raghavendra <vigneshr@xxxxxx> wrote: > Could you trace how much of data is left in TX FIFO using MCSPI_CHSTAT_0 > and MCSPI_XFERLEVEL registers? > When the SPI slave is no longer responding to the SPI master the following was observed on the SPI slave: devmem2 0x48030130 w (CHSTAT_0) Read at address 0x48030130 (0xb6f5c130): 0x0000004F So 0x4F means: RXFFF=1 (FIFO RX buffer is full) TXFFE=1 (FIFO transmit buffer is empty) EOT=1 (Set to 1 at end of SPI transfer) TXS=1 (TX register status: register is empty) RXS=1 (RX register status: register is full) So it appears there is data ready to be read from the SPI master and there is no data scheduled to be sent. devmem2 0x4803017C w (XFERLEVEL) Read at address 0x4803017C (0xb6fd717c): 0x00000000 So WCNT = 0 and it appears no words are scheduled to be transferred on the channel. Again, the SPI slave appears to be "stuck" Here is a function trace on the SPI slave leading up to it being "stuck" root@beaglebone:~# spi-pipe -d /dev/spidev0.0 -s 24000000 -b 32 -n -1 < /dev/random | hexdump -vC [ 9895.479249] omap2_mcspi_setup(1136): enter [ 9895.483597] omap_mcspi_runtime_resume(1480): enter [ 9895.488569] omap_mcspi_runtime_resume(1503): exit [ 9895.499704] omap2_mcspi_setup_transfer(987): enter [ 9895.505606] omap2_mcspi_setup_transfer(1068): exit [ 9895.510429] omap2_mcspi_setup(1175): exit [ 9895.522659] omap2_mcspi_set_cs(273): enter -> enable = 1 [ 9895.528052] omap2_mcspi_set_cs(304): exit [ 9895.532392] omap2_mcspi_prepare_message(1378): enter [ 9895.542695] omap2_mcspi_prepare_message(1396): exit [ 9895.547606] omap2_mcspi_can_dma(1404): enter [ 9895.551889] omap2_mcspi_can_dma(1415): exit [ 9895.560061] omap2_mcspi_set_cs(273): enter -> enable = 0 [ 9895.566406] omap2_mcspi_set_cs(304): exit [ 9895.570578] omap2_mcspi_transfer_one(1254): enter [ 9895.581994] omap2_mcspi_set_enable(255): enter -> enable = 0 [ 9895.588708] omap2_mcspi_set_enable(265): exit [ 9895.596666] omap2_mcspi_setup_transfer(987): enter [ 9895.601536] omap2_mcspi_setup_transfer(1068): exit [ 9895.612715] omap2_mcspi_can_dma(1404): enter [ 9895.617007] omap2_mcspi_can_dma(1415): exit [ 9895.621202] omap2_mcspi_set_fifo(343): enter -> enable = 1 [ 9895.635896] mcspi_bytes_per_word(214): enter [ 9895.640190] mcspi_bytes_per_word(223): exit [ 9895.647441] omap2_mcspi_set_fifo(376): exit [ 9895.651646] omap2_mcspi_set_enable(255): enter -> enable = 1 [ 9895.662705] omap2_mcspi_set_enable(265): exit [ 9895.667085] omap2_mcspi_can_dma(1404): enter [ 9895.671368] omap2_mcspi_can_dma(1415): exit [ 9895.681567] omap2_mcspi_txrx_dma(675): enter [ 9895.686832] omap2_mcspi_irq_handler(1203): enter [ 9895.691495] omap2_mcspi_irq_handler(1215): exit [ 9895.701131] omap2_mcspi_tx_dma(472): enter [ 9895.706728] omap2_mcspi_set_dma_req(232): enter [ 9895.711344] omap2_mcspi_tx_callback(454): enter [ 9895.715943] omap2_mcspi_set_dma_req(232): enter [ 9895.720528] omap2_mcspi_set_dma_req(247): exit [ 9895.725001] omap2_mcspi_tx_callback(459): exit [ 9895.743431] omap2_mcspi_set_dma_req(247): exit [ 9895.747902] omap2_mcspi_tx_dma(492): exit [ 9895.751924] omap2_mcspi_rx_dma(512): enter [ 9895.758134] omap2_mcspi_set_dma_req(232): enter [ 9895.765133] omap2_mcspi_set_dma_req(247): exit [ 9895.769599] mcspi_wait_for_completion(418): enter [ 9906.203277] omap2_mcspi_rx_callback(440): enter [ 9906.207876] omap2_mcspi_set_dma_req(232): enter [ 9906.212568] omap2_mcspi_set_dma_req(247): exit [ 9906.217114] omap2_mcspi_rx_callback(445): exit [ 9906.232642] omap2_mcspi_set_dma_req(232): enter [ 9906.238512] omap2_mcspi_set_dma_req(247): exit [ 9906.253163] omap2_mcspi_rx_dma(594): exit [ 9906.257206] mcspi_wait_for_completion(418): enter [ 9906.261934] omap2_mcspi_set_dma_req(232): enter [ 9906.275168] omap2_mcspi_set_dma_req(247): exit [ 9906.279637] omap2_mcspi_txrx_dma(723): exit [ 9906.290659] omap2_mcspi_setup_transfer(987): enter [ 9906.296541] omap2_mcspi_setup_transfer(1068): exit [ 9906.301357] omap2_mcspi_set_enable(255): enter -> enable = 0 [ 9906.313084] omap2_mcspi_set_enable(265): exit [ 9906.317466] omap2_mcspi_set_fifo(343): enter -> enable = 0 [ 9906.329195] omap2_mcspi_set_fifo(389): exit [ 9906.339782] omap2_mcspi_transfer_one(1367): exit [ 9906.345438] omap2_mcspi_set_cs(273): enter -> enable = 1 [ 9906.350821] omap2_mcspi_set_cs(304): exit [ 9906.361383] omap2_mcspi_can_dma(1404): enter [ 9906.366773] omap2_mcspi_can_dma(1415): exit [ 9906.371820] omap2_mcspi_prepare_message(1378): enter >>> The SPI slave **does** receive/transmit the 1st 32 byte buffer from/to the SPI master here 00000000 7c 25 73 69 f2 f7 67 7a 2e 49 89 1d 15 a3 82 14 ||%si..gz.I......| 00000010 b9 6a 9d 61 84 40 fb 65 92 93 95 47 74 eb 86 70 |.j.a.@xxxxxxx..p| >>> [ 9906.383655] omap2_mcspi_prepare_message(1396): exit [ 9906.392478] omap2_mcspi_can_dma(1404): enter [ 9906.400989] omap2_mcspi_can_dma(1415): exit [ 9906.406039] omap2_mcspi_set_cs(273): enter -> enable = 0 [ 9906.411665] omap2_mcspi_set_cs(304): exit [ 9906.422574] omap2_mcspi_transfer_one(1254): enter [ 9906.428205] omap2_mcspi_set_enable(255): enter -> enable = 0 [ 9906.434843] omap2_mcspi_set_enable(265): exit [ 9906.439219] omap2_mcspi_setup_transfer(987): enter [ 9906.448176] omap2_mcspi_setup_transfer(1068): exit [ 9906.460740] omap2_mcspi_can_dma(1404): enter [ 9906.471370] omap2_mcspi_can_dma(1415): exit [ 9906.475873] omap2_mcspi_set_fifo(343): enter -> enable = 1 [ 9906.481411] mcspi_bytes_per_word(214): enter [ 9906.488991] mcspi_bytes_per_word(223): exit [ 9906.494172] omap2_mcspi_set_fifo(376): exit [ 9906.498376] omap2_mcspi_set_enable(255): enter -> enable = 1 [ 9906.510844] omap2_mcspi_set_enable(265): exit [ 9906.518939] omap2_mcspi_can_dma(1404): enter [ 9906.524126] omap2_mcspi_can_dma(1415): exit [ 9906.528332] omap2_mcspi_txrx_dma(675): enter [ 9906.532636] omap2_mcspi_irq_handler(1203): enter [ 9906.537291] omap2_mcspi_irq_handler(1215): exit [ 9906.548202] omap2_mcspi_tx_dma(472): enter [ 9906.552342] omap2_mcspi_set_dma_req(232): enter [ 9906.558436] omap2_mcspi_set_dma_req(247): exit [ 9906.562903] omap2_mcspi_tx_dma(492): exit [ 9906.573147] omap2_mcspi_rx_dma(512): enter [ 9906.578230] omap2_mcspi_set_dma_req(232): enter [ 9906.582834] omap2_mcspi_rx_callback(440): enter [ 9906.587432] omap2_mcspi_set_dma_req(232): enter [ 9906.592017] omap2_mcspi_set_dma_req(247): exit [ 9906.596498] omap2_mcspi_rx_callback(445): exit [ 9906.615645] omap2_mcspi_set_dma_req(247): exit [ 9906.620115] mcspi_wait_for_completion(418): enter [ 9906.627340] omap2_mcspi_set_dma_req(232): enter [ 9906.631940] omap2_mcspi_set_dma_req(247): exit [ 9906.643142] omap2_mcspi_rx_dma(594): exit [ 9906.647177] mcspi_wait_for_completion(418): enter At this point it's waiting for a completion that appears to never come. > One possible reason may be that driver sets up MCSPI_XFERLEVEL WCNT and > enables channel before setting up TX DMA. So if there a master that is > continuously requesting data, then there is a possibility that WCNT goes > down to 0 (due to shifting of 0s) before DMA had chance to put actual > the data into the FIFO. > > Does it help if you move the programming of WCNT field to > omap2_mcspi_tx_dma()? > That sounds plausible and I *attempted* to move this code but I must've implemented it incorrectly because the driver stopped functioning. It's likely my implementation and/or interpretation of your suggestion is wrong. If you could provide a more explicit description of the changes or a patch I would certainly be willing to attempt it again. > Main intention of adding SPI slave support was to support limited > usecase where master and slave exchange very small known messages of > predetermined length such as to support at sparse interval (such as dead I was hoping to pass high-speed/bandwidth telemetry data from the SPI slave to the SPI master. Apparently my use case may not match the one described. With that said, I think it may still be achievable with tweaking by someone with more experience with the McSPI driver. I introduced a "usleep()" inside spi-pipe() just after the SPI transfer on the SPI master application. With a long enough delay between transfers (~2-10 msec) the SPI master/slave succeeds indefinitely and the SPI slave never gets "stuck". Unfortunately, this kills my telemetry throughput. So, there is likely some kind of race condition in the SPI slave controller code. Any additional suggestions or patches to try would be very appreciated. I think the potential to work reliably under load is there but it still needs a few changes. Hopefully these are small changes, localized to a single file, and don't require a big refactor.