Hi Serge,

> On Fri, Mar 10, 2023 at 10:31:51AM -0500, Jack Chen wrote:
> > Delay between write and read in polling mode is not necessary in dw spi
> > driver. It was added assuming that dw spi controller need the delay to
> > send data from tx fifo to spi devices. But it is not needed because
> > following reasons:
> > 1) dw spi datasheet claims transfer begins when first data word is
> > present in the transmit FIFO and a slave is enabled. So at least we
> > do not need the full fifo-size-transfer time delay.
> > 2) in practice, due to spi devices implementation, spi full-duplex
> > (write and read real data) is always split into two transfers.
> In practice the delay is specifically added to minimize the dummy
> loops in the poll-based transfer. It's calculated based on the number
> of bytes pushed to the Tx FIFO and the SPI-bus clock rate (that's why
> the spi_transfer.effective_speed_hz field is initialized in the
> driver). So after all of them are transferred we get to start reading
> data from the Rx FIFO. Until then the kernel thread is supposed to
> sleep giving up the CPU for another tasks.

Thanks so much for your feedback. I understand the purpose of the
specifically calculated delay now. However, whether the CPU is actually
freed for other threads depends on the size of the delay. If the delay
is shorter than 10 us, it normally turns into a busy-loop on the CPU
instead of freeing it.

The delay also does not work well in all cases. For example, if I run
the SPI bus at 20 MHz with a FIFO depth of 8 and transfer a large chunk
of data (4096 bytes) in one transfer, the delay calculation adds a
3200 ns delay after each FIFO fill (8 bytes x 8 bits / 20 MHz), which
is rounded up to a 4 us udelay. On most platforms udelay is not that
precise, and on mine I measured >= 5 us in most cases, so at least
1.8 us of extra delay is added per FIFO fill. Counting the time to fill
the Tx FIFO, let's round that to 2 us.
The actual time needed to transfer 8 bytes at 20 MHz is just 3.2 us, but
we add about 2 us of extra delay on average. Over the whole 4096-byte
transfer (512 FIFO fills x ~2 us), that adds more than 1 ms of delay,
which is long enough to break applications that transfer large chunks
of data (e.g. image, audio).

To avoid the extra delay, maybe we can consider one of the following two
proposals:
1) Add a DT property that allows users to enable the delay in polling
   mode.
2) Compare the required delay (the time to shift out the bytes in the
   Tx FIFO) with 10 us, and only call spi_delay_exec() when it is longer
   than 10 us. When the delay is shorter than 10 us, the short delay
   calls (ndelay/udelay) are just busy-loops, so calling them does not
   free the CPU for other tasks anyway.

What is your opinion?

Thanks
Jack Chen

On Fri, Mar 10, 2023 at 9:23 PM Serge Semin <fancer.lancer@xxxxxxxxx> wrote:
>
> Hi Jack
>
> On Fri, Mar 10, 2023 at 10:31:51AM -0500, Jack Chen wrote:
> > Delay between write and read in polling mode is not necessary in dw spi
> > driver. It was added assuming that dw spi controller need the delay to
> > send data from tx fifo to spi devices. But it is not needed because
> > following reasons:
> > 1) dw spi datasheet claims transfer begins when first data word is
> > present in the transmit FIFO and a slave is enabled. So at least we
> > do not need the full fifo-size-transfer time delay.
> > 2) in practice, due to spi devices implementation, spi full-duplex
> > (write and read real data) is always split into two transfers.
>
> In practice the delay is specifically added to minimize the dummy
> loops in the poll-based transfer. It's calculated based on the number
> of bytes pushed to the Tx FIFO and the SPI-bus clock rate (that's why
> the spi_transfer.effective_speed_hz field is initialized in the
> driver). So after all of them are transferred we get to start reading
> data from the Rx FIFO. Until then the kernel thread is supposed to
> sleep giving up the CPU for another tasks.
>
> > Delay between spi transfers may be needed. But this can be introduced by
> > using a more formal helper function "spi_transfer_delay_exec", in which
> > the delay time is passed by users through spi_ioc_transfer.
>
> This is wrong. spi_transfer.delay is supposed to be executed after the
> whole transfer is completed. You suggest to to do that in between some
> random data chunks pushed and pulled from the controller FIFO.
> Moreover that delay is already performed by the SPI-core:
> https://elixir.bootlin.com/linux/latest/source/drivers/spi/spi.c#L1570
>
> -Serge(y)
>
> >
> > Signed-off-by: Jack Chen <zenghuchen@xxxxxxxxxx>
> > ---
> >  drivers/spi/spi-dw-core.c | 20 +++++++-------------
> >  1 file changed, 7 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/spi/spi-dw-core.c b/drivers/spi/spi-dw-core.c
> > index c3bfb6c84cab..7c10fb353567 100644
> > --- a/drivers/spi/spi-dw-core.c
> > +++ b/drivers/spi/spi-dw-core.c
> > @@ -379,9 +379,12 @@ static void dw_spi_irq_setup(struct dw_spi *dws)
> >
> >  /*
> >   * The iterative procedure of the poll-based transfer is simple: write as much
> > - * as possible to the Tx FIFO, wait until the pending to receive data is ready
> > - * to be read, read it from the Rx FIFO and check whether the performed
> > - * procedure has been successful.
> > + * as possible to the Tx FIFO, then read from the Rx FIFO and check whether the
> > + * performed procedure has been successful.
> > + *
> > + * Delay is introduced in the end of each transfer before (optionally) changing
> > + * the chipselect status, then starting the next transfer or completing the
> > + * list of @spi_message.
> >   *
> >   * Note this method the same way as the IRQ-based transfer won't work well for
> >   * the SPI devices connected to the controller with native CS due to the
> > @@ -390,21 +393,12 @@ static void dw_spi_irq_setup(struct dw_spi *dws)
> >  static int dw_spi_poll_transfer(struct dw_spi *dws,
> >  				struct spi_transfer *transfer)
> >  {
> > -	struct spi_delay delay;
> > -	u16 nbits;
> >  	int ret;
> >
> > -	delay.unit = SPI_DELAY_UNIT_SCK;
> > -	nbits = dws->n_bytes * BITS_PER_BYTE;
> > -
> >  	do {
> >  		dw_writer(dws);
> > -
> > -		delay.value = nbits * (dws->rx_len - dws->tx_len);
> > -		spi_delay_exec(&delay, transfer);
> > -
> >  		dw_reader(dws);
> > -
> > +		spi_transfer_delay_exec(transfer);
> >  		ret = dw_spi_check_status(dws, true);
> >  		if (ret)
> >  			return ret;
> > --
> > 2.40.0.rc1.284.g88254d51c5-goog
> >