Hi Vignesh, On Tue, Dec 10, 2019 at 9:40 AM Vignesh Raghavendra <vigneshr@xxxxxx> wrote: > > > > On 06/12/19 9:30 pm, Jean Pihet wrote: > > By reading the 32 bits data register and copy the contents to the > > receive buffer, according to the single/dual/quad read mode and > > the data length to read. > > > > The speed improvement is 3.5x using quad read. > > --- > > drivers/spi/spi-ti-qspi.c | 48 ++++++++++++++++++++++++++------------- > > 1 file changed, 32 insertions(+), 16 deletions(-) > > > > diff --git a/drivers/spi/spi-ti-qspi.c b/drivers/spi/spi-ti-qspi.c > > index 13916232a959..65ec3bcb25ae 100644 > > --- a/drivers/spi/spi-ti-qspi.c > > +++ b/drivers/spi/spi-ti-qspi.c > > @@ -313,24 +313,25 @@ static int qspi_write_msg(struct ti_qspi *qspi, struct spi_transfer *t, > > static int qspi_read_msg(struct ti_qspi *qspi, struct spi_transfer *t, > > int count) > > { > > - int wlen; > > unsigned int cmd; > > + u32 rx; > > u8 *rxbuf; > > + u8 xfer_len; > > > > rxbuf = t->rx_buf; > > cmd = qspi->cmd; > > + /* Optimize the transfers for dual and quad read */ > > switch (t->rx_nbits) { > > - case SPI_NBITS_DUAL: > > - cmd |= QSPI_RD_DUAL; > > - break; > > case SPI_NBITS_QUAD: > > - cmd |= QSPI_RD_QUAD; > > + cmd |= QSPI_RD_QUAD | QSPI_WLEN(32); > > Why does QUAD mode mean 32 bit words? This is based on the assumption that every transfer is multiple of 8 clock cycles. For SPI flash devices where bits_per_word is 8, the original code always reads data by 1 byte, which can be optimized. > > IO mode and word size are independent of each other. If you want to > optimize for speed, why not set FLEN field to max and WLEN to max and > use ti_qspi_adjust_op_size to clamp max read size to 4096 words when > required. This should work irrespective of IO modes. > Driver already does something similar to this in the write path. Ok! A new patch series is coming. > > > + break; > > + case SPI_NBITS_DUAL: > > + cmd |= QSPI_RD_DUAL | QSPI_WLEN(16); > > break; > > default: > > - cmd |= QSPI_RD_SNGL; > > + cmd |= QSPI_RD_SNGL | QSPI_WLEN(8); > > break; > > } > > - wlen = t->bits_per_word >> 3; /* in bytes */ > > > > while (count) { > > dev_dbg(qspi->dev, "rx cmd %08x dc %08x\n", cmd, qspi->dc); > > @@ -342,19 +343,34 @@ static int qspi_read_msg(struct ti_qspi *qspi, struct spi_transfer *t, > > dev_err(qspi->dev, "read timed out\n"); > > return -ETIMEDOUT; > > } > > - switch (wlen) { > > - case 1: > > - *rxbuf = readb(qspi->base + QSPI_SPI_DATA_REG); > > + > > + /* Optimize the transfers for dual and quad read */ > > + rx = readl(qspi->base + QSPI_SPI_DATA_REG); > > + switch (t->rx_nbits) { > > + case SPI_NBITS_QUAD: > > + if (count >= 1) > > + *rxbuf++ = rx >> 24; > > + if (count >= 2) > > + *rxbuf++ = rx >> 16; > > + if (count >= 3) > > + *rxbuf++ = rx >> 8; > > + if (count >= 4) > > + *rxbuf++ = rx; > > + xfer_len = min(count, 4); > > break; > > - case 2: > > - *((u16 *)rxbuf) = readw(qspi->base + QSPI_SPI_DATA_REG); > > + case SPI_NBITS_DUAL: > > + if (count >= 1) > > + *rxbuf++ = rx >> 8; > > + if (count >= 2) > > + *rxbuf++ = rx; > > + xfer_len = min(count, 2); > > break; > > - case 4: > > - *((u32 *)rxbuf) = readl(qspi->base + QSPI_SPI_DATA_REG); > > + default: > > + *rxbuf++ = rx; > > + xfer_len = 1; > > break; > > This will fail in case of t->rx_nbits = 1 and t->bits_per_word = 32 > (1 bit SPI bus bit slave with 32-bit addressable registers) I do not see why this would fail. If bits_per_word is 32, count (in bytes) is a multiple of 4 and 1 byte will be read every time the loop runs. Is that correct? Can you elaborate more on why this would fail? Thanks for reviewing, regards, Jean > > bits_per_word indicate the address granularity within SPI Slave > (represented by WLEN field in TI QSPI) and rx_nbits tells about buswidth > (bits received per bus clock) > > > } > > - rxbuf += wlen; > > - count -= wlen; > > + count -= xfer_len; > > } > > > > return 0; > > > > -- > Regards > Vignesh