Re: [PATCH 3/3] TI QSPI: optimize transfers for dual and quad read

Jean Pihet <jean.pihet@xxxxxxxxxxxxxx> · Tue, 10 Dec 2019 13:25:56 +0100

Hi Vignesh,

On Tue, Dec 10, 2019 at 9:40 AM Vignesh Raghavendra <vigneshr@xxxxxx> wrote:
>
>
>
> On 06/12/19 9:30 pm, Jean Pihet wrote:
> > By reading the 32 bits data register and copy the contents to the
> > receive buffer, according to the single/dual/quad read mode and
> > the data length to read.
> >
> > The speed improvement is 3.5x using quad read.
> > ---
> >  drivers/spi/spi-ti-qspi.c | 48 ++++++++++++++++++++++++++-------------
> >  1 file changed, 32 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/spi/spi-ti-qspi.c b/drivers/spi/spi-ti-qspi.c
> > index 13916232a959..65ec3bcb25ae 100644
> > --- a/drivers/spi/spi-ti-qspi.c
> > +++ b/drivers/spi/spi-ti-qspi.c
> > @@ -313,24 +313,25 @@ static int qspi_write_msg(struct ti_qspi *qspi, struct spi_transfer *t,
> >  static int qspi_read_msg(struct ti_qspi *qspi, struct spi_transfer *t,
> >                        int count)
> >  {
> > -     int wlen;
> >       unsigned int cmd;
> > +     u32 rx;
> >       u8 *rxbuf;
> > +     u8 xfer_len;
> >
> >       rxbuf = t->rx_buf;
> >       cmd = qspi->cmd;
> > +     /* Optimize the transfers for dual and quad read */
> >       switch (t->rx_nbits) {
> > -     case SPI_NBITS_DUAL:
> > -             cmd |= QSPI_RD_DUAL;
> > -             break;
> >       case SPI_NBITS_QUAD:
> > -             cmd |= QSPI_RD_QUAD;
> > +             cmd |= QSPI_RD_QUAD | QSPI_WLEN(32);
>
> Why does QUAD mode mean 32 bit words?
This is based on the assumption that every transfer is multiple of 8
clock cycles.
For SPI flash devices where bits_per_word is 8, the original code
always reads data by 1 byte, which
can be optimized.

>
> IO mode and word size are independent of each other. If you want to
> optimize for speed, why not set FLEN field to max and WLEN to max and
> use ti_qspi_adjust_op_size to clamp max read size to 4096 words when
> required. This should work irrespective of IO modes.
> Driver already does something similar to this in the write path.

Ok! A new patch series is coming.

>
> > +             break;
> > +     case SPI_NBITS_DUAL:
> > +             cmd |= QSPI_RD_DUAL | QSPI_WLEN(16);
> >               break;
> >       default:
> > -             cmd |= QSPI_RD_SNGL;
> > +             cmd |= QSPI_RD_SNGL | QSPI_WLEN(8);
> >               break;
> >       }
> > -     wlen = t->bits_per_word >> 3;   /* in bytes */
> >
> >       while (count) {
> >               dev_dbg(qspi->dev, "rx cmd %08x dc %08x\n", cmd, qspi->dc);
> > @@ -342,19 +343,34 @@ static int qspi_read_msg(struct ti_qspi *qspi, struct spi_transfer *t,
> >                       dev_err(qspi->dev, "read timed out\n");
> >                       return -ETIMEDOUT;
> >               }
> > -             switch (wlen) {
> > -             case 1:
> > -                     *rxbuf = readb(qspi->base + QSPI_SPI_DATA_REG);
> > +
> > +             /* Optimize the transfers for dual and quad read */
> > +             rx = readl(qspi->base + QSPI_SPI_DATA_REG);
> > +             switch (t->rx_nbits) {
> > +             case SPI_NBITS_QUAD:
> > +                     if (count >= 1)
> > +                             *rxbuf++ = rx >> 24;
> > +                     if (count >= 2)
> > +                             *rxbuf++ = rx >> 16;
> > +                     if (count >= 3)
> > +                             *rxbuf++ = rx >> 8;
> > +                     if (count >= 4)
> > +                             *rxbuf++ = rx;
> > +                     xfer_len = min(count, 4);
> >                       break;
> > -             case 2:
> > -                     *((u16 *)rxbuf) = readw(qspi->base + QSPI_SPI_DATA_REG);
> > +             case SPI_NBITS_DUAL:
> > +                     if (count >= 1)
> > +                             *rxbuf++ = rx >> 8;
> > +                     if (count >= 2)
> > +                             *rxbuf++ = rx;
> > +                     xfer_len = min(count, 2);
> >                       break;
> > -             case 4:
> > -                     *((u32 *)rxbuf) = readl(qspi->base + QSPI_SPI_DATA_REG);
> > +             default:
> > +                     *rxbuf++ = rx;
> > +                     xfer_len = 1;
> >                       break;
>
> This will fail in case of t->rx_nbits = 1 and t->bits_per_word = 32
> (1 bit SPI bus bit slave with 32-bit addressable registers)
I do not see why this would fail. If bits_per_word is 32, count (in
bytes) is a multiple of 4 and 1 byte will be read every time the loop
runs.
Is that correct? Can you elaborate more on why this would fail?

Thanks for reviewing, regards,
Jean

>
> bits_per_word indicate the address granularity within SPI Slave
> (represented by WLEN field in TI QSPI) and rx_nbits tells about buswidth
> (bits received per bus clock)
>
> >               }
> > -             rxbuf += wlen;
> > -             count -= wlen;
> > +             count -= xfer_len;
> >       }
> >
> >       return 0;
> >
>
> --
> Regards
> Vignesh