Re: [PATCH 3/3] TI QSPI: optimize transfers for dual and quad read

Vignesh Raghavendra <vigneshr@xxxxxx> · Tue, 10 Dec 2019 18:26:03 +0530

On 10/12/19 5:55 pm, Jean Pihet wrote:
> Hi Vignesh,
> 
> On Tue, Dec 10, 2019 at 9:40 AM Vignesh Raghavendra <vigneshr@xxxxxx> wrote:
>>
>>
>>
>> On 06/12/19 9:30 pm, Jean Pihet wrote:
>>> By reading the 32 bits data register and copy the contents to the
>>> receive buffer, according to the single/dual/quad read mode and
>>> the data length to read.
>>>
>>> The speed improvement is 3.5x using quad read.
>>> ---
>>>  drivers/spi/spi-ti-qspi.c | 48 ++++++++++++++++++++++++++-------------
>>>  1 file changed, 32 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/drivers/spi/spi-ti-qspi.c b/drivers/spi/spi-ti-qspi.c
>>> index 13916232a959..65ec3bcb25ae 100644
>>> --- a/drivers/spi/spi-ti-qspi.c
>>> +++ b/drivers/spi/spi-ti-qspi.c
>>> @@ -313,24 +313,25 @@ static int qspi_write_msg(struct ti_qspi *qspi, struct spi_transfer *t,
>>>  static int qspi_read_msg(struct ti_qspi *qspi, struct spi_transfer *t,
>>>                        int count)
>>>  {
>>> -     int wlen;
>>>       unsigned int cmd;
>>> +     u32 rx;
>>>       u8 *rxbuf;
>>> +     u8 xfer_len;
>>>
>>>       rxbuf = t->rx_buf;
>>>       cmd = qspi->cmd;
>>> +     /* Optimize the transfers for dual and quad read */
>>>       switch (t->rx_nbits) {
>>> -     case SPI_NBITS_DUAL:
>>> -             cmd |= QSPI_RD_DUAL;
>>> -             break;
>>>       case SPI_NBITS_QUAD:
>>> -             cmd |= QSPI_RD_QUAD;
>>> +             cmd |= QSPI_RD_QUAD | QSPI_WLEN(32);
>>
>> Why does QUAD mode mean 32 bit words?
> This is based on the assumption that every transfer is multiple of 8
> clock cycles.
> For SPI flash devices where bits_per_word is 8, the original code
> always reads data by 1 byte, which
> can be optimized.
> 
>>
>> IO mode and word size are independent of each other. If you want to
>> optimize for speed, why not set FLEN field to max and WLEN to max and
>> use ti_qspi_adjust_op_size to clamp max read size to 4096 words when
>> required. This should work irrespective of IO modes.
>> Driver already does something similar to this in the write path.
> 
> Ok! A new patch series is coming.
> 
>>
>>> +             break;
>>> +     case SPI_NBITS_DUAL:
>>> +             cmd |= QSPI_RD_DUAL | QSPI_WLEN(16);
>>>               break;
>>>       default:
>>> -             cmd |= QSPI_RD_SNGL;
>>> +             cmd |= QSPI_RD_SNGL | QSPI_WLEN(8);
>>>               break;
>>>       }
>>> -     wlen = t->bits_per_word >> 3;   /* in bytes */
>>>
>>>       while (count) {
>>>               dev_dbg(qspi->dev, "rx cmd %08x dc %08x\n", cmd, qspi->dc);
>>> @@ -342,19 +343,34 @@ static int qspi_read_msg(struct ti_qspi *qspi, struct spi_transfer *t,
>>>                       dev_err(qspi->dev, "read timed out\n");
>>>                       return -ETIMEDOUT;
>>>               }
>>> -             switch (wlen) {
>>> -             case 1:
>>> -                     *rxbuf = readb(qspi->base + QSPI_SPI_DATA_REG);
>>> +
>>> +             /* Optimize the transfers for dual and quad read */
>>> +             rx = readl(qspi->base + QSPI_SPI_DATA_REG);
>>> +             switch (t->rx_nbits) {
>>> +             case SPI_NBITS_QUAD:
>>> +                     if (count >= 1)
>>> +                             *rxbuf++ = rx >> 24;
>>> +                     if (count >= 2)
>>> +                             *rxbuf++ = rx >> 16;
>>> +                     if (count >= 3)
>>> +                             *rxbuf++ = rx >> 8;
>>> +                     if (count >= 4)
>>> +                             *rxbuf++ = rx;
>>> +                     xfer_len = min(count, 4);
>>>                       break;
>>> -             case 2:
>>> -                     *((u16 *)rxbuf) = readw(qspi->base + QSPI_SPI_DATA_REG);
>>> +             case SPI_NBITS_DUAL:
>>> +                     if (count >= 1)
>>> +                             *rxbuf++ = rx >> 8;
>>> +                     if (count >= 2)
>>> +                             *rxbuf++ = rx;
>>> +                     xfer_len = min(count, 2);
>>>                       break;
>>> -             case 4:
>>> -                     *((u32 *)rxbuf) = readl(qspi->base + QSPI_SPI_DATA_REG);
>>> +             default:
>>> +                     *rxbuf++ = rx;
>>> +                     xfer_len = 1;
>>>                       break;
>>
>> This will fail in case of t->rx_nbits = 1 and t->bits_per_word = 32
>> (1 bit SPI bus bit slave with 32-bit addressable registers)
> I do not see why this would fail. If bits_per_word is 32, count (in
> bytes) is a multiple of 4 and 1 byte will be read every time the loop
> runs.
> Is that correct? Can you elaborate more on why this would fail?
> 

With t->rx_nbits = 1 and t->bits_per_word = 32, controller should always
read in 4 byte chunks on the SPI bus, but we have:

>>> +             cmd |= QSPI_RD_SNGL | QSPI_WLEN(8);

which sets word size to 8 bits and thus breaks the transaction to 1 byte
chunks on the bus, which is bad.

Regards
Vignesh

> Thanks for reviewing, regards,
> Jean
> 
>>
>> bits_per_word indicate the address granularity within SPI Slave
>> (represented by WLEN field in TI QSPI) and rx_nbits tells about buswidth
>> (bits received per bus clock)
>>
>>>               }
>>> -             rxbuf += wlen;
>>> -             count -= wlen;
>>> +             count -= xfer_len;
>>>       }
>>>
>>>       return 0;
>>>
>>
>> --
>> Regards
>> Vignesh

-- 
Regards
Vignesh