Re: [PATCH 04/32] spi: dw: Introduce polling device tree property

Damien Le Moal <Damien.LeMoal@xxxxxxx> · Mon, 16 Nov 2020 07:47:47 +0000

On 2020/11/16 1:02, Serge Semin wrote:
> On Fri, Nov 13, 2020 at 09:22:54AM +0000, Damien Le Moal wrote:
>> On Mon, 2020-11-09 at 22:59 +0300, Serge Semin wrote:
>>> On Sat, Nov 07, 2020 at 05:13:52PM +0900, Damien Le Moal wrote:
>>>> With boards that have slow interrupts context switch, and a fast device
>>>> connected to a spi master, e.g. an SD card through mmc-spi,
>>>
>>>> using
>>>> dw_spi_poll_transfer() intead of the regular interrupt based
>>>> dw_spi_transfer_handler() function is more efficient and
>>>
>>> I can believe in that. But the next part seems questionable:
>>>
>>>> can avoid a lot
>>>> of RX FIFO overflow errors while keeping the device SPI frequency
>>>> reasonnably high (for speed).
>>>
>>> No matter whether it's an IRQ-based or poll-based transfer, as long as a
>>> client SPI-device is connected with a GPIO-based chip-select (or the
>>> DW APB SSI-controller feature of the automatic chip-select toggling is
>>> fixed), the Rx FIFO should never overrun. It's ensured by the transfer
>>> algorithm design by calculating the rxtx_gap in the dw_writer()
>>> method. If the error still happens then there must be some bug in
>>> the code.
>>>
>>> It's also strange to hear that the polling-based transfer helps
>>> to avoid the Rx FIFO overflow errors, while the IRQ-based transfer
>>> causes them. Both of those methods are based on the same dw_writer()
>>> and dw_reader() methods. So basically they both should either work
>>> well or cause the errors at same time.
>>>
>>> So to speak could you more through debug your case?
>>
>> I did. And I have much better results now. Let me explain:
> 
>> 1) The device tree was setting up the SPI controller using the controller
>> internal chip select, not a GPIO-based chip select. Until now, I could never
>> get the GPIO-based chip select to work. I finally found out why: I simply
>> needed to add the "spi-cs-high" property to the mmc-slot node. With that, the
>> CS gpio is correctly driven active-high instead of the default active-low and
>> the SD card works.
> 
> Yeap, that's the main problem of the standard DW APB SSI controller
> released by Synopsys. Currently SPI-flash'es are the only SPI
> peripheral devices which are capable to work with native DW APB SSI
> chip-selects. I've fixed that for flashes recently by atomizing the
> SPI memory operations (executing them with the local IRQs disabled)
> and performing them on the poll-based basis. Alas the normal
> SPI-transfers still can't be properly executed with native
> chip-selects.
> 
>> 2) With this change using the GPIO-based CS, the patch "spi: dw: Fix driving
>> MOSI low while receiving" became completely unnecessary. The SD card works
>> without it.
> 
> Hm, that's weird. MOSI level has nothing todo with the chip-selects.
> Are you sure the original problem has been connected with the Tx lane
> level? On the other hand are you sure the problem has gone away after
> all?

Not sure of anything here :) But removing the patch changing MOSI level did not
prevent the SD card to be scanned & accessed correctly. Before (with native chip
select), the first command would not even complete...

> Moreover I've just taken alook at the MMC SPI driver. Turns out it
> has already been fixed to send ones to the MMC port when it's
> required. So If you still experience the MOSI-level problem
> then that fix might have been done incorrect at some extent...

OK. Thanks for the info. I need to rebase on the latest SPI tree then. However,
scripts/get_maintainer.pl does not mention any. Where is that hosted ?

>> Now for testing, I also removed this polling change. Results are these:
>> 1) With the same SPI frequency as before (4MHz), I can run the SD card at about
>> 300 KB/s (read) but I am still seeing some RX FIFO overflow errors. Not a lot,
>> but enough to be annoying, especially on boot as the partition scan sometimes
>> fails because of these errors. In most cases, the block layer retry of failed
>> read/writes cover and no bad errors happen, but the RX FIFO overflow error
>> messages still pop up.
>> 2) Looking into the code further, I realized that RXFLTR is set to half the
>> fifo size minus 1. That sound reasonable, but as that delays interrupt
>> generation until the RX fifo is almost full, I decided to try a value of 0 to
>> get the interrupt as soon as data is available rather than waiting for a chunk.
>> With that, all RX FIFO overflow errors go away, and I could even double the SPI
>> frequency to 8MHz, getting a solid 650KB/s from the SD card without any error
>> at all.
>>
> 
> Please, see my last comment...
> 
>> My take:
>> * This controller internal CS seems to be totally broken.
> 
> Native CS aren't broken, they just have been designed so it isn't that
> easy to use them in the linux kernel. Linux kernel SPI-core expects
> the chip-select being driven the way it needs at the certain moments,
> while DW APB SSI toggles them automatically at the moment of the data
> pushing/popping to/from the SPI bus. Some hardware vendors that bought
> the DW APB SSI IP-core have fixed that by providing a way of direct
> CS lane control (see spi-dw-mmio.c: Microsemi, Sparx5, Alpine
> platforms).

OK. Thanks for the details.

>> * This SoC has really slow interrupts, so generating these earlier rather than
>> later gives time to the IRQ handler to kick in before the FIFO overflows.
> 
> I am pretty sure that's not the reason. See my next comment.
> 
>>
>> In the V2 series for SPI DW, I added a DW_SPI_CAP_RXFLTR_CLEAR capability flag
>> to set RXFLTR to 0, always. That works well, but this is may be still similar
>> to the "polling" hack in the sense that it is tuning for this SoC rather than a
>> property of the controller. But I do not see any other simple way of removing
>> these annoying RX FIFO overflow errors.
> 
> Alas no, RX-FIFO level value shouldn't affect the SPI-transfers
> execution results. I'll try to explain it one more time. DW APB SSI
> provides three modes of transfers (TMOD-es, see the chip manual for
> details). One of them is TMOD=0x0 - Transmit and Receive. Simply
> speaking the mode essence is if you push a byte of data to the Tx
> FIFO you'll synchronously get a byte of data back in the Rx FIFO a bit
> later from the internal shift register. The IRQ-based transfers have
> been developed full based on that mode. So it's absolutely possible to
> implement a stable algorithm, which would work without a risk of
> getting the Rx FIFO overflow or the Tx FIFO overrun. Such algorithm
> should just preserve a balance of sent data so the received data
> wouldn't cause the Rx FIFO overrun. As you understand such algorithm
> doesn't depend on the IRQes performance. No matter how slow the IRQs
> handler is execute, as long as it's executed, the SPI transfers
> procedure shall go on with no errors including the Rx FIFO overflows.

OK. I get it now.

> At least that's what the original DW APB SSI driver developer
> implied when created the code and what I was trying to preserve in my
> patches. If you still get the errors (you sure it's Rx FIFO overflows,
> aren't you?), then something went wrong and either me or the original
> author or someone else has broken the code.

Yep, 100% sure it is RX FIFO overflow error. The bit in the IRQ status is set.
Checked it. And that is the trigger for the error message which exactly says
that "RX FIFO overflow". So that would mean that too many bytes are being
written to the TX FIFO, right ? Or rather, that when the TX FIFO is written, it
writes more bytes than what is available in the RX FIFO ?

If that is the case, then it seems to me that tx_max() should be modified a
little: rx_len may be large (say a sector), in which case rxtx_gap overflows
(becomes super large), with the result that tx_max() returns tx_room. But that
may be larger than the free space in the RX FIFO, no ?

I think I need to do a round of debug again, tracing these values to figure out
what is going on. I can reproduce the errors very easily (I just need to crank
up the SPI frequency).

> What we currently need is to carefully analyze at what moment you get
> the overflows, what really caused them and fix the problem in the
> code. The capabilities flag isn't an option in this case. I've tried
> to reproduce the errors by slowing the system down, executing the
> SPI-transfers of various length, but with no lick. So it's only you
> for now who observes and can try to debug the problem.

OK. Will do.

> Anyway after we get to fix it, you'll be able to run the MMC port with
> speed as fast as it's possible.

That is my current goal :)

> 
> ---
> One more thing about MMC SPI. Please note, that using MMC SPI and
> DMA-based DW APB SSI transfers might not work well. The thing is that
> the DMA-part of the MMC SPI driver relies on the legacy
> spi_message->is_dma_mapped interface. It performs some specific
> DEV-to-MEM and MEM-to-DEV DMA-buffers splitting and synchronizations.
> In its turn the DMA-part of the DW APB SSI driver uses the generic
> SPI-core and the DMA-engine interface to execute the SPI-transfers.
> Since the SPI-core performs the DMA buffers synchronizations
> internally at the moment of the buffers mapping, in case if the DMA
> accesses aren't coherent on your platform, the data buffers might get
> synchronized/flushed with/by caches incorrectly corrupting a part of
> actual data. At least that's what we observed in kernel 4.9... So for
> now I'd suggest to use either the poll- or the IRQ-based DW APB SSI
> transfers.

The K210 has a DW DMA engine, but I do not have it enabled. I tried that too to
be frank, but it fails to initialize the DMA rx/tx channels. I have not dig into
that one since DMA is not at all required for this board given the devices on it.

Thanks for all the help !

-- 
Damien Le Moal
Western Digital Research