[RFC] spi: pxa2xx Speed optimizations for small transfers

<Thomas.Kopp@xxxxxxxxxxxxx> · Mon, 30 Oct 2023 07:58:41 +0000

Hi,

I'm looking over the spi-pxa2xx.c driver to get some more performance out of the SPI on an Intel Tiger Lake E processor. I have some questions about the history of the driver and how the HW works which will hopefully help to improve the driver.

reset_sccr1()
As far as I can see, this was changed significantly over time. But it's basically used in the same way.

Looking at 5.10 it was clearing out the interrupt enable bits and setting the rx/tx thresholds to values stored in chip->threshold.
	sccr1_reg = pxa2xx_spi_read(drv_data, SSCR1) & ~drv_data->int_cr1;
	sccr1_reg &= ~SSCR1_RFT;
	sccr1_reg |= chip->threshold;
	pxa2xx_spi_write(drv_data, SSCR1, sccr1_reg);

What it does now (6.6 rc7) is clear out the entire register with the exception of the treshold (which is passed as the value parameter to spi_update). Am I missing something or can the Read-Modify-Write operation never save a write here because reading the HW register with a mask of drv_data->int_cr1 | drv_data->dma_cr1 and comparing it to a threshold value will always yield false, right?
u32 mask = drv_data->int_cr1 | drv_data->dma_cr1
threshold is either chip->threshold or 0
pxa2xx_spi_update(drv_data, SSCR1, mask, threshold);

with the update function:

static void pxa2xx_spi_update(const struct driver_data *drv_data, u32 reg, u32 mask, u32 value)
{
	if ((pxa2xx_spi_read(drv_data, reg) & mask) != value)
		pxa2xx_spi_write(drv_data, reg, value & mask);
}

Am I missing something or could the reset routine be written in a similar fashion as it was in 5.10 but even omitting the read and always writing it. As far as I can see there are no status/control bits in that register that are modified by hardware so the driver should know the latest configuration and can write it to that while clearing the interrupt enable bits, right? The changes to include the special case with no message available can of course be handled as well.

In general: Can it be quantified for this driver how expensive reads and writes to registers are? The spi_update routine is written in a way to try to save writes if I interpretate that correctly, is that worth a read every time? From my understanding and quick tests these RMW operations on the SPI control registers seem to be expensive and doing write only can help speeding up the SPI transfers. One issue I have is that I currently don't have a good idea of how many different sets of HW this driver addresses. From the datasheets I could find there's a couple intel embedded platforms using this. But the datasheets I found are rather vague when it comes to determining what clock is being used to access registers etc. 

I implemented a polling mode in the same way as here: https://lore.kernel.org/linux-arm-kernel/20220502175457.1977983-9-mkl@xxxxxxxxxxxxxx/ 
Results on the Tiger Lake E hardware I have are promising, but the question is how to set limits for a mainline patch for this. Reading the commit message of 579d3bb it seems for some platforms it would be a terrible idea to have polling method. "On Sodavile the busy flag disappears pretty quick and after that it takes approx ~130ms (sometimes less but not much) until there are bytes available in the RX FIFO." What would be the best way to handle this? Use an aggressive limit for the timeout and implement it like it's implemented on the imx driver and fallback to PIO? Blacklisting of specific platforms for this feature?

Thanks,
Thomas