Re: bcm-qspi: unstable bspi mode since the 345309fa7c0c ("spi: bcm-qspi: Fix bcm_qspi_bspi_read() performance")

Rafał Miłecki <zajec5@xxxxxxxxx> · Mon, 23 Jul 2018 22:47:59 +0200

On 23.07.2018 22:40, Rafał Miłecki wrote:
The problem is that I randomly see BSPI timeouts like this:

[   15.637809] bcm_iproc 18029200.spi: timeout waiting for BSPI
[   15.643471] jffs2: error: (430) check_node_data: can not read 305 bytes from 0x008a1264, error code: -110.
[   15.653118] jffs2: error: (430) check_tn_node: check_node_data() returned error: -110.
[   15.661262] jffs2: error: (430) jffs2_do_read_inode_internal: CRC failed for read_inode of inode 6967 at physical location 0x8a1420
[   15.673065] jffs2: Returned error for crccheck of ino #6967. Expect badness...

[   15.997809] bcm_iproc 18029200.spi: timeout waiting for BSPI
[   16.003463] jffs2: error: (430) check_node_data: can not read 1840 bytes from 0x008abf04, error code: -110.
[   16.013192] jffs2: error: (430) check_tn_node: check_node_data() returned error: -110.
[   16.021882] jffs2: notice: (430) check_node_data: wrong data CRC in data node at 0x008ab81c: read 0x7b1aaf59, calculated 0xf66cce81.

[   16.173466] jffs2: error: (430) check_node_data: can not read 814 bytes from 0x0083da04, error code: -110.
[   16.183110] jffs2: error: (430) check_tn_node: check_node_data() returned error: -110.
[   16.191469] jffs2: notice: (430) check_node_data: wrong data CRC in data node at 0x0083d5e4: read 0xb9c58541, calculated 0xc4eede34.

This of course causes filesystem stability problems.

I spent a moment trying to debug this problem and it appears it didn't
occur before the 345309fa7c0c ("spi: bcm-qspi: Fix bcm_qspi_bspi_read()
performance"). Reverting all /recent/ bcm-qspi commits including above
ones brings back the stability for me.

Before above commit every time that bcm_qspi_bspi_lr_l2_isr() called
bcm_qspi_bspi_lr_l2_isr() it was resulting in bspi_rf_msg_len becoming
0.
With above commit it's not the case anymore so I suspect there may be
some bug around that code.

An interesting, potentially useful info: it seems that adding any debugging
to the bcm_qspi_bspi_lr_l2_isr() results in massive failures.

I modified spi-bcm-qspi.c with attached patch and got:
[    0.762267] qspi->bspi_rf_msg_len:0x00e0
[    0.866842] bcm_iproc 18029200.spi: timeout waiting for BSPI
[    0.872478] mtd_read error while parsing (offset: 0x0): -110
[    0.878164] qspi->bspi_rf_msg_len:0x00e0
[    0.986835] bcm_iproc 18029200.spi: timeout waiting for BSPI
[    0.992470] mtd_read error while parsing (offset: 0x10000): -110
[    0.998471] bcm_iproc 18029200.spi: BSPI INT error
[    1.003263] qspi->bspi_rf_msg_len:0x00e0
[    1.009891] qspi->bspi_rf_msg_len:0x00e0
[    1.116836] bcm_iproc 18029200.spi: timeout waiting for BSPI
[    1.122466] mtd_read error while parsing (offset: 0x20000): -110
[    1.128466] bcm_iproc 18029200.spi: BSPI INT error
[    1.133251] qspi->bspi_rf_msg_len:0x00e0
[    1.139877] qspi->bspi_rf_msg_len:0x00e0

So my guess is this problem involves some code race. I don't see any
other reason why adding a simple pr_info could trigger these problems.

--- a/drivers/spi/spi-bcm-qspi.c
+++ b/drivers/spi/spi-bcm-qspi.c
@@ -1036,6 +1036,7 @@ static irqreturn_t bcm_qspi_bspi_lr_l2_isr(int irq, void *dev_id)

 	if (qspi->bspi_enabled && qspi->bspi_rf_msg) {
 		bcm_qspi_bspi_lr_data_read(qspi);
+		pr_info("qspi->bspi_rf_msg_len:0x%04x\n", qspi->bspi_rf_msg_len);
 		if (qspi->bspi_rf_msg_len == 0) {
 			qspi->bspi_rf_msg = NULL;
 			if (qspi->soc_intc) {
--
To unsubscribe from this list: send the line "unsubscribe linux-spi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html