On 23.07.2018 22:40, Rafał Miłecki wrote:
The problem is that I randomly see BSPI timeouts like this: [ 15.637809] bcm_iproc 18029200.spi: timeout waiting for BSPI [ 15.643471] jffs2: error: (430) check_node_data: can not read 305 bytes from 0x008a1264, error code: -110. [ 15.653118] jffs2: error: (430) check_tn_node: check_node_data() returned error: -110. [ 15.661262] jffs2: error: (430) jffs2_do_read_inode_internal: CRC failed for read_inode of inode 6967 at physical location 0x8a1420 [ 15.673065] jffs2: Returned error for crccheck of ino #6967. Expect badness... [ 15.997809] bcm_iproc 18029200.spi: timeout waiting for BSPI [ 16.003463] jffs2: error: (430) check_node_data: can not read 1840 bytes from 0x008abf04, error code: -110. [ 16.013192] jffs2: error: (430) check_tn_node: check_node_data() returned error: -110. [ 16.021882] jffs2: notice: (430) check_node_data: wrong data CRC in data node at 0x008ab81c: read 0x7b1aaf59, calculated 0xf66cce81. [ 16.173466] jffs2: error: (430) check_node_data: can not read 814 bytes from 0x0083da04, error code: -110. [ 16.183110] jffs2: error: (430) check_tn_node: check_node_data() returned error: -110. [ 16.191469] jffs2: notice: (430) check_node_data: wrong data CRC in data node at 0x0083d5e4: read 0xb9c58541, calculated 0xc4eede34. This of course causes filesystem stability problems. I spent a moment trying to debug this problem and it appears it didn't occur before the 345309fa7c0c ("spi: bcm-qspi: Fix bcm_qspi_bspi_read() performance"). Reverting all /recent/ bcm-qspi commits including above ones brings back the stability for me. Before above commit every time that bcm_qspi_bspi_lr_l2_isr() called bcm_qspi_bspi_lr_l2_isr() it was resulting in bspi_rf_msg_len becoming 0. With above commit it's not the case anymore so I suspect there may be some bug around that code.
An interesting, potentially useful info: it seems that adding any debugging to the bcm_qspi_bspi_lr_l2_isr() results in massive failures. I modified spi-bcm-qspi.c with attached patch and got: [ 0.762267] qspi->bspi_rf_msg_len:0x00e0 [ 0.866842] bcm_iproc 18029200.spi: timeout waiting for BSPI [ 0.872478] mtd_read error while parsing (offset: 0x0): -110 [ 0.878164] qspi->bspi_rf_msg_len:0x00e0 [ 0.986835] bcm_iproc 18029200.spi: timeout waiting for BSPI [ 0.992470] mtd_read error while parsing (offset: 0x10000): -110 [ 0.998471] bcm_iproc 18029200.spi: BSPI INT error [ 1.003263] qspi->bspi_rf_msg_len:0x00e0 [ 1.009891] qspi->bspi_rf_msg_len:0x00e0 [ 1.116836] bcm_iproc 18029200.spi: timeout waiting for BSPI [ 1.122466] mtd_read error while parsing (offset: 0x20000): -110 [ 1.128466] bcm_iproc 18029200.spi: BSPI INT error [ 1.133251] qspi->bspi_rf_msg_len:0x00e0 [ 1.139877] qspi->bspi_rf_msg_len:0x00e0 So my guess is this problem involves some code race. I don't see any other reason why adding a simple pr_info could trigger these problems. --- a/drivers/spi/spi-bcm-qspi.c +++ b/drivers/spi/spi-bcm-qspi.c @@ -1036,6 +1036,7 @@ static irqreturn_t bcm_qspi_bspi_lr_l2_isr(int irq, void *dev_id) if (qspi->bspi_enabled && qspi->bspi_rf_msg) { bcm_qspi_bspi_lr_data_read(qspi); + pr_info("qspi->bspi_rf_msg_len:0x%04x\n", qspi->bspi_rf_msg_len); if (qspi->bspi_rf_msg_len == 0) { qspi->bspi_rf_msg = NULL; if (qspi->soc_intc) { -- To unsubscribe from this list: send the line "unsubscribe linux-spi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html