On Thu, 27 Sep 2018 21:55:57 +0000
Chris Packham <Chris.Packham@xxxxxxxxxxxxxxxxxxx> wrote:

> Hi All,
>
> On 27/09/18 20:56, Boris Brezillon wrote:
> > On Thu, 27 Sep 2018 10:11:45 +0200
> > Miquel Raynal <miquel.raynal@xxxxxxxxxxx> wrote:
> >
> >> Hi Daniel,
> >>
> >> Daniel Mack <daniel@xxxxxxxxxx> wrote on Thu, 27 Sep 2018 09:17:51 +0200:
> >>
> >>> At least on PXA3xx platforms, enabling RDY interrupts in the NDCR register
> >>> will only cause the IRQ to latch when the RDY lanes are changing, and not
> >>> in case they are already asserted.
> >>>
> >>> This means that if the controller finished the command in flight before
> >>> marvell_nfc_wait_op() is called, that function will wait for a change in
> >>> the bit that can't ever happen as it is already set.
> >>>
> >>> To address this race, check for the RDY bits after the IRQ was enabled,
> >>> and complete the completion immediately if the condition is already met.
> >>>
> >>> This fixes a bug that was observed with a NAND chip that holds a UBIFS
> >>> partition on which file system stress tests were executed. When
> >>> marvell_nfc_wait_op() reports an error, UBI/UBIFS will eventually mount
> >>> the filesystem read-only, reporting lots of warnings along the way.
> >>>
> >>> Fixes: 02f26ecf8c77 mtd: nand: add reworked Marvell NAND controller driver
> >>> Cc: stable@xxxxxxxxxxxxxxx
> >>> Signed-off-by: Daniel Mack <daniel@xxxxxxxxxx>
> >>> ---
> >>
> >> Sorry I haven't had the time to check on my Armada, but you figured it
> >> out, and the fix looks good to me!
> >>
> >> Acked-by: Miquel Raynal <miquel.raynal@xxxxxxxxxxx>
> >>
> >> Boris, do you plan to send another fixes PR or can I take it into
> >> the nand/next branch?
> >
> > Queued to mtd/master.
>
> After fixing my R/B configuration I get a new error with this patch when
> running stress_1 from mtd-utils-2.0.0. I don't see this without the patch.
>
> My board is a custom design using an Armada-385 SoC with Macronix NAND.
>
> # stress_1
> marvell-nfc f10d0000.nand-controller: Timeout on RDDREQ/WRDREQ while draining raw data (NDSR: 0x00000000)
> ubi0 error: ubi_io_write: error -5 while writing 4096 bytes to PEB 1858:110592, written 0 bytes
> CPU: 1 PID: 1170 Comm: stress_1 Not tainted 4.19.0-rc5-at1+ #8
> Hardware name: Marvell Armada 380/385 (Device Tree)
> [<8011143c>] (unwind_backtrace) from [<8010c17c>] (show_stack+0x10/0x14)
> [<8010c17c>] (show_stack) from [<805ec28c>] (dump_stack+0x88/0x9c)
> [<805ec28c>] (dump_stack) from [<80418a28>] (ubi_io_write+0x55c/0x6c0)
> [<80418a28>] (ubi_io_write) from [<80415b4c>] (ubi_eba_write_leb+0x80/0x780)
> [<80415b4c>] (ubi_eba_write_leb) from [<80414580>] (ubi_leb_write+0xbc/0xe0)
> [<80414580>] (ubi_leb_write) from [<802d46b4>] (ubifs_leb_write+0xa0/0x118)
> [<802d46b4>] (ubifs_leb_write) from [<802d5620>] (ubifs_wbuf_write_nolock+0x184/0x6ac)
> [<802d5620>] (ubifs_wbuf_write_nolock) from [<802c8a18>] (ubifs_jnl_write_data+0x1c0/0x2bc)
> [<802c8a18>] (ubifs_jnl_write_data) from [<802caed8>] (do_writepage+0xa4/0x1b0)
> [<802caed8>] (do_writepage) from [<801aa160>] (__writepage+0x14/0x48)
> [<801aa160>] (__writepage) from [<801aa900>] (write_cache_pages+0x1d0/0x3e4)
> [<801aa900>] (write_cache_pages) from [<801aab68>] (generic_writepages+0x54/0x80)
> [<801aab68>] (generic_writepages) from [<801ac9a0>] (do_writepages+0x68/0x8c)
> [<801ac9a0>] (do_writepages) from [<801a0ac8>] (__filemap_fdatawrite_range+0x88/0xc0)
> [<801a0ac8>] (__filemap_fdatawrite_range) from [<801a0cc4>] (file_write_and_wait_range+0x3c/0x98)
> [<801a0cc4>] (file_write_and_wait_range) from [<802cb600>] (ubifs_fsync+0x3c/0xb0)
> [<802cb600>] (ubifs_fsync) from [<801a2828>] (generic_file_write_iter+0x198/0x24c)
> [<801a2828>] (generic_file_write_iter) from [<802ccb84>] (ubifs_write_iter+0xf0/0x158)
> [<802ccb84>] (ubifs_write_iter) from [<801ef854>] (__vfs_write+0xfc/0x160)
> [<801ef854>] (__vfs_write) from [<801efa60>] (vfs_write+0xa4/0x1ac)
> [<801efa60>] (vfs_write) from [<801efcac>] (ksys_write+0x54/0xb8)
> [<801efcac>] (ksys_write) from [<80101000>] (ret_fast_syscall+0x0/0x54)
> Exception stack(0xbd789fa8 to 0xbd789ff0)
> 9fa0: 0ca5d000 00000000 00000003 7e9f2900 00008000 ffffffff
> 9fc0: 0ca5d000 00000000 00008000 00000004 00000003 00000000 76f24fb4 00000000
> 9fe0: 00000000 7e9f27fc 00010fd8 76e775ec
> marvell-nfc f10d0000.nand-controller: Timeout on RDDREQ while draining FIFO (data) (NDSR: 0x00000810)
> ttyS ttyS1: tty_port_close_start: tty->count = 1 port count = 2
> marvell-nfc f10d0000.nand-controller: Timeout on RDDREQ while draining FIFO (data) (NDSR: 0x00000810)
> marvell-nfc f10d0000.nand-controller: Timeout on RDDREQ while draining FIFO (data) (NDSR: 0x00000810)
> marvell-nfc f10d0000.nand-controller: Timeout on RDDREQ while draining FIFO (data) (NDSR: 0x00000810)
> marvell-nfc f10d0000.nand-controller: Timeout on RDDREQ while draining FIFO (data) (NDSR: 0x00000810)
> marvell-nfc f10d0000.nand-controller: Timeout on RDDREQ while draining FIFO (data) (NDSR: 0x00000810)
> marvell-nfc f10d0000.nand-controller: Timeout on RDDREQ while draining FIFO (data) (NDSR: 0x00000810)
>
> ... (RDDREQ messages repeat).

Hm, that's weird. Unless RDDREQ is a 'clear-on-read' bit, that shouldn't happen.