Re: [PATCH v2 00/19] mtd: rawnand: cafe: Convert to exec_op() (and more)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 6 May 2020 00:01:52 +0200
Lubomir Rintel <lkundrak@xxxxx> wrote:

> On Tue, May 05, 2020 at 04:46:41PM +0200, Lubomir Rintel wrote:
> > On Tue, May 05, 2020 at 12:13:34PM +0200, Boris Brezillon wrote:  
> > > Hello,
> > > 
> > > A bit of context to explain the motivation behind those conversions
> > > I've been sending for the last couple of weeks. The raw NAND subsystem
> > > carries a lot of history which makes any rework not only painful, but
> > > also subject to regressions which we only detect when someone dares to
> > > update its kernel on one of those ancient HW. While carrying drivers
> > > for old HW is not a problem per se, carrying ancient and unmaintained
> > > drivers that are not converted to new APIs is a maintenance burden,
> > > hence this massive conversion attempt I'm conducting here.
> > > 
> > > So here is a series converting the CAFE NAND controller driver to
> > > exec_op(), plus a bunch of minor improvements done along the way.
> > > I hope I'll find someone to test those changes, but if there's no one
> > > still owning OLPC HW or no interest in keeping it supported in recent
> > > kernel versions, we should definitely consider removing the driver
> > > instead.
> > > 
> > > No major changes in this v2, apart from fixes for things reported by
> > > Lubomir and Miquel. See the changelog on each patch for more details.
> > > 
> > > Regards,
> > > 
> > > Boris
> > > 
> > > Boris Brezillon (19):
> > >   mtd: rawnand: Propage CS selection to sub operations
> > >   mtd: rawnand: cafe: Get rid of an inaccurate kernel doc header
> > >   mtd: rawnand: cafe: Rename cafe_nand_write_page_lowlevel()
> > >   mtd: rawnand: cafe: Use a correct ECC mode and pass the ECC alg
> > >   mtd: rawnand: cafe: Include linux/io.h instead of asm/io.h
> > >   mtd: rawnand: cafe: Demistify register fields
> > >   mtd: rawnand: cafe: Factor out the controller initialization logic
> > >   mtd: rawnand: cafe: Get rid of the debug module param
> > >   mtd: rawnand: cafe: Use devm_kzalloc and devm_request_irq()
> > >   mtd: rawnand: cafe: Get rid of a useless label
> > >   mtd: rawnand: cafe: Explicitly inherit from nand_controller
> > >   mtd: rawnand: cafe: Don't leave ECC enabled in the write path
> > >   mtd: rawnand: cafe: Don't split things when reading/writing a page
> > >   mtd: rawnand: cafe: Add exec_op() support
> > >   mtd: rawnand: cafe: Get rid of the legacy interface implementation
> > >   mtd: rawnand: cafe: Adjust the cafe_{read,write}_buf() prototypes
> > >   mtd: rawnand: cafe: s/uint{8,16,32}_t/u{8,16,32}/
> > >   mtd: rawnand: cafe: Drop the cafe_{readl,writel}() wrappers
> > >   mtd: rawnand: cafe: Get rid of the last printk()
> > > 
> > >  drivers/mtd/nand/raw/cafe_nand.c | 798 ++++++++++++++++---------------
> > >  drivers/mtd/nand/raw/nand_base.c |   3 +-
> > >  include/linux/mtd/rawnand.h      |   2 +
> > >  3 files changed, 410 insertions(+), 393 deletions(-)  
> > 
> > Just confirming that this set works a treat on an OLPC XO-1 laptop,
> > applied on top of v5.7-rc4.  
> 
> Perhaps I spoke too soon. Before I've tested by dumping a couple of pages
> and checksumming them and it indeed worked fine every time.
> 
> Now I've actually tried to mount a JFFS2 filesystem. On v5.7-rc4 it worked:
> 
>   bash-5.0# mount -t jffs2 mtd0 /mnt
>   jffs2: jffs2_scan_dirent_node(): Name CRC failed on node at 0x30212fc8: Read 0x583ccb57, calculated 0x06d03796
>   jffs2: notice: (96) jffs2_build_xattr_subsystem: complete building xattr subsystem, 0 of xdatum (0 unchecked, 0 orphan) and 0 of xref (0 dead, 0 orphan) f.
>   bash-5.0# ls /mnt
>   boot        dev         lost+found  security    versions
>   boot-alt    home        proc        sys
>   bash-5.0#
> 
> Whereas with the patches applied it does not:
> 
>   bash-5.0# mount -t jffs2 mtd0 /mnt
>   jffs2: cannot read OOB for EB at 00260000, requested 8 bytes, read 0 bytes, error -110
>   mount: /mnt: can't read superblock on mtd0.
>   bash-5.0# mount -t jffs2 mtd0 /mnt
>   jffs2: cannot read OOB for EB at 001c0000, requested 8 bytes, read 0 bytes, error -110
>   mount: /mnt: can't read superblock on mtd0.
>   bash-5.0# mount -t jffs2 mtd0 /mnt
>   jffs2: warning: (102) jffs2_sum_scan_sumnode: Summary node crc error, skipping summary information.
>   jffs2: jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00122a28: 0x15ee instead
>   jffs2: jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00122a2c: 0x4176 instead
>   jffs2: jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00122a30: 0x105d instead
>   jffs2: jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00122a34: 0x13f1 instead
>   jffs2: jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00122a38: 0x7df1 instead
>   jffs2: jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00122a3c: 0xca39 instead
>   jffs2: jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00122a40: 0xe70e instead
>   jffs2: jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00122a44: 0x1004 instead
>   jffs2: jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00122a48: 0x2902 instead
>   jffs2: jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00122a4c: 0x0030 instead
>   jffs2: Further such events for this erase block will not be printed
>   jffs2: jffs2_scan_dirent_node(): Name CRC failed on node at 0x001287cc: Read 0x7fea120a, calculated 0x9ceec452
>   jffs2: jffs2_scan_dirent_node(): Name CRC failed on node at 0x00128fcc: Read 0x7fea120a, calculated 0xb77b2a34
>   jffs2: warning: (102) jffs2_sum_scan_sumnode: Summary node crc error, skipping summary information.
>   jffs2: jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x007232b0: 0xdd29 instead
>   jffs2: jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x007232b4: 0xf37e instead
>   jffs2: jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x007232b8: 0x251b instead
>   jffs2: jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x007232bc: 0xe805 instead
>   jffs2: jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x007232c0: 0xd3bf instead
>   jffs2: jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x007232c4: 0x2927 instead
>   jffs2: jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x007232c8: 0xd79d instead
>   jffs2: jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x007232cc: 0xd548 instead
>   jffs2: jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x007232d0: 0x46e3 instead
>   jffs2: jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x007232d4: 0x7716 instead
>   jffs2: Further such events for this erase block will not be printed
>   jffs2: cannot read OOB for EB at 009e0000, requested 8 bytes, read 0 bytes, error -110
>   mount: /mnt: can't read superblock on mtd0.
>   bash-5.0# 
> 
> Note that the failed reads are for different addresses every time. The
> CRC errors are there presumably because we don't handle
> nand_read_page_op() errors in cafe_nand_read_page().
> 
> I have no idea what's going on. With the debug trace on it looks like
> CAFE_NAND_IRQ_FLASH_RDY is sometimes not flipped on for no apparent
> reason:
> 
>   ...
>   instr[0] = CMD(00)
>   instr[1] = ADDR(00 00 bd 09 00)
>   instr[2] = CMD(30)
>   instr[3] = WAITRDY
>   instr[4] = DATA_IN(len=2112)
>   5: ret=0 status=d0000000 wait=50000000
>   instr[0] = CMD(00)
>   instr[1] = ADDR(00 00 be 09 00)
>   instr[2] = CMD(30)
>   instr[3] = WAITRDY
>   instr[4] = DATA_IN(len=2112)
>   5: ret=0 status=d0000000 wait=50000000
>   instr[0] = CMD(00)
>   instr[1] = ADDR(00 08 c0 09 00)
>   instr[2] = CMD(30)
>   instr[3] = WAITRDY
>   instr[4] = DATA_IN(len=64)
>   5: ret=0 status=d0000000 wait=50000000
>   instr[0] = CMD(00)
>   instr[1] = ADDR(00 00 ff 09 00)
>   instr[2] = CMD(30)
>   instr[3] = WAITRDY
>   instr[4] = DATA_IN(len=2112)
>   5: ret=0 status=d0000000 wait=50000000
>   instr[0] = CMD(00)
>   instr[1] = ADDR(00 00 fd 09 00)
>   instr[2] = CMD(30)
>   instr[3] = WAITRDY
>   instr[4] = DATA_IN(len=2112)
>   5: ret=0 status=d0000000 wait=50000000
>   instr[0] = CMD(00)
>   instr[1] = ADDR(00 00 fe 09 00)
>   instr[2] = CMD(30)
>   instr[3] = WAITRDY
>   instr[4] = DATA_IN(len=2112)
>   5: ret=0 status=d0000000 wait=50000000
>   instr[0] = CMD(00)
>   instr[1] = ADDR(00 08 00 0a 00)
>   instr[2] = CMD(30)
>   instr[3] = WAITRDY
>   instr[4] = DATA_IN(len=64)
>   5: ret=-110 status=90000000 wait=50000000
>   jffs2: cannot read OOB for EB at 00500000, requested 8 bytes, read 0 bytes, error -110
> 
> Indeed, commenting out the CAFE_NAND_IRQ_FLASH_RDY makes things work

Looks like that happens when you read less than 2k. Can you print the
CAFE_NAND_STATUS register in your trace? I suspect
CAFE_NAND_STATUS_FLASH_BUSY will be cleared, so maybe it's the way to
go if we want to make sure the check Ready/Busy pin status.

> (on a somewhat unrelated note, the waitrdy doesn't seem to be used at
> all):

Yep, I initially thought I would need to poll the NAND_STATUS register
and added a boolean to keep track that before realizing there was a bit
in the IRQ_STATUS reg.

> 
>        case NAND_OP_WAITRDY_INSTR:
> 	       // wait |= CAFE_NAND_IRQ_FLASH_RDY;
> 	       waitrdy = true;
> 	       break;
>        }
> 
> 
>   bash-5.0# time mount -t jffs2 mtd0 /mnt
>   jffs2: jffs2_scan_dirent_node(): Name CRC failed on node at 0x30212fc8: Read 0x583ccb57, calculated 0x06d03796
>   jffs2: notice: (96) jffs2_build_xattr_subsystem: complete building xattr subsystem, 0 of xdatum (0 unchecked, 0 orphan) and 0 of xref (0 dead, 0 orphan) f.
>   
>   real    5m5.512s
>   user    0m0.000s
>   sys     0m17.523s
>   bash-5.0#
> 
> The mount time seems to be significantly higher than with mainline,
> where it looks more like this:

Can you add a trace in the IRQ handler (before the "if (!irqs)" check).
I interpreted the 'masked' state as 'irq disabled', but maybe that's
the opposite, and we're being flood by uncleared 'FLASH_RDY'
interrupts. In any case, given we're polling the IRQ_STATUS
register, I don't think we need to enable interrupts (unless the
status is only updated when the interrupt is enabled).

Using readl_poll_timeout_atomic() instead of readl_poll_timeout() might
also help.

> 
>   bash-5.0# time mount -t jffs2 mtd0 /mnt
>   jffs2: jffs2_scan_dirent_node(): Name CRC failed on node at 0x30212fc8: Read 0x583ccb57, calculated 0x06d03796
>   jffs2: notice: (96) jffs2_build_xattr_subsystem: complete building xattr subsystem, 0 of xdatum (0 unchecked, 0 orphan) and 0 of xref (0 dead, 0 orphan) f.
>   
>   real    0m7.282s
>   user    0m0.000s
>   sys     0m7.118s
>   bash-5.0# 
> 
> I have not looked into why that might be; perhaps you have an idea?
> 
> Lubo


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/



[Index of Archives]     [LARTC]     [Bugtraq]     [Yosemite Forum]     [Photo]

  Powered by Linux