Re: [PATCH v2 00/19] mtd: rawnand: cafe: Convert to exec_op() (and more)

Boris Brezillon <boris.brezillon@xxxxxxxxxxxxx> · Sat, 9 May 2020 22:01:02 +0200

On Sat, 9 May 2020 21:34:40 +0200
Lubomir Rintel <lkundrak@xxxxx> wrote:

> On Thu, May 07, 2020 at 10:12:57PM +0200, Boris Brezillon wrote:
> > On Thu, 7 May 2020 15:47:08 +0200
> > Lubomir Rintel <lkundrak@xxxxx> wrote:
> >   
> > > On Wed, May 06, 2020 at 11:35:52PM +0200, Boris Brezillon wrote:  
> > > > On Wed, 6 May 2020 22:36:35 +0200
> > > > Lubomir Rintel <lkundrak@xxxxx> wrote:
> > > >     
> > > > > > We really should mask IRQs (AKA disable IRQs in my naming convention
> > > > > > :-)) here, unless we want to switch to interrupt-based waits (which
> > > > > > would be a good thing when we have DMA or WAIT_RDY involved). Having an
> > > > > > interrupt handler in the current implementation doesn't make any sense
> > > > > > (that's assuming the IRQ_STATUS bits are updated even if the interrupts
> > > > > > are disabled, which I'm not sure is a valid assumption in this case).
> > > > > 
> > > > > I have no idea why the interrupt handler is there. Perhaps some
> > > > > interrupts can't be masked and need an ack or something.    
> > > > 
> > > > Can you try to set NAND_IRQ_MASK to 0x0 and see if that still works.
> > > > Can you also check the number of NAND interrupts when set to 0x0? It's
> > > > hard to tell exactly what caused the interrupt handler to be called
> > > > since this is a shared interrupt.    
> > > 
> > > When it's set to 0, I get an interrupt with CAFE_NAND_IRQ=0x40000000
> > > (CAFE_NAND_IRQ_FLASH_RDY) right off the bat. That doesn't happen with
> > > a mask of 0xffffffff.
> > > 
> > > When changing the handler to always ack CAFE_NAND_IRQ_FLASH_RDY I've
> > > also seen CAFE_NAND_IRQ=0x80000000 (CAFE_NAND_IRQ_CMD_DONE) suggesting
> > > that other interrupts aren't masked either.
> > > 
> > > It seems that ones indeed mask interrupts, but some just can't be
> > > masked (CAFE_NAND_IRQ_CMD_DONE or CAFE_NAND_IRQ_DMA_DONE), perhaps
> > > due to hardware bugs.
> > >   
> > 
> > I pushed a new version with some interrupt-related changes [1].
> > 
> > [1]https://github.com/bbrezillon/linux/commits/nand/cafe-nand-exec-op-debug  
> 
> Works with one fix:
> 
> diff --git a/drivers/mtd/nand/raw/cafe_nand.c b/drivers/mtd/nand/raw/cafe_nand.c
> index 591d79730961..e37737b7b089 100644
> --- a/drivers/mtd/nand/raw/cafe_nand.c
> +++ b/drivers/mtd/nand/raw/cafe_nand.c
> @@ -801,6 +801,7 @@ static int cafe_nand_probe(struct pci_dev *pdev,
>         if (!cafe)
>                 return  -ENOMEM;
>  
> +       init_completion(&cafe->complete);

Oops, indeed.

>         mtd = nand_to_mtd(&cafe->nand);
>         mtd->dev.parent = &pdev->dev;
>         nand_set_controller_data(&cafe->nand, cafe);
> 
> However, the JFFS2 mount takes about twice as long as it did with
> the polling version:

Yes, that's not surprising. At the same time, using atomic-polling for
something that's expected to take hundreds of microseconds is not that
great. That means your CPU is not doing anything useful while you wait
for the read/write/erase operation to finish.

> 
>   bash-5.0# time mount -t jffs2 mtd0 /mnt
>   jffs2: jffs2_scan_dirent_node(): Name CRC failed on node at 0x30212fc8: Read 0x583ccb57, calculated 0x06d03796
>   jffs2: notice: (96) jffs2_build_xattr_subsystem: complete building xattr subsystem, 0 of xdatum (0 unchecked, 0 orphan) and 0 of xref (0 dead, 0 orphan) f.
>   
>   real    0m15.374s

Given the time it takes to mount the FS, I'd recommend considering
switching to UBI/UBIFS, but maybe that's not an option here.

>   user    0m0.000s
>   sys     0m9.727s
> 
> Lubo

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/