Re: [PATCH v2 00/19] mtd: rawnand: cafe: Convert to exec_op() (and more)

Boris Brezillon <boris.brezillon@xxxxxxxxxxxxx> · Sun, 10 May 2020 08:45:41 +0200

On Sun, 10 May 2020 08:31:05 +0200
Boris Brezillon <boris.brezillon@xxxxxxxxxxxxx> wrote:

> On Sat, 9 May 2020 22:28:55 +0200
> Lubomir Rintel <lkundrak@xxxxx> wrote:
> 
> > On Sat, May 09, 2020 at 10:01:02PM +0200, Boris Brezillon wrote:  
> > > On Sat, 9 May 2020 21:34:40 +0200
> > > Lubomir Rintel <lkundrak@xxxxx> wrote:
> > >     
> > > > On Thu, May 07, 2020 at 10:12:57PM +0200, Boris Brezillon wrote:    
> > > > > On Thu, 7 May 2020 15:47:08 +0200
> > > > > Lubomir Rintel <lkundrak@xxxxx> wrote:
> > > > >       
> > > > > > On Wed, May 06, 2020 at 11:35:52PM +0200, Boris Brezillon wrote:      
> > > > > > > On Wed, 6 May 2020 22:36:35 +0200
> > > > > > > Lubomir Rintel <lkundrak@xxxxx> wrote:
> > > > > > >         
> > > > > > > > > We really should mask IRQs (AKA disable IRQs in my naming convention
> > > > > > > > > :-)) here, unless we want to switch to interrupt-based waits (which
> > > > > > > > > would be a good thing when we have DMA or WAIT_RDY involved). Having an
> > > > > > > > > interrupt handler in the current implementation doesn't make any sense
> > > > > > > > > (that's assuming the IRQ_STATUS bits are updated even if the interrupts
> > > > > > > > > > are disabled, which I'm not sure is a valid assumption in this case).
> > > > > > > > 
> > > > > > > > I have no idea why the interrupt handler is there. Perhaps some
> > > > > > > > interrupts can't be masked and need an ack or something.        
> > > > > > > 
> > > > > > > Can you try to set NAND_IRQ_MASK to 0x0 and see if that still works.
> > > > > > > Can you also check the number of NAND interrupts when set to 0x0? It's
> > > > > > > hard to tell exactly what caused the interrupt handler to be called
> > > > > > > since this is a shared interrupt.        
> > > > > > 
> > > > > > When it's set to 0, I get an interrupt with CAFE_NAND_IRQ=0x40000000
> > > > > > (CAFE_NAND_IRQ_FLASH_RDY) right off the bat. That doesn't happen with
> > > > > > a mask of 0xffffffff.
> > > > > > 
> > > > > > When changing the handler to always ack CAFE_NAND_IRQ_FLASH_RDY I've
> > > > > > also seen CAFE_NAND_IRQ=0x80000000 (CAFE_NAND_IRQ_CMD_DONE) suggesting
> > > > > > that other interrupts aren't masked either.
> > > > > > 
> > > > > > > It seems to me that ones do indeed mask interrupts, but some just
> > > > > > > can't be masked (CAFE_NAND_IRQ_CMD_DONE or CAFE_NAND_IRQ_DMA_DONE),
> > > > > > > perhaps due to hardware bugs.
> > > > > >       
> > > > > 
> > > > > I pushed a new version with some interrupt-related changes [1].
> > > > > 
> > > > > [1]https://github.com/bbrezillon/linux/commits/nand/cafe-nand-exec-op-debug      
> > > > 
> > > > Works with one fix:
> > > > 
> > > > diff --git a/drivers/mtd/nand/raw/cafe_nand.c b/drivers/mtd/nand/raw/cafe_nand.c
> > > > index 591d79730961..e37737b7b089 100644
> > > > --- a/drivers/mtd/nand/raw/cafe_nand.c
> > > > +++ b/drivers/mtd/nand/raw/cafe_nand.c
> > > > @@ -801,6 +801,7 @@ static int cafe_nand_probe(struct pci_dev *pdev,
> > > >         if (!cafe)
> > > >                 return  -ENOMEM;
> > > >  
> > > > +       init_completion(&cafe->complete);    
> > > 
> > > Oops, indeed.
> > >     
> > > >         mtd = nand_to_mtd(&cafe->nand);
> > > >         mtd->dev.parent = &pdev->dev;
> > > >         nand_set_controller_data(&cafe->nand, cafe);
> > > > 
> > > > However, the JFFS2 mount takes about twice as long as it did with
> > > > the polling version:    
> > > 
> > > Yes, that's not surprising. At the same time, using atomic-polling for
> > > something that's expected to take hundreds of microseconds is not that
> > > great. That means your CPU is not doing anything useful while you wait
> > > for the read/write/erase operation to finish.    
> > 
> > Yes. But this really is too much of a slowdown:
> > 
> >   bash-5.0# time dd count=65536 bs=2k if=/dev/mtd0 of=/dev/null
> >   65536+0 records in
> >   65536+0 records out
> >   
> >   real    0m20.191s
> >   user    0m0.346s
> >   sys     0m10.366s
> > 
> > vs (previously):
> >   
> >   bash-5.0# time dd count=65536 bs=2k if=/dev/mtd0 of=/dev/null
> >   65536+0 records in
> >   65536+0 records out
> >   
> >   real    0m7.629s
> >   user    0m0.010s
> >   sys     0m7.500s
> >   bash-5.0#  
> 
> Almost a factor of 3. I was definitely not expecting interrupt-based waits
> to have such a huge impact on the perfs.
> 
> > 
> > Note that your CPU can't be doing anything useful before the program and
> > its data is loaded from the storage :)  
> 
> Well, that's only true at mount time (and if you delay the mount after
> the boot, your CPU might already have other things to do), but any
> erase/write operations are likely to monopolize your CPU for no good
> reason.
> 
> > 
> > I suppose that if someone really prefers to avoid hogging the CPU at
> > this cost, then it makes sense to add a knob (a module parameter or
> > something) that would enable the interrupt-driven operation, but
> > keep polling as a default.  
> 
> Let's not add more module params than we already have, it just
> confuses users and deciding how to wait on HW events doesn't sound
> like something they should be able to choose anyway (just like passing
> the timing params, this should be calculated by the driver). Oh well,
> I'll drop the patch adding interrupt-based waits. Having the driver
> converted to exec_op() is more than enough :-).

Just pushed a new version. If it works for you I'll send a v3.

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/