Re: [PATCH v2 00/19] mtd: rawnand: cafe: Convert to exec_op() (and more)

On Sun, 10 May 2020 09:21:08 +0200
Lubomir Rintel <lkundrak@xxxxx> wrote:

> On Sun, May 10, 2020 at 08:45:41AM +0200, Boris Brezillon wrote:
> > On Sun, 10 May 2020 08:31:05 +0200
> > Boris Brezillon <boris.brezillon@xxxxxxxxxxxxx> wrote:
> >   
> > > On Sat, 9 May 2020 22:28:55 +0200
> > > Lubomir Rintel <lkundrak@xxxxx> wrote:
> > >   
> > > > On Sat, May 09, 2020 at 10:01:02PM +0200, Boris Brezillon wrote:    
> > > > > On Sat, 9 May 2020 21:34:40 +0200
> > > > > Lubomir Rintel <lkundrak@xxxxx> wrote:
> > > > >       
> > > > > > On Thu, May 07, 2020 at 10:12:57PM +0200, Boris Brezillon wrote:      
> > > > > > > On Thu, 7 May 2020 15:47:08 +0200
> > > > > > > Lubomir Rintel <lkundrak@xxxxx> wrote:
> > > > > > >         
> > > > > > > > On Wed, May 06, 2020 at 11:35:52PM +0200, Boris Brezillon wrote:        
> > > > > > > > > On Wed, 6 May 2020 22:36:35 +0200
> > > > > > > > > Lubomir Rintel <lkundrak@xxxxx> wrote:
> > > > > > > > >           
> > > > > > > > > > > We really should mask IRQs (AKA disable IRQs in my naming convention
> > > > > > > > > > > :-)) here, unless we want to switch to interrupt-based waits (which
> > > > > > > > > > > would be a good thing when we have DMA or WAIT_RDY involved). Having an
> > > > > > > > > > > interrupt handler in the current implementation doesn't make any sense
> > > > > > > > > > > (that's assuming the IRQ_STATUS bits are updated even if the interrupts
> > > > > > > > > > > > are disabled, which I am not sure is a valid assumption in this case).
> > > > > > > > > > 
> > > > > > > > > > I have no idea why the interrupt handler is there. Perhaps some
> > > > > > > > > > interrupts can't be masked and need an ack or something.          
> > > > > > > > > 
> > > > > > > > > Can you try to set NAND_IRQ_MASK to 0x0 and see if that still works.
> > > > > > > > > Can you also check the number of NAND interrupts when set to 0x0? It's
> > > > > > > > > hard to tell exactly what caused the interrupt handler to be called
> > > > > > > > > since this is a shared interrupt.          
> > > > > > > > 
> > > > > > > > When it's set to 0, I get an interrupt with CAFE_NAND_IRQ=0x40000000
> > > > > > > > (CAFE_NAND_IRQ_FLASH_RDY) right off the bat. That doesn't happen with
> > > > > > > > a mask of 0xffffffff.
> > > > > > > > 
> > > > > > > > When changing the handler to always ack CAFE_NAND_IRQ_FLASH_RDY I've
> > > > > > > > also seen CAFE_NAND_IRQ=0x80000000 (CAFE_NAND_IRQ_CMD_DONE) suggesting
> > > > > > > > that other interrupts aren't masked either.
> > > > > > > > 
> > > > > > > > > It seems that ones do indeed mask interrupts, but some just can't be
> > > > > > > > > masked (CAFE_NAND_IRQ_CMD_DONE or CAFE_NAND_IRQ_DMA_DONE), perhaps
> > > > > > > > > due to hardware bugs.
> > > > > > > >         
> > > > > > > 
> > > > > > > I pushed a new version with some interrupt-related changes [1].
> > > > > > > 
> > > > > > > [1]https://github.com/bbrezillon/linux/commits/nand/cafe-nand-exec-op-debug        
> > > > > > 
> > > > > > Works with one fix:
> > > > > > 
> > > > > > diff --git a/drivers/mtd/nand/raw/cafe_nand.c b/drivers/mtd/nand/raw/cafe_nand.c
> > > > > > index 591d79730961..e37737b7b089 100644
> > > > > > --- a/drivers/mtd/nand/raw/cafe_nand.c
> > > > > > +++ b/drivers/mtd/nand/raw/cafe_nand.c
> > > > > > @@ -801,6 +801,7 @@ static int cafe_nand_probe(struct pci_dev *pdev,
> > > > > >         if (!cafe)
> > > > > >                 return  -ENOMEM;
> > > > > >  
> > > > > > +       init_completion(&cafe->complete);      
> > > > > 
> > > > > Oops, indeed.
> > > > >       
> > > > > >         mtd = nand_to_mtd(&cafe->nand);
> > > > > >         mtd->dev.parent = &pdev->dev;
> > > > > >         nand_set_controller_data(&cafe->nand, cafe);
> > > > > > 
> > > > > > > However, the JFFS2 mount takes about twice as long as it did with
> > > > > > the polling version:      
> > > > > 
> > > > > Yes, that's not surprising. At the same time, using atomic-polling for
> > > > > something that's expected to take hundreds of microseconds is not that
> > > > > great. That means your CPU is not doing anything useful while you wait
> > > > > for the read/write/erase operation to finish.      
> > > > 
> > > > Yes. But this really is too much of a slowdown:
> > > > 
> > > >   bash-5.0# time dd count=65536 bs=2k if=/dev/mtd0 of=/dev/null
> > > >   65536+0 records in
> > > >   65536+0 records out
> > > >   
> > > >   real    0m20.191s
> > > >   user    0m0.346s
> > > >   sys     0m10.366s
> > > > 
> > > > vs (previously):
> > > >   
> > > >   bash-5.0# time dd count=65536 bs=2k if=/dev/mtd0 of=/dev/null
> > > >   65536+0 records in
> > > >   65536+0 records out
> > > >   
> > > >   real    0m7.629s
> > > >   user    0m0.010s
> > > >   sys     0m7.500s
> > > >   bash-5.0#    
> > > 
> > > > Almost a factor of 3. I was definitely not expecting interrupt-based
> > > > waits to have such a huge impact on performance.
> > >   
> > > > 
> > > > > Note that your CPU can't be doing anything useful before the program and
> > > > > its data are loaded from the storage :)
> > > 
> > > Well, that's only true at mount time (and if you delay the mount after
> > > the boot, your CPU might already have other things to do), but any
> > > erase/write operations are likely to monopolize your CPU for no good
> > > reason.
> > >   
> > > > 
> > > > I suppose that if someone really prefers to avoid hogging the CPU at
> > > > this cost, then it makes sense to add a knob (a module parameter or
> > > > something) that would enable the interrupt-driven operation, but
> > > > keep polling as a default.    
> > > 
> > > Let's not add more module params than we already have, it just
> > > > confuses users and deciding how to wait on HW events doesn't sound
> > > like something they should be able to choose anyway (just like passing
> > > the timing params, this should be calculated by the driver). Oh well,
> > > I'll drop the patch adding interrupt-based waits. Having the driver
> > > converted to exec_op() is more than enough :-).  
> > 
> > Just pushed a new version. If it works for you I'll send a v3.  
> 
> Thank you. That's b6b10b45dd9 in nand/cafe-nand-exec-op-debug of
> https://github.com/bbrezillon/linux/ I suppose?
> 
> Without the readl_poll_timeout() -> readl_poll_timeout_atomic() change
> it's still very slow.

Should be fixed now.
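
For anyone reading along: readl_poll_timeout() may sleep between register
reads, while readl_poll_timeout_atomic() busy-waits with udelay(). The thread
does not pin down the exact cause of the slowdown, so the following is only a
rough userspace model of why a coarse gap between reads can hurt a short wait.
The function name and the numbers are made up for illustration and are not
taken from the driver.

```c
#include <stdint.h>

/*
 * Userspace model, not driver code (hypothetical name).
 *
 * Returns the time, in microseconds, at which a poll loop first observes
 * an event that becomes ready at ready_us, when the register is sampled
 * at t = 0, step_us, 2 * step_us, ...
 *
 * Assumption of this model: step_us is the effective gap between two
 * reads. For a sleeping wait that gap includes scheduler wakeup latency;
 * for a busy-wait it is just the delay passed to the loop.
 */
static uint64_t model_detect_time_us(uint64_t ready_us, uint64_t step_us)
{
	uint64_t polls;

	if (step_us == 0)
		return ready_us; /* idealized continuous polling */

	polls = (ready_us + step_us - 1) / step_us; /* ceil(ready / step) */
	return polls * step_us;
}
```

With an event that is ready after 25us, a 1us step notices it at 25us, while a
100us effective step only notices it at 100us. Repeated over every page read,
that kind of overshoot would add up quickly.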

> 
> Also, commit f89355b6b6 ("mtd: rawnand: cafe: Return IRQ_HANDLED when
> appropriate") looks somewhat suspicious to me. Previously it wrote the
> pending interrupt bits back into CAFE_NAND_IRQ, now you're masking them
> out in CAFE_NAND_IRQ_MASK (which already should be 0xffffffff) at this
> point. Why?

If interrupts are masked we don't need to clear them. We only clear
them before executing an operation to start from a fresh state.

> I thought the write back to CAFE_NAND_IRQ serves to ack the
> interrupts that came up but we don't handle elsewhere because we weren't
> expecting them.

If we reach the handler and all our irqs are masked, that means the irq
was not for us, which is possible since the irq line is shared. We
really should return IRQ_NONE in that case, and clearing pending
interrupts is useless, since they are masked anyway. Since we read
the interrupt status from exec_op(), I thought it'd be better to never
clear any interrupt bits in the handler instead of clearing all bits but
CMD_DONE, DMA_DONE and FLASH_RDY.
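
The decision described above can be sketched as a small userspace model. This
is not the driver code: the MODEL_* names and both functions are hypothetical,
the two bit values are the ones reported earlier in this thread, and the
"1 means masked" polarity is an assumption based on the observation that a
mask of 0xffffffff silences the interrupts.

```c
#include <stdint.h>

/* Stand-ins for the kernel's IRQ_NONE / IRQ_HANDLED (hypothetical names). */
enum model_irqreturn { MODEL_IRQ_NONE = 0, MODEL_IRQ_HANDLED = 1 };

/* Bit values as reported in this thread for CAFE_NAND_IRQ. */
#define MODEL_NAND_IRQ_FLASH_RDY 0x40000000u
#define MODEL_NAND_IRQ_CMD_DONE  0x80000000u

/*
 * Assumption: a bit set in the mask register disables that source, so the
 * sources that may legitimately raise the line are status & ~mask.
 */
static uint32_t model_unmasked_pending(uint32_t status, uint32_t mask)
{
	return status & ~mask;
}

/*
 * On a shared line, claim the interrupt only if one of our unmasked
 * sources is pending. Nothing is cleared here: in this model the status
 * is cleared before an operation starts, not in the handler.
 */
static enum model_irqreturn model_irq_handler(uint32_t status, uint32_t mask)
{
	if (!model_unmasked_pending(status, mask))
		return MODEL_IRQ_NONE;

	return MODEL_IRQ_HANDLED;
}
```

With everything masked (0xffffffff) the model returns MODEL_IRQ_NONE whatever
the status says, which lets the core pass the interrupt on to the other
handlers sharing the line.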

> 
> As you correctly pointed out, the source of the interrupts I'm seeing
> could be something other than the CAFE chip -- the camera or the MMC
> card. I'm not sure though; the camera is certainly off and there shouldn't
> be much going on with the MMC card. I'm currently testing with an
> init=/bin/bash installation off an SD card. I guess I can try switching
> to the USB flash stick and disabling the camera and MMC altogether.

Okay, if that happens it would be a HW bug (or an interrupt coming
from somewhere else, maybe PCI errors?). Can you print the values of
CAFE_GLOBAL_IRQ and CAFE_GLOBAL_IRQ_MASK in your irq handler?

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/
