Re: [PATCH v2] mtd: rawnand: marvell: check for RDY bits after enabling the IRQ

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 28/09/18 20:29, Daniel Mack wrote:
> On 28/9/2018 10:24 AM, Miquel Raynal wrote:
>> Hi Daniel,
>>
>> Daniel Mack <daniel@xxxxxxxxxx> wrote on Fri, 28 Sep 2018 09:43:18
>> +0200:
>>
>>> Hi Chris,
>>>
>>> On 27/9/2018 11:55 PM, Chris Packham wrote:
>>>> On 27/09/18 20:56, Boris Brezillon wrote:
>>>>> On Thu, 27 Sep 2018 10:11:45 +0200
>>>>> Miquel Raynal <miquel.raynal@xxxxxxxxxxx> wrote:
>>>>>    
>>>>>> Hi Daniel,
>>>>>>
>>>>>> Daniel Mack <daniel@xxxxxxxxxx> wrote on Thu, 27 Sep 2018 09:17:51
>>>>>> +0200:
>>>>>>    
>>>>>>> At least on PXA3xx platforms, enabling RDY interrupts in the NDCR register
>>>>>>> will only cause the IRQ to latch when the RDY lanes are changing, and not
>>>>>>> in case they are already asserted.
>>>>>>>
>>>>>>> This means that if the controller finished the command in flight before
>>>>>>> marvell_nfc_wait_op() is called, that function will wait for a change in
>>>>>>> the bit that can't ever happen as it is already set.
>>>>>>>
>>>>>>> To address this race, check for the RDY bits after the IRQ was enabled,
>>>>>>> and complete the completion immediately if the condition is already met.
>>>>>>>
>>>>>>> This fixes a bug that was observed with a NAND chip that holds a UBIFS
>>>>>>> parition on which file system stress tests were executed. When
>>>>>>> marvell_nfc_wait_op() reports an error, UBI/UBIFS will eventually mount
>>>>>>> the filesystem read-only, reporting lots of warnings along the way.
>>>>>>>
>>>>>>> Fixes: 02f26ecf8c77 mtd: nand: add reworked Marvell NAND controller driver
>>>>>>> Cc: stable@xxxxxxxxxxxxxxx
>>>>>>> Signed-off-by: Daniel Mack <daniel@xxxxxxxxxx>
>>>>>>> ---
>>>>>>
>>>>>> Sorry I haven't had the time to check on my Armada, but you figured it
>>>>>> out, and the fix looks good to me!
>>>>>>
>>>>>> Acked-by: Miquel Raynal <miquel.raynal@xxxxxxxxxxx>
>>>>>>
>>>>>> Boris, do you plan to send another fixes PR of can I take it into
>>>>>> the nand/next branch?
>>>>>
>>>>> Queued to mtd/master.
>>>>> After fixing my R/B configuration I get a new error with this patch when
>>>> running stress_1 from mtd-utils-2.0.0. I don't see this without the patch.
>>>
>>> That's strange. So your controller sets the RDY bits before it is ready? Could
>>> you check whether only checking for NDSR_RDY(0) changes anything? Not sure
>>> about the handling of NDSR_RDY(1) in this driver anyway ...
>>>
>>
>> I suppose you mean this portion of code is not clear enough?
>>
>>
>>           u32 st = readl_relaxed(nfc->regs + NDSR);
>>           u32 ien = (~readl_relaxed(nfc->regs + NDCR)) & NDCR_ALL_INT;
>>
>>           /*
>>            * RDY interrupt mask is one bit in NDCR while there are two status
>>            * bit in NDSR (RDY[cs0/cs2] and RDY[cs1/cs3]).
>>            */
>>           if (st & NDSR_RDY(1))
>>                   st |= NDSR_RDY(0);
>>
>>           if (!(st & ien))
>>                   return IRQ_NONE;
>>
>>
>> -> st is the status in the NDSR register which has two RDY bits, one
>>      for each RDY line.
>> -> ien is a view of the NDCR register which commands the interrupts
>>      and has one bit to enable both interrupt lines (let's call it
>>      NDCR_RDYM).
>>
>> The trick is that NDSR_RDY(0) is the same bit as NDCR_RDYM.
>> So whenever NDSR_RDY(1) is set, we fake NDSR_RDY(0) to be set (by
>> setting manually the bit in 'st') so that the (st & ien) comparison
>> can be true if NDSR_RDY(1) is valid and RDY interrupts are enabled.
>>
>> With this in mind, I don't see why this
>>
>> +	st = readl_relaxed(nfc->regs + NDSR);
>> +	if (st & (NDSR_RDY(0) | NDSR_RDY(1)))
>> +		complete(&nfc->complete);
> 
> Yeah, me neither. Chris, are you absolutely sure this is the reason? I'm
> asking because it took me several tries sometimes to trigger the bug, so
> is there a chance that you see an error at all times, regardless of
> whether my patch is applied?

It seems pretty consistent. Without this patch there seems to be no 
problem. With this patch it triggers pretty much straight away. I can't 
discount that there might be something wrong with my dts (the R/B 
configuration was missing initially).

I've also been able to run this on the DB-88F6820-AMC board with the 
same result (the dts for this is in the for-next branch of 
git://git.infradead.org/linux-mvebu.git).

The really odd thing is the following seems to avoid the problem

+        st = readl_relaxed(nfc->regs + NDSR);
+        udelay(1000);
+        if (st & (NDSR_RDY(0) | NDSR_RDY(1)))
+                complete(&nfc->complete);

Which is weird because the st value has already been read so the udelay 
should have no effect.

On 28/09/18 19:43, Daniel Mack wrote:
 >
 > Also, does my .EALREADY approach (v1) make any difference?
 >

The v1 of this patch doesn't show the problem.





[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux