Re: [PATCH] mmc: sh-mmcif: avoid Oops on spurious interrupts

Tetsuyuki Kobayashi <koba@xxxxxxxxxxx> · Tue, 18 Sep 2012 17:02:18 +0900

Hello Guennadi

(09/18/2012 03:42 PM), Guennadi Liakhovetski wrote:
> Hello Kobayashi-san
> 
> On Tue, 18 Sep 2012, Tetsuyuki Kobayashi wrote:
> 
>> Hello Guennadi
>>
>> (2012/08/31 12:05), Tetsuyuki Kobayashi wrote:
>>
>>> (2012/08/22 15:49), Guennadi Liakhovetski wrote:
>>>> On some systems, e.g., kzm9g, MMCIF interfaces can produce spurious
>>>> interrupts without any active request. To prevent the Oops, that results
>>>> in such cases, don't dereference the mmc request pointer until we make
>>>> sure, that we are indeed processing such a request.
>>>>
>>>> Reported-by: Tetsuyuki Kobayashi <koba@xxxxxxxxxxx>
>>>> Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@xxxxxx>
>>>> ---
>>>>
>>>> Hello Kobayashi-san
>>>>
>>>> On Mon, 20 Aug 2012, Tetsuyuki Kobayashi wrote:
>>>>
>>>> ...
>>>>
>>>>> After applying this patch on kzm9g board, I got this error regarding
>>>>> eMMC.
>>>>> I think this is another problem.
>>>>>
>>>>>
>>>>> Unable to handle kernel NULL pointer dereference at virtual address
>>>>> 00000008
>>>>> pgd = c0004000
>>>>> [00000008] *pgd=00000000
>>>>> Internal error: Oops: 17 [#1] PREEMPT SMP ARM
>>>>> Modules linked in:
>>>>> CPU: 1    Not tainted  (3.6.0-rc2+ #103)
>>>>> PC is at sh_mmcif_irqt+0x20/0xb30
>>>>> LR is at irq_thread+0x94/0x16c
>>>>
>>>> [snip]
>>>>
>>>>> My quick fix is below.
>>>>>
>>>>> diff --git a/drivers/mmc/host/sh_mmcif.c b/drivers/mmc/host/sh_mmcif.c
>>>>> index 5d81427..e587fbc 100644
>>>>> --- a/drivers/mmc/host/sh_mmcif.c
>>>>> +++ b/drivers/mmc/host/sh_mmcif.c
>>>>> @@ -1104,7 +1104,15 @@ static irqreturn_t sh_mmcif_irqt(int irq, void
>>>>> *dev_id)
>>>>>   {
>>>>>          struct sh_mmcif_host *host = dev_id;
>>>>>          struct mmc_request *mrq = host->mrq;
>>>>> -       struct mmc_data *data = mrq->data;
>>>>> +       /*struct mmc_data *data = mrq->data; -- this cause null
>>>>> pointer access*/
>>>>> +       struct mmc_data *data;
>>>>> +
>>>>> +       /* quick fix by koba */
>>>>> +       if (mrq == NULL) {
>>>>> +               printk("sh_mmcif_irqt: mrq == NULL:
>>>>> host->wait_for=%d\n", host->wait_for);
>>>>> +       } else {
>>>>> +               data = mrq->data;
>>>>> +       }
>>>>>
>>>>>          cancel_delayed_work_sync(&host->timeout_work);
>>>>>
>>>>>
>>>>> With this patch, there is no null pointer accesses and got this log.
>>>>>
>>>>> sh_mmcif_irqt: mrq == NULL: host->wait_for=0
>>>>> sh_mmcif_irqt: mrq == NULL: host->wait_for=0
>>>>>    ...
>>>>>
>>>>> host->wait_for is 0. it is MMCIF_WAIT_FOR_REQUEST.
>>>>> There is code such like:
>>>>>
>>>>>         host->wait_for = MMCIF_WAIT_FOR_REQUEST;
>>>>>         host->mrq = NULL;
>>>>>
>>>>> So, at the top of sh_mmcif_irqt, if host->wait_for ==
>>>>> MMCIF_WAIT_FOR_REQUEST,
>>>>> host->mrq = NULL.
>>>>> It is too earlier to access mrq->data before checking host->mrq. it may
>>>>> cause null pointer access.
>>>>>
>>>>> Goda-san, could you check this and refine the code of sh_mmcif_irqt?
>>>>
>>>> Thanks for your report and a fix. Could you please double-check, whether
>>>> the below patch also fixes your problem? Since such spurious interrupts
>>>> are possible I would commit a check like this one, but in the longer run
>>>> we want to identify and eliminate them, if possible. But since so far
>>>> these interrupts only happen on 1 board model and also not on all units
>>>> and not upon each boot, this could be a bit tricky.
>>>>
>>>> One more question - is this only needed for 3.7 or also for 3.6 / stable?
>>>>
>>>> Thanks
>>>> Guennadi
>>>>
>>>>   drivers/mmc/host/sh_mmcif.c |    4 ++--
>>>>   1 files changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/mmc/host/sh_mmcif.c b/drivers/mmc/host/sh_mmcif.c
>>>> index 5d81427..82bf921 100644
>>>> --- a/drivers/mmc/host/sh_mmcif.c
>>>> +++ b/drivers/mmc/host/sh_mmcif.c
>>>> @@ -1104,7 +1104,6 @@ static irqreturn_t sh_mmcif_irqt(int irq, void
>>>> *dev_id)
>>>>   {
>>>>       struct sh_mmcif_host *host = dev_id;
>>>>       struct mmc_request *mrq = host->mrq;
>>>> -    struct mmc_data *data = mrq->data;
>>>>
>>>>       cancel_delayed_work_sync(&host->timeout_work);
>>>>
>>>> @@ -1152,13 +1151,14 @@ static irqreturn_t sh_mmcif_irqt(int irq, void
>>>> *dev_id)
>>>>       case MMCIF_WAIT_FOR_READ_END:
>>>>       case MMCIF_WAIT_FOR_WRITE_END:
>>>>           if (host->sd_error)
>>>> -            data->error = sh_mmcif_error_manage(host);
>>>> +            mrq->data->error = sh_mmcif_error_manage(host);
>>>>           break;
>>>>       default:
>>>>           BUG();
>>>>       }
>>>>
>>>>       if (host->wait_for != MMCIF_WAIT_FOR_STOP) {
>>>> +        struct mmc_data *data = mrq->data;
>>>>           if (!mrq->cmd->error && data && !data->error)
>>>>               data->bytes_xfered =
>>>>                   data->blocks * data->blksz;
>>>>
>>>
>>> I tried this patch. It seems better.
>>> But I think this still have potential race condition.
>>> I am afraid that one cpu enter sh_mmcif_irqt and other cpu write to
>>> host->wait_for for new request at the same time.
>>> How about add this code at the top of sh_mmcif_irqt or before returning
>>> IRQ_WAKE_THREAD in sh_mmcif_intr ?
>>>
>>>      if (host->state == STATE_IDLE)
>>>          return IRQ_HANDLED;
>>>
>>> I will rebase my test environment to v3.6-rc3 or later. Then I will
>>> send you my .config.
>>>
>> How is this?
>> I hope this fixed in v3.6.
> 
> Sorry, I still haven't come round to looking at this. I think, one thing 
> could halp, if you could try to find out what exactly those spurious 
> interrupts look like, i.e., what's their interrupt status? You could try 
> to print like
> 
> diff -u a/drivers/mmc/host/sh_mmcif.c b/drivers/mmc/host/sh_mmcif.c
> --- a/drivers/mmc/host/sh_mmcif.c
> +++ b/drivers/mmc/host/sh_mmcif.c
> @@ -1229,6 +1229,10 @@
>  		host->sd_error = true;
>  		dev_dbg(&host->pd->dev, "int err state = %08x\n", state);
>  	}
> +	if (host->state == STATE_IDLE) {
> +		dev_info(&host->pd->dev, "Spurious IRQ status 0x%x", state);
> +		return IRQ_HANDLED;
> +	}
>  	if (state & ~(INT_CMD12RBE | INT_CMD12CRE)) {
>  		if (!host->dma_active)
>  			return IRQ_WAKE_THREAD;
> 
> Please, let me know, if it's not very easy for you ATM to perform the 
> test, I'll try to do it myself then.

OK. It is easy for me.
I got this log after mounting /dev/mmcblk2p1 (on eMMC) and executing
tar command to make file accesses.
This is based on v3.6-rc6.

[  149.968750] EXT4-fs (mmcblk2p1): warning: maximal mount count reached, running e2fsck is recommended
[  150.296875] EXT4-fs (mmcblk2p1): mounted filesystem with ordered data mode. Opts: (null)
[  221.539062] sh_mmcif sh_mmcif.0: Spurious IRQ status 0x7400000
[  221.585937] sh_mmcif sh_mmcif.0: Spurious IRQ status 0x7400000
[  222.039062] sh_mmcif sh_mmcif.0: Spurious IRQ status 0x7400000
[  222.226562] sh_mmcif sh_mmcif.0: Spurious IRQ status 0x7400000
[  222.382812] sh_mmcif sh_mmcif.0: Spurious IRQ status 0x7400000
[  223.109375] sh_mmcif sh_mmcif.0: Spurious IRQ status 0x7400000
[  223.406250] sh_mmcif sh_mmcif.0: Spurious IRQ status 0x7400000
[  223.734375] sh_mmcif sh_mmcif.0: Spurious IRQ status 0x7400000
[  223.750000] sh_mmcif sh_mmcif.0: Spurious IRQ status 0x7400000
[  224.398437] sh_mmcif sh_mmcif.0: Spurious IRQ status 0x7400000
[  230.289062] sh_mmcif sh_mmcif.0: Spurious IRQ status 0x7400000

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html