Hello Kobayashi-san On Tue, 18 Sep 2012, Tetsuyuki Kobayashi wrote: > Hello Guennadi > > (2012/08/31 12:05), Tetsuyuki Kobayashi wrote: > > > (2012/08/22 15:49), Guennadi Liakhovetski wrote: > > > On some systems, e.g., kzm9g, MMCIF interfaces can produce spurious > > > interrupts without any active request. To prevent the Oops, that results > > > in such cases, don't dereference the mmc request pointer until we make > > > sure, that we are indeed processing such a request. > > > > > > Reported-by: Tetsuyuki Kobayashi <koba@xxxxxxxxxxx> > > > Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@xxxxxx> > > > --- > > > > > > Hello Kobayashi-san > > > > > > On Mon, 20 Aug 2012, Tetsuyuki Kobayashi wrote: > > > > > > ... > > > > > > > After applying this patch on kzm9g board, I got this error regarding > > > > eMMC. > > > > I think this is another problem. > > > > > > > > > > > > Unable to handle kernel NULL pointer dereference at virtual address > > > > 00000008 > > > > pgd = c0004000 > > > > [00000008] *pgd=00000000 > > > > Internal error: Oops: 17 [#1] PREEMPT SMP ARM > > > > Modules linked in: > > > > CPU: 1 Not tainted (3.6.0-rc2+ #103) > > > > PC is at sh_mmcif_irqt+0x20/0xb30 > > > > LR is at irq_thread+0x94/0x16c > > > > > > [snip] > > > > > > > My quick fix is below. > > > > > > > > diff --git a/drivers/mmc/host/sh_mmcif.c b/drivers/mmc/host/sh_mmcif.c > > > > index 5d81427..e587fbc 100644 > > > > --- a/drivers/mmc/host/sh_mmcif.c > > > > +++ b/drivers/mmc/host/sh_mmcif.c > > > > @@ -1104,7 +1104,15 @@ static irqreturn_t sh_mmcif_irqt(int irq, void > > > > *dev_id) > > > > { > > > > struct sh_mmcif_host *host = dev_id; > > > > struct mmc_request *mrq = host->mrq; > > > > - struct mmc_data *data = mrq->data; > > > > + /*struct mmc_data *data = mrq->data; -- this cause null > > > > pointer access*/ > > > > + struct mmc_data *data; > > > > + > > > > + /* quick fix by koba */ > > > > + if (mrq == NULL) { > > > > + printk("sh_mmcif_irqt: mrq == NULL: > > > > host->wait_for=%d\n", host->wait_for); > > > > + } else { > > > > + data = mrq->data; > > > > + } > > > > > > > > cancel_delayed_work_sync(&host->timeout_work); > > > > > > > > > > > > With this patch, there is no null pointer accesses and got this log. > > > > > > > > sh_mmcif_irqt: mrq == NULL: host->wait_for=0 > > > > sh_mmcif_irqt: mrq == NULL: host->wait_for=0 > > > > ... > > > > > > > > host->wait_for is 0. it is MMCIF_WAIT_FOR_REQUEST. > > > > There is code such like: > > > > > > > > host->wait_for = MMCIF_WAIT_FOR_REQUEST; > > > > host->mrq = NULL; > > > > > > > > So, at the top of sh_mmcif_irqt, if host->wait_for == > > > > MMCIF_WAIT_FOR_REQUEST, > > > > host->mrq = NULL. > > > > It is too earlier to access mrq->data before checking host->mrq. it may > > > > cause null pointer access. > > > > > > > > Goda-san, could you check this and refine the code of sh_mmcif_irqt? > > > > > > Thanks for your report and a fix. Could you please double-check, whether > > > the below patch also fixes your problem? Since such spurious interrupts > > > are possible I would commit a check like this one, but in the longer run > > > we want to identify and eliminate them, if possible. But since so far > > > these interrupts only happen on 1 board model and also not on all units > > > and not upon each boot, this could be a bit tricky. > > > > > > One more question - is this only needed for 3.7 or also for 3.6 / stable? > > > > > > Thanks > > > Guennadi > > > > > > drivers/mmc/host/sh_mmcif.c | 4 ++-- > > > 1 files changed, 2 insertions(+), 2 deletions(-) > > > > > > diff --git a/drivers/mmc/host/sh_mmcif.c b/drivers/mmc/host/sh_mmcif.c > > > index 5d81427..82bf921 100644 > > > --- a/drivers/mmc/host/sh_mmcif.c > > > +++ b/drivers/mmc/host/sh_mmcif.c > > > @@ -1104,7 +1104,6 @@ static irqreturn_t sh_mmcif_irqt(int irq, void > > > *dev_id) > > > { > > > struct sh_mmcif_host *host = dev_id; > > > struct mmc_request *mrq = host->mrq; > > > - struct mmc_data *data = mrq->data; > > > > > > cancel_delayed_work_sync(&host->timeout_work); > > > > > > @@ -1152,13 +1151,14 @@ static irqreturn_t sh_mmcif_irqt(int irq, void > > > *dev_id) > > > case MMCIF_WAIT_FOR_READ_END: > > > case MMCIF_WAIT_FOR_WRITE_END: > > > if (host->sd_error) > > > - data->error = sh_mmcif_error_manage(host); > > > + mrq->data->error = sh_mmcif_error_manage(host); > > > break; > > > default: > > > BUG(); > > > } > > > > > > if (host->wait_for != MMCIF_WAIT_FOR_STOP) { > > > + struct mmc_data *data = mrq->data; > > > if (!mrq->cmd->error && data && !data->error) > > > data->bytes_xfered = > > > data->blocks * data->blksz; > > > > > > > I tried this patch. It seems better. > > But I think this still have potential race condition. > > I am afraid that one cpu enter sh_mmcif_irqt and other cpu write to > > host->wait_for for new request at the same time. > > How about add this code at the top of sh_mmcif_irqt or before returning > > IRQ_WAKE_THREAD in sh_mmcif_intr ? > > > > if (host->state == STATE_IDLE) > > return IRQ_HANDLED; > > > > I will rebase my test environment to v3.6-rc3 or later. Then I will > > send you my .config. > > > How is this? > I hope this fixed in v3.6. Sorry, I still haven't come round to looking at this. I think, one thing could halp, if you could try to find out what exactly those spurious interrupts look like, i.e., what's their interrupt status? You could try to print like diff -u a/drivers/mmc/host/sh_mmcif.c b/drivers/mmc/host/sh_mmcif.c --- a/drivers/mmc/host/sh_mmcif.c +++ b/drivers/mmc/host/sh_mmcif.c @@ -1229,6 +1229,10 @@ host->sd_error = true; dev_dbg(&host->pd->dev, "int err state = %08x\n", state); } + if (host->state == STATE_IDLE) { + dev_info(&host->pd->dev, "Spurious IRQ status 0x%x", state); + return IRQ_HANDLED; + } if (state & ~(INT_CMD12RBE | INT_CMD12CRE)) { if (!host->dma_active) return IRQ_WAKE_THREAD; Please, let me know, if it's not very easy for you ATM to perform the test, I'll try to do it myself then. Thanks Guennadi --- Guennadi Liakhovetski, Ph.D. Freelance Open-Source Software Developer http://www.open-technology.de/ -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html