Re: mmc: dw_mmc: warning with CONFIG_DMA_API_DEBUG

Jaehoon Chung <jh80.chung@xxxxxxxxxxx> · Thu, 23 Jun 2016 20:39:02 +0900

Hi Shawn,

On 06/21/2016 04:39 PM, Shawn Lin wrote:
> 在 2016/6/21 13:32, Jaehoon Chung 写道:
>> Hi guys,
>>
>> On 06/21/2016 11:31 AM, Shawn Lin wrote:
>>> On 2016/6/21 10:24, Seung-Woo Kim wrote:
>>>> Hello Shawn,
>>>>
>>>>> -----Original Message-----
>>>>> From: Shawn Lin [mailto:shawn.lin@xxxxxxxxxxxxxx]
>>>>> Sent: Tuesday, June 21, 2016 10:52 AM
>>>>> To: Seung-Woo Kim; jh80.chung@xxxxxxxxxxx; ulf.hansson@xxxxxxxxxx; linux-mmc@xxxxxxxxxxxxxxx; linux-
>>>>> kernel@xxxxxxxxxxxxxxx
>>>>> Cc: shawn.lin@xxxxxxxxxxxxxx
>>>>> Subject: Re: mmc: dw_mmc: warning with CONFIG_DMA_API_DEBUG
>>>>>
>>>>> On 2016/6/20 16:34, Seung-Woo Kim wrote:
>>>>>> Hi folks,
>>>>>>
>>>>>> During booting test on my Exynos5422 based Odroid-XU3, kernel compiled
>>>>>> with CONFIG_DMA_API_DEBUG reported following warning:
>>>>>>
>>>>>> ------------[ cut here ]------------
>>>>>> WARNING: CPU: 0 PID: 0 at lib/dma-debug.c:1096 check_unmap+0x7bc/0xb38
>>>>>> dwmmc_exynos 12200000.mmc: DMA-API: device driver tries to free DMA memory it has not allocated [device
>>>>> address=0x000000006d9d2200]
>>>>>
>>>>> Thanks for this report and fix.
>>>>>
>>>>> DTO(the same as IDMAC-RI/TI) interrupts may or may not come together
>>>>> with DATA_ERR. If DATA_ERR occur without geting DTO, we should issue
>>>>> CMD12 manually to generate DTO. It's a ugly deisgn for dwmmc but from
>>>>> the vendor's ask.
>>>>>
>>>>> So you should never think we complete the xfer without
>>>>> checking DATA_ERR. This way you got the warning.
>>
>> Well, EVENT_DATA_ERR is already checked in tasklet_func..and cleared that flags.
> 
> From my view, the reality is that when we got DATA_ERROR interrupts,
> we set EVENT_DATA_ERR to the pending_events and schedule the tasklet
> but we may still fallback to the IDMAC interrupt case as the tasklet
> may come up a little late, namely right after the IDMAC interrupt checking.
> 
> I'm trying to add some log there, and it well proves my guess.

You're right..This is appeared because of "Data Over".
If Data Over interrupt is occurred, SW needs to read the remaining Data in FIFO.
At that time, it was set to DATA_COMPLETE. Because SW might read the remaining data, but already set to ERROR_DATA.

In this case, Your suggestion may prevent to free twice.

There is other case..during tuning sequence.. :(
I found that it also appeared during the tuning sequence.
- Really..stupid design..

My suggestions are 
First, apply the below solution.
And then consider the HS200 tuning block with the below patch.

https://patchwork.kernel.org/patch/8935791/

How about?
Do you have any other opinion?

Best Regards,
Jaehoon Chung

> 
>>
>>>>>
>>>>> So could you try this one:
>>>>
>>>> With your patch, there is no more the DMA API waring in my environment.
>>>
>>> Nice to hear that.  Thanks for testing, Seung-Woo.
>>
>> Really? It's not solution..When send tuning command, it should be returned CRC error.
>> Then it called the dw_mci_stop_dma() and also dma_ops->complete().
> 
> Hrmm.. I can't see the reason it will also call dma_ops->complete.
> Could you explain a bit more here? :)
> 
> From V2.70a Table 3-2
> For MMC CMD19, there may be no CRC status returned by the
> card but EBE is generated. Hence, EBE is set for CMD19. The application
> should not treat this as an error.
> 
> 
>>
>> When i applied you suggestion, also produced.. :)
>>
>> [    2.469916] [<c010d6f0>] (unwind_backtrace) from [<c010a478>] (show_stack+0x10/0x14)
>> [    2.469934] [<c010a478>] (show_stack) from [<c031b324>] (dump_stack+0x74/0x94)
>> [    2.469949] [<c031b324>] (dump_stack) from [<c011b5ac>] (__warn+0xd4/0x100)
>> [    2.469961] [<c011b5ac>] (__warn) from [<c011b678>] (warn_slowpath_fmt+0x38/0x48)
>> [    2.469975] [<c011b678>] (warn_slowpath_fmt) from [<c033ee20>] (check_unmap+0x828/0x8a8)
>> [    2.469991] [<c033ee20>] (check_unmap) from [<c033f7c0>] (debug_dma_unmap_sg+0x5c/0x13c)
>> [    2.470012] [<c033f7c0>] (debug_dma_unmap_sg) from [<c051e6a4>] (dw_mci_dma_cleanup+0x68/0xa4)
>> [    2.470029] [<c051e6a4>] (dw_mci_dma_cleanup) from [<c0553bf4>] (dw_mci_stop_dma+0x30/0x40)
>> [    2.470045] [<c0553bf4>] (dw_mci_stop_dma) from [<c0521b30>] (dw_mci_tasklet_func+0x340/0x3b4)
>> [    2.470063] [<c0521b30>] (dw_mci_tasklet_func) from [<c011fcb0>] (tasklet_action+0x84/0x12c)
>> [    2.470076] [<c011fcb0>] (tasklet_action) from [<c011f35c>] (__do_softirq+0xec/0x244)
>> [    2.470089] [<c011f35c>] (__do_softirq) from [<c011f750>] (irq_exit+0xb4/0xf8)
>> [    2.470109] [<c011f750>] (irq_exit) from [<c015d004>] (__handle_domain_irq+0x70/0xe4)
>> [    2.470123] [<c015d004>] (__handle_domain_irq) from [<c0101470>] (gic_handle_irq+0x50/0x9c)
>> [    2.470135] [<c0101470>] (gic_handle_irq) from [<c010af14>] (__irq_svc+0x54/0x90)
>> [    2.470141] Exception stack(0xee3d3c58 to 0xee3d3ca0)
>> [    2.470148] 3c40:                                                       c0b1ef0c 0000000a
>> [    2.470159] 3c60: 00000001 0000005c 0000016d 00000000 c0b4f7a4 00000006 00000000 0000005c
>> [    2.470170] 3c80: 00000006 c0b671e8 60000013 ee3d3ca8 c0398e08 c015b830 60000013 ffffffff
>> [    2.470183] [<c010af14>] (__irq_svc) from [<c015b830>] (console_unlock+0x560/0x628)
>> [    2.470196] [<c015b830>] (console_unlock) from [<c015c1b0>] (vprintk_emit+0x1fc/0x508)
>> [    2.470211] [<c015c1b0>] (vprintk_emit) from [<c03ed21c>] (dev_vprintk_emit+0xf8/0x198)
>> [    2.470224] [<c03ed21c>] (dev_vprintk_emit) from [<c03ed2d8>] (dev_printk_emit+0x1c/0x2c)
>> [    2.470235] [<c03ed2d8>] (dev_printk_emit) from [<c03ed334>] (__dev_printk+0x4c/0x70)
>> [    2.470246] [<c03ed334>] (__dev_printk) from [<c03ed390>] (_dev_info+0x38/0x48)
>> [    2.470258] [<c03ed390>] (_dev_info) from [<c045a7a4>] (usb_new_device+0xe8/0x3d0)
>> [    2.470270] [<c045a7a4>] (usb_new_device) from [<c045bc20>] (hub_event+0x760/0xff4)
>> [    2.470289] [<c045bc20>] (hub_event) from [<c012ff68>] (process_one_work+0x120/0x328)
>> [    2.470304] [<c012ff68>] (process_one_work) from [<c01301d4>] (worker_thread+0x2c/0x4ac)
>> [    2.470321] [<c01301d4>] (worker_thread) from [<c01353a4>] (kthread+0xd8/0xf4)
>> [    2.470335] [<c01353a4>] (kthread) from [<c0107978
>>
>> I'm checking this..
>>
>> Best Regards,
>> Jaehoon Chung
>>
>>>
>>> Hi Jaehoon,
>>>
>>> How about this?
>>>
>>>>
>>>> Best Regards,
>>>> - Seung-Woo Kim
>>>>
>>>>>
>>>>> --- a/drivers/mmc/host/dw_mmc.c
>>>>> +++ b/drivers/mmc/host/dw_mmc.c
>>>>> @@ -2474,7 +2474,8 @@ static irqreturn_t dw_mci_interrupt(int irq, void
>>>>> *dev_id)
>>>>>                          mci_writel(host, IDSTS64, SDMMC_IDMAC_INT_TI |
>>>>>
>>>>> SDMMC_IDMAC_INT_RI);
>>>>>                          mci_writel(host, IDSTS64, SDMMC_IDMAC_INT_NI);
>>>>> -                       host->dma_ops->complete((void *)host);
>>>>> +                       if (!test_bit(EVENT_DATA_ERROR,
>>>>> &host->pending_events))
>>>>> +                               host->dma_ops->complete((void *)host);
>>>>>                  }
>>>>>          } else {
>>>>>                  pending = mci_readl(host, IDSTS);
>>>>> @@ -2482,7 +2483,8 @@ static irqreturn_t dw_mci_interrupt(int irq, void
>>>>> *dev_id)
>>>>>                          mci_writel(host, IDSTS, SDMMC_IDMAC_INT_TI |
>>>>>
>>>>> SDMMC_IDMAC_INT_RI);
>>>>>                          mci_writel(host, IDSTS, SDMMC_IDMAC_INT_NI);
>>>>> -                       host->dma_ops->complete((void *)host);
>>>>> +                       if (!test_bit(EVENT_DATA_ERROR,
>>>>> &host->pending_events))
>>>>> +                               host->dma_ops->complete((void *)host);
>>>>>                  }
>>>>>          }
>>>>>
>>>>>
>>>>>> [size=128 bytes]
>>>>>> Modules linked in:
>>>>>> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.7.0-rc4 #26
>>>>>> Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
>>>>>> [<c0112b4c>] (unwind_backtrace) from [<c010d888>] (show_stack+0x20/0x24)
>>>>>> [<c010d888>] (show_stack) from [<c03fab0c>] (dump_stack+0x80/0x94)
>>>>>> [<c03fab0c>] (dump_stack) from [<c0123548>] (__warn+0xf8/0x110)
>>>>>> [<c0123548>] (__warn) from [<c01235a8>] (warn_slowpath_fmt+0x48/0x50)
>>>>>> [<c01235a8>] (warn_slowpath_fmt) from [<c042ac90>] (check_unmap+0x7bc/0xb38)
>>>>>> [<c042ac90>] (check_unmap) from [<c042b25c>] (debug_dma_unmap_sg+0x118/0x148)
>>>>>> [<c042b25c>] (debug_dma_unmap_sg) from [<c077512c>] (dw_mci_dma_cleanup+0x7c/0xb8)
>>>>>> [<c077512c>] (dw_mci_dma_cleanup) from [<c0773f24>] (dw_mci_stop_dma+0x40/0x50)
>>>>>> [<c0773f24>] (dw_mci_stop_dma) from [<c0777d04>] (dw_mci_tasklet_func+0x130/0x3b4)
>>>>>> [<c0777d04>] (dw_mci_tasklet_func) from [<c0129760>] (tasklet_action+0xb4/0x150)
>>>>>> [<c0129760>] (tasklet_action) from [<c0101674>] (__do_softirq+0xe4/0x3cc)
>>>>>> [<c0101674>] (__do_softirq) from [<c0129030>] (irq_exit+0xd0/0x10c)
>>>>>> [<c0129030>] (irq_exit) from [<c01778a0>] (__handle_domain_irq+0x90/0xfc)
>>>>>> [<c01778a0>] (__handle_domain_irq) from [<c0101548>] (gic_handle_irq+0x64/0xa8)
>>>>>> [<c0101548>] (gic_handle_irq) from [<c010e3d4>] (__irq_svc+0x54/0x90)
>>>>>> Exception stack(0xc1101ef8 to 0xc1101f40)
>>>>>> 1ee0:                                                       00000001 00000000
>>>>>> 1f00: 00000000 c011b600 c1100000 c110753c 00000000 c11c3984 c11074d4 c1107548
>>>>>> 1f20: 00000000 c1101f54 c1101f58 c1101f48 c010a1fc c010a200 60000013 ffffffff
>>>>>> [<c010e3d4>] (__irq_svc) from [<c010a200>] (arch_cpu_idle+0x48/0x4c)
>>>>>> [<c010a200>] (arch_cpu_idle) from [<c01669d8>] (default_idle_call+0x30/0x3c)
>>>>>> [<c01669d8>] (default_idle_call) from [<c0166d3c>] (cpu_startup_entry+0x358/0x3b4)
>>>>>> [<c0166d3c>] (cpu_startup_entry) from [<c0aa6ab8>] (rest_init+0x94/0x98)
>>>>>> [<c0aa6ab8>] (rest_init) from [<c1000d58>] (start_kernel+0x3a4/0x3b0)
>>>>>> [<c1000d58>] (start_kernel) from [<4000807c>] (0x4000807c)
>>>>>> ---[ end trace 256f83eed365daf0 ]---
>>>>>>
>>>>>> The warning occurs because after complete callback function,
>>>>>> dw_mci_dmac_complete_dma() is called, then dw_mci_stop_dma() is called
>>>>>> again. So it causes dma_unmap_sg() is called twice for same sg. It
>>>>>> occurs during clock setting at booting time.
>>>>>>
>>>>>> Simply, clearing host->using_dma flag on dw_mci_dmac_complete_dma() and
>>>>>> dw_mci_stop_dma() like following fixes the issue, but I am not sure
>>>>>> this approach is proper.
>>>>>> ---
>>>>>> diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
>>>>>> index 2cc6123..a71c94b 100644
>>>>>> --- a/drivers/mmc/host/dw_mmc.c
>>>>>> +++ b/drivers/mmc/host/dw_mmc.c
>>>>>> @@ -388,6 +388,7 @@ static void dw_mci_stop_dma(struct dw_mci *host)
>>>>>>      if (host->using_dma) {
>>>>>>          host->dma_ops->stop(host);
>>>>>>          host->dma_ops->cleanup(host);
>>>>>> +        host->using_dma = 0;
>>>>>>      }
>>>>>>
>>>>>>      /* Data transfer was stopped by the interrupt handler */
>>>>>> @@ -455,6 +456,7 @@ static void dw_mci_dmac_complete_dma(void *arg)
>>>>>>                      DMA_FROM_DEVICE);
>>>>>>
>>>>>>      host->dma_ops->cleanup(host);
>>>>>> +    host->using_dma = 0;
>>>>>>
>>>>>>      /*
>>>>>>       * If the card was removed, data will be NULL. No point in trying to
>>>>>> @@ -943,8 +945,6 @@ static int dw_mci_submit_data_dma(struct dw_mci *host, struct mmc_data *data)
>>>>>>      int sg_len;
>>>>>>      u32 temp;
>>>>>>
>>>>>> -    host->using_dma = 0;
>>>>>> -
>>>>>>      /* If we don't have a channel, we can't do DMA */
>>>>>>      if (!host->use_dma)
>>>>>>          return -ENODEV;
>>>>>> ---
>>>>>>
>>>>>> Best Regards,
>>>>>> - Seung-Woo Kim
>>>>>>
>>>>>> -- 
>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>>>
>>>>>
>>>>> -- 
>>>>> Best Regards
>>>>> Shawn Lin
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>>
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html