Re: [BUG] mmc: dw_mmc*: mmc2: cache flush error -110 hang

Ulf Hansson <ulf.hansson@xxxxxxxxxx> · Fri, 26 Oct 2018 10:25:36 +0200

On 25 October 2018 at 23:20, Hal Emmerich <hal@xxxxxxxxxxxxxxx> wrote:
> Hello mmc people,
>
> When booting the veyron speedy (which uses the dw_mmc driver) on kernel 4.19, it hangs for ~10 minutes in about 1 in 10 boots.
> This also occurs on kernel version 4.17.2.

Do you know whether this has always been a problem, or is it a
regression? That would be very useful to know.

> Tracing the hang:
> if the mmc block system fails to read a sector, mmc_blk_rq_error is called, which calls hw_reset,
> which calls _mmc_hw_reset in /drivers/mmc/core/mmc.c,
> which finally calls
> mmc_flush_cache(host->card) which hangs for ~10 minutes, before failing and resetting the emmc.
>
> If the call to mmc_flush_cache(host->card) is commented out, the hang no longer happens.

Well, honestly, the call to mmc_flush_cache() is debatable. I wonder
whether it has ever worked without errors. The reason I think so is
simply that the card is in an unknown state at that point, and is
likely unable to accept a flush request anyway.

On the other hand, hanging for ~10 minutes sounds like a
controller/driver problem. That should not happen, no matter what.

>
>
> The errors printed after it finally recovers are:
> [  602.188052] mmc2: cache flush error -110
> [  602.690672] dwmmc_rockchip ff0f0000.dwmmc: Busy; trying anyway
> [  603.193323] mmc_host mmc2: Timeout sending command (cmd 0x202000 arg 0x0 status 0x80202000)
>
> The first is printed by mmc_flush_cache, and the other two are from the second half of _mmc_hw_reset,
> when it re-initializes the emmc.
>
> Could this be due to incorrect clocks?

Perhaps. Or the driver/controller is left in some error state after
the failed I/O request, which means that it fails to serve any
subsequent request properly.
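
For reference, -110 in the "cache flush error" message is -ETIMEDOUT
on Linux, so the flush timed out rather than being rejected by the
card. A minimal userspace sketch for decoding such return values is
below. Note that mmc_err_name() is a hypothetical helper made up for
illustration, not a kernel function, and the set of errnos it covers
is only an assumption about the values most likely to show up here:

```c
#include <errno.h>

/*
 * Hypothetical helper (not part of the kernel): translate a
 * kernel-style negative return value into its errno name.
 * Only a few errnos are covered; this is an illustrative
 * selection, not a complete list of what the mmc core returns.
 */
static const char *mmc_err_name(int err)
{
	switch (-err) {
	case ETIMEDOUT:
		return "ETIMEDOUT";	/* 110 on Linux: command timed out */
	case EILSEQ:
		return "EILSEQ";	/* typically a CRC error */
	case EIO:
		return "EIO";		/* generic I/O error */
	default:
		return "unknown";
	}
}
```

With the Linux errno values, mmc_err_name(-110) gives "ETIMEDOUT".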

There are a couple of things I would try.

1. Try using the mmc_test driver and verify that the hw_reset test
works. This means you will be running the test while the controller
and card are in a good state.

2. If 1) works, repeat the failure sequence you described above
(without mmc_test this time), but replace mmc_flush_cache() in
_mmc_hw_reset() with some other commands (try both R1 and R1B
responses) and see what happens. None of the commands should hang.

This should tell us more.

Kind regards
Uffe