On 25 October 2018 at 23:20, Hal Emmerich <hal@xxxxxxxxxxxxxxx> wrote:
> Hello mmc people,
>
> When booting the veyron speedy, which uses the dw_mmc driver on kernel 4.19 it hangs for ~10 minutes about 1 in 10 boots.
> This also occurs on kernel version 4.17.2.

Do you know whether this has always been a problem, or is it a regression? That would be very useful to know.

> Tracing the hang:
> if the mmc block system fails to read a sector, mmc_blk_rq_error is called, which calls hw_reset,
> which calls _mmc_hw_reset in /drivers/mmc/core/mmc.c,
> which finally calls
> mmc_flush_cache(host->card) which hangs for ~10 minutes, before failing and resetting the emmc.
>
> If the call to mmc_flush_cache(host->card) is commented out, the hang no longer happens.

Well, honestly, the call to mmc_flush_cache() is debatable. I wonder whether it has ever worked without errors. The reason I think so is simply that the card is in an unknown state at that point, and is likely not able to accept a flush request anyway.

On the other hand, hanging for ~10 minutes sounds like a controller/driver problem. That should not happen, no matter what.

>
>
> The errors printed after it finally recovers are:
> [ 602.188052] mmc2: cache flush error -110
> [ 602.690672] dwmmc_rockchip ff0f0000.dwmmc: Busy; trying anyway
> [ 603.193323] mmc_host mmc2: Timeout sending command (cmd 0x202000 arg 0x0 status 0x80202000)
>
> The first is printed by mmc_flush_cache, and the second two are from the second half of __mmc_hw_reset,
> when it re inits the emmc.
>
> Could this be due to incorrect clocks?

Perhaps. Or the driver/controller is in some error state after the failed I/O request, which means that it fails to serve any request properly.

There are a couple of things I would try.

1. Use the mmc_test driver and verify that the hw_reset test works. This means you will be running the test while the controller and card are in good condition.

2. If 1) works, repeat the failure sequence you described above (don't use mmc_test anymore), but replace mmc_flush_cache() in _mmc_hw_reset() with some other commands (try both R1 and R1B responses) and see what happens. None of the commands should hang.

This should tell us more.

Kind regards
Uffe
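
As a side note, the numbers in the quoted log can be decoded by hand. The sketch below is only an illustration, not driver code. It assumes the dw_mmc CMD register bit positions as recalled from drivers/mmc/host/dw_mmc.h (START = bit 31, UPD_CLK = bit 21, PRV_DAT_WAIT = bit 13) and the asm-generic errno value ETIMEDOUT = 110; please verify both against your own tree. The names decode_cmd() and is_timeout() are hypothetical helpers made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* ASSUMPTION: bit positions recalled from drivers/mmc/host/dw_mmc.h. */
#define CMD_START        (UINT32_C(1) << 31)
#define CMD_UPD_CLK      (UINT32_C(1) << 21)
#define CMD_PRV_DAT_WAIT (UINT32_C(1) << 13)

/* ASSUMPTION: ETIMEDOUT value from include/uapi/asm-generic/errno.h. */
#define KERNEL_ETIMEDOUT 110

static const struct {
	uint32_t bit;
	const char *name;
} cmd_flags[] = {
	{ CMD_START,        "START" },
	{ CMD_UPD_CLK,      "UPD_CLK" },
	{ CMD_PRV_DAT_WAIT, "PRV_DAT_WAIT" },
};

/*
 * Hypothetical helper: write the names of the known flags set in 'reg'
 * into 'buf' (space separated) and return whatever bits were not
 * recognised, e.g. the command index in the low bits.
 */
static uint32_t decode_cmd(uint32_t reg, char *buf, size_t len)
{
	size_t i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	for (i = 0; i < sizeof(cmd_flags) / sizeof(cmd_flags[0]); i++) {
		if (!(reg & cmd_flags[i].bit))
			continue;
		if (buf[0] != '\0')
			strncat(buf, " ", len - strlen(buf) - 1);
		strncat(buf, cmd_flags[i].name, len - strlen(buf) - 1);
		reg &= ~cmd_flags[i].bit;	/* clear the bit we named */
	}
	return reg;
}

/* Hypothetical helper: true if a kernel error code is -ETIMEDOUT. */
static int is_timeout(int err)
{
	return err == -KERNEL_ETIMEDOUT;
}
```

Under those assumptions, decode_cmd(0x202000, ...) names UPD_CLK and PRV_DAT_WAIT, and decode_cmd(0x80202000, ...) additionally names START, while is_timeout(-110) is true. If the assumed bit layout is right, the command that timed out in the third log line would be a clock update, with START still set in the value read back, and the "cache flush error -110" would be a plain timeout.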