Hi Doug, ? 2017/4/11 7:35, Doug Anderson ??: > Hi, > > On Wed, Mar 29, 2017 at 6:17 PM, Brian Norris <briannorris at chromium.org> wrote: >> Hi all, >> >> I haven't managed to get as far as a bugfix for this, but I've bisected >> some issues seen on v4.10+ with a Chromebook of the Veyron family (Jaq, >> in particular). v4.9 works fine. > > OK, I finally got everything up and running to test this too... > > >> Issue #1 - eMMC complains periodically: >> >> [ 4.358135] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) >> [ 4.461466] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) >> [ 5.291450] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) >> [ 5.381471] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) >> [ 11.243337] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) >> [ 17.371628] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) > > I don't believe that this is an error, actually. Really we just need > to quiet this message or (since I've always found it useful) move it > to a different place. I believe that with runtime PM we're > effectively turning the clock off whenever the MMC device isn't in > use. On dw_mmc we have a "helpful" printout every time the clock is > changed, and that's what you're seeing here. > > You can see with: > > while true; do > dd if=/dev/mmcblk2 of=/dev/null bs=512 count=1 iflag=direct; > sleep .1; > done > > ...that you'll get a printout every 100ms. > > > Ah, looks like this is in: > > ce69e2fea093 mmc: dw_mmc: silent verbose log when calling from PM context > > ...as pointed out by Shawn Lin. So I think in 4.10 we can just ignore > those messages and they're good on 4.11. > > >> and if I stress it out at all (e.g., dd if=/dev/mmcblk2 bs=1M > >> /dev/null), it will eventually croak: >> >> [ 359.916315] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) >> [ 360.071378] dwmmc_rockchip ff0f0000.dwmmc: Successfully tuned phase to 153 >> [ 360.211351] mmcblk2: error -110 transferring data, sector 8644608, nr 2048, cmd response 0x900, card status 0x0 >> [ 360.221936] mmcblk2: retrying using single block read >> [ 363.491362] mmcblk2: error -110 transferring data, sector 8646656, nr 2048, cmd response 0x900, card status 0x0 >> [ 363.531569] mmc_host mmc2: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0) >> [ 363.596326] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) >> [ 363.612712] dwmmc_rockchip ff0f0000.dwmmc: Successfully tuned phase to 152 >> [ 363.751351] mmcblk2: error -110 transferring data, sector 8646656, nr 2048, cmd response 0x900, card status 0x0 >> [ 363.761938] mmcblk2: retrying using single block read >> [ 366.611356] INFO: task mmcqd/2boot1:92 blocked for more than 120 seconds. >> [ 366.618134] Not tainted 4.10.0 #284 >> [ 366.622146] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. >> [ 366.629960] mmcqd/2boot1 D 0 92 2 0x00000000 >> [ 366.635454] [<c07dc21c>] (__schedule) from [<c07dc4e0>] (schedule+0x90/0xa0) >> [ 366.642497] [<c07dc4e0>] (schedule) from [<c066e8b4>] (__mmc_claim_host+0xd4/0x19c) >> [ 366.650142] [<c066e8b4>] (__mmc_claim_host) from [<c066e9ac>] (mmc_get_card+0x30/0x34) >> [ 366.658056] [<c066e9ac>] (mmc_get_card) from [<c067fc8c>] (mmc_blk_issue_rq+0x64/0x48c) >> [ 366.666052] [<c067fc8c>] (mmc_blk_issue_rq) from [<c0680230>] (mmc_queue_thread+0x114/0x1b4) >> [ 366.674484] [<c0680230>] (mmc_queue_thread) from [<c023d1b0>] (kthread+0x128/0x144) >> [ 366.682134] [<c023d1b0>] (kthread) from [<c02076e8>] (ret_from_fork+0x14/0x2c) > > I'm not convinced this is a regression. > > I remember Heiko saying that he's heard reports that on some boards > eMMC doesn't work with high speed, and I'be believe that's what you're > seeing here. It would be interesting to try to debug this. I can't > personally reproduce, though. > > I think veyron_minnie already has UHS turned off for eMMC upstream. I > guess we could do it for other veyron boards too until someone can > debug? > > >> Issue #2 - Wifi (via SDIO, mmc1) is completely dead: >> >> [ 1.444125] mmc_host mmc1: card is non-removable. >> [ 1.471368] mmc_host mmc1: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0) >> [ 1.619553] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) >> [ 1.881699] mmc1: new ultra high speed SDR104 SDIO card at address 0001 >> [ 25.681172] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) >> [ 25.691666] mwifiex: rx work enabled, cpus 4 >> [ 26.827000] mwifiex_sdio mmc1:0001:1: info: FW download over, size 800344 bytes >> [ 27.561352] mwifiex_sdio mmc1:0001:1: WLAN FW is active >> [ 33.585165] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) >> [ 37.651344] mwifiex_sdio mmc1:0001:1: mwifiex_cmd_timeout_func: Timeout cmd id = 0xa9, act = 0x0 >> [ 37.660122] mwifiex_sdio mmc1:0001:1: num_data_h2c_failure = 0 >> [ 37.665951] mwifiex_sdio mmc1:0001:1: num_cmd_h2c_failure = 0 >> [ 37.671688] mwifiex_sdio mmc1:0001:1: is_cmd_timedout = 1 >> [ 37.677076] mwifiex_sdio mmc1:0001:1: num_tx_timeout = 0 >> [ 37.682380] mwifiex_sdio mmc1:0001:1: last_cmd_index = 1 >> [ 37.687681] mwifiex_sdio mmc1:0001:1: last_cmd_id: 00 00 a9 00 00 00 00 00 00 00 >> [ 37.695066] mwifiex_sdio mmc1:0001:1: last_cmd_act: 00 00 00 00 00 00 00 00 00 00 >> [ 37.702536] mwifiex_sdio mmc1:0001:1: last_cmd_resp_index = 0 >> [ 37.708269] mwifiex_sdio mmc1:0001:1: last_cmd_resp_id: 00 00 00 00 00 00 00 00 00 00 >> [ 37.716087] mwifiex_sdio mmc1:0001:1: last_event_index = 0 >> [ 37.721564] mwifiex_sdio mmc1:0001:1: last_event: 00 00 00 00 00 00 00 00 00 00 >> [ 37.728857] mwifiex_sdio mmc1:0001:1: data_sent=1 cmd_sent=0 >> [ 37.734508] mwifiex_sdio mmc1:0001:1: ps_mode=0 ps_state=0 >> [ 37.740016] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) >> [ 37.750268] mwifiex_sdio mmc1:0001:1: info: mwifiex_fw_dpc: unregister device > > This doesn't surprise me at all. What surprises me, though, is that > nobody else seems to be able to reproduce this. > > On veyron, WiFi is connected via SDIO. For good speed, it uses SDIO > Interrupts. See this bit in the device tree: > > cap-sdio-irq; > all of *my* boards are using side-band interrupt, so there are no "cap-sdio-irq". > SDIO interrupts (in 4-bit mode) specifically need the card clock to be > running all the time to work. I can reproduce your regression (on > veryron-jerry, which also has Marvell WiFi) and I can also find that > the regression is "gone' if I take out the "cap-sdio-irq" in the > veyron device tree. Ah, interestingly enough, turning off SDIO > interrupts has the side effect of sending enough (polling) traffic > that we never seem to runtime suspend, either. ;-P > > In general I'd question whether dw_mmc actually gets much power > benefit from Runtime PM in Linux. The dw_mmc IP blocks already have a > feature in them to automatically stop and restart the card clock. See > SDMMC_CLKEN_LOW_PWR. Maybe you're getting the benefit of turning off > VMMC or VQMMC? Is that really a lot of power? Presumably those power > savings would be for eMMC or normal SD cards (not SDIO). > > Maybe someone else on this thread knows how Runtime PM is supposed to > work in general for SDIO? I notice that in sdio.c the > mmc_sdio_runtime_suspend() unconditionally calls mmc_power_off(). > That seems odd since the main mmc_sdio_suspend() _doesn't_ call it if > mmc_card_keep_power(). Hrmmm... > > OK, so I just tried this on veyron-minnie. On minnie we have Broadcom > WiFi. That actually works (!). Presumably this is because > brcmf_sdiod_host_fixup() calls pm_runtime_forbid(). Commenting that > out breaks things. > > OK, and I can make Marvell work by adding > "pm_runtime_forbid(func->card->host->parent);" to the end of > mwifiex_sdio_probe(). > > -- > > So where does that leave us? > > A) Technically we can fix Marvell's driver to work like Broadcom's. > One could possibly assert that this is the wrong fix because > technically we could make Runtime PM work with SDIO with enough work. > We could theoretically move into 1-bit mode and there (I think) you > can get interrupts with the clock off. ...or we could have a > dedicated SDIO Interrupt pin (for the embedded case), which is talked > about in the SDIO spec. > > B) Technically we could hack this in the dw_mmc code to disable > Runtime PM if we see that an SDIO interrupt is used. One advantage of > doing it here is that if we ever add support in dw_mmc for the > external SDIO interrupt we could allow Runtime PM in that case. In > theory the dw_mmc IP block has some basic support for a dedicated SDIO > interrupt pin, but there's no code to support it. > > C) Technically we could add this into the MMC core. > > D) Technically we could remove Runtime PM support from dw_mmc for now > until someone can address all these issues (and ideally show a real > power savings). > > > I'd tend to vote for D, but I've been pretty absent from dw_mmc for a > long time, so probably my vote isn't worth that much... > > Shawn: I think you actually enabled runtime PM. Did you really see > power savings, or did it just seem like enabling Runtime PM would be a > neat thing to do? As Ulf pointed out that the genpd for mmc IP on Rockchip platforms were shared with others, so it's worth to add runtime PM. > > > -Doug > > >