Re: [PATCH 0/6] mmc: block: command issue cleanups

On Fri, Jan 27, 2017 at 8:58 AM, Ulf Hansson <ulf.hansson@xxxxxxxxxx> wrote:

>>> Linus Walleij (6):
>>>   mmc: block: break out mmc_blk_rw_cmd_abort()
>>>   mmc: block: break out mmc_blk_rw_start_new()
>>>   mmc: block: do not assign mq_rq when aborting command
>>>   mmc: block: inline command abortions
>>>   mmc: block: introduce new_areq and old_areq
>>>   mmc: block: stop passing around pointless return values
(...)
> Seems like this series may have issues. I have looked at boot reports
> from kernelci, and in particular the reports for
> https://kernelci.org/boot/sun7i-a20-bananapi/job/ulfh/ are
> interesting.
>
> Apparently, this board has an SD card attached. There have been errors
> reported in the log for a while when doing data transfers, although
> none of these errors have triggered the kernelci to report a boot
> error.

Damn, I wish I could be hands-on with this system and bisect
it. That really helps with shaky systems. Sadly, the errors are
hard to reproduce :(

The old errors look like so:

[    6.099124] sunxi-mmc 1c0f000.mmc: smc 0 err, cmd 18, RD SBE !!
[    6.105211] sunxi-mmc 1c0f000.mmc: data error, sending stop command
[    6.122394] mmcblk0: timed out sending r/w cmd command, card status 0x900
[    6.665013] sunxi-mmc 1c0f000.mmc: smc 0 err, cmd 18, RD DTO !!
[    6.671011] sunxi-mmc 1c0f000.mmc: data error, sending stop command
[    6.677812] mmcblk0: timed out sending r/w cmd command, card status 0x900
[    7.123727] sunxi-mmc 1c0f000.mmc: smc 0 err, cmd 18, RD DTO !!
[    7.129692] sunxi-mmc 1c0f000.mmc: data error, sending stop command
[    7.136489] mmcblk0: timed out sending r/w cmd command, card status 0x900
[    7.143349] blk_update_request: I/O error, dev mmcblk0, sector 124800
[    7.493691] sunxi-mmc 1c0f000.mmc: smc 0 err, cmd 18, RD DTO !!
[    7.499651] sunxi-mmc 1c0f000.mmc: data error, sending stop command
[    7.506229] mmcblk0: timed out sending r/w cmd command, card status 0x900
[    7.943641] sunxi-mmc 1c0f000.mmc: smc 0 err, cmd 18, RD DTO !!
[    7.949595] sunxi-mmc 1c0f000.mmc: data error, sending stop command
[    7.956222] mmcblk0: timed out sending r/w cmd command, card status 0x900
[    7.963010] blk_update_request: I/O error, dev mmcblk0, sector 124800
[    7.969499] Buffer I/O error on dev mmcblk0p1, logical block 15344, async page read
[    8.321411] sunxi-mmc 1c0f000.mmc: smc 0 err, cmd 18, RD DTO !!
[    8.327378] sunxi-mmc 1c0f000.mmc: data error, sending stop command
[    8.334018] mmcblk0: timed out sending r/w cmd command, card status 0x900
[    8.763338] sunxi-mmc 1c0f000.mmc: smc 0 err, cmd 18, RD DTO !!
[    8.769276] sunxi-mmc 1c0f000.mmc: data error, sending stop command
[    8.775960] mmcblk0: timed out sending r/w cmd command, card status 0x900
[    8.782750] blk_update_request: I/O error, dev mmcblk0, sector 124928
[    9.125126] sunxi-mmc 1c0f000.mmc: smc 0 err, cmd 18, RD DTO !!
[    9.131084] sunxi-mmc 1c0f000.mmc: data error, sending stop command
[    9.137624] mmcblk0: timed out sending r/w cmd command, card status 0x900
[    9.144445] blk_update_request: I/O error, dev mmcblk0, sector 124928
[    9.150881] Buffer I/O error on dev mmcblk0p2, logical block 0, async page read

So something was already causing errors on the read command (CMD18, READ_MULTIPLE_BLOCK), even before this series.

> However, I suspect that some of the changes in this series make it
> worse. Perhaps because of changed error handling in the mmc block
> layer!?
>
> In particular, look at the difference between these [1] boot logs; it
> might give you some hints. I have also added Maxime to this thread,
> perhaps he can help out with the sunxi mmc driver.

The new errors look like so:

[    6.099171] sunxi-mmc 1c0f000.mmc: smc 0 err, cmd 18, RD SBE !!
[    6.105259] sunxi-mmc 1c0f000.mmc: data error, sending stop command
[    6.127415] mmcblk0: timed out sending r/w cmd command, card status 0x900
[    6.666628] sunxi-mmc 1c0f000.mmc: smc 0 err, cmd 18, RD DTO !!
[    6.672626] sunxi-mmc 1c0f000.mmc: data error, sending stop command
[    6.679420] mmcblk0: timed out sending r/w cmd command, card status 0x900
[    7.503256] sunxi-mmc 1c0f000.mmc: fatal err update clk timeout
[    8.623257] sunxi-mmc 1c0f000.mmc: fatal err update clk timeout

This "fatal err update clk timeout" message is new; it comes from the sunxi-mmc driver.

[    8.630370] mmc0: tried to reset card, got error -5
[    8.635309] blk_update_request: I/O error, dev mmcblk0, sector 124800
[    8.642366] mmcblk0: error -5 sending status command, retrying
[    8.648279] mmcblk0: error -5 sending status command, retrying
[    8.654132] mmcblk0: error -5 sending status command, aborting
[    8.659961] blk_update_request: I/O error, dev mmcblk0, sector 7167872
[    8.667201] mmcblk0: error -5 sending status command, retrying
[    8.673031] mmcblk0: error -5 sending status command, retrying
[    8.678916] mmcblk0: error -5 sending status command, aborting
[    8.684758] blk_update_request: I/O error, dev mmcblk0, sector 124800
[    8.691195] Buffer I/O error on dev mmcblk0p1, logical block 15344, async page read
[    8.700492] mmcblk0: error -5 sending status command, retrying
[    8.706405] mmcblk0: error -5 sending status command, retrying
[    8.712232] mmcblk0: error -5 sending status command, aborting
[    8.718103] blk_update_request: I/O error, dev mmcblk0, sector 7167872
[    8.724643] Buffer I/O error on dev mmcblk0p2, logical block 880368, async page read
[    8.732403] Unable to handle kernel NULL pointer dereference at virtual address 00000028
[    8.740501] pgd = c0004000
[    8.743204] [00000028] *pgd=00000000
[    8.746794] Internal error: Oops: 17 [#1] SMP ARM
[    8.751491] Modules linked in:
[    8.754547] CPU: 0 PID: 65 Comm: mmcqd/0 Not tainted 4.10.0-rc5-00097-ge1defa2da4d3 #1
[    8.762450] Hardware name: Allwinner sun7i (A20) Family
[    8.767666] task: ee9b0fc0 task.stack: ee9d8000
[    8.772201] PC is at mmc_blk_rw_rq_prep+0x20/0x39c
[    8.776985] LR is at mmc_blk_issue_rw_rq+0x124/0x370

And now it crashes at mmc_blk_rw_rq_prep()+0x20, so I suspect it is one of these lines:

struct mmc_blk_request *brq = &mqrq->brq;
struct request *req = mqrq->req;
struct mmc_blk_data *md = mq->blkdata;

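My guess is the first: mqrq is NULL. A fault at virtual address 0x00000028 is consistent with loading a member at a small offset through a NULL struct pointer.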

So I suspect the one-liner in commit 0ebd6e72b5ee2592625d5ae567a729345dfe07b6
("mmc: block: do not assign mq_rq when aborting command") is the culprit.

That commit is not easily reverted, so I will send an RFT patch to restore the old behaviour.

Yours,
Linus Walleij