On 09/08/17 10:57, Bough Chen wrote:
>> -----Original Message-----
>> From: linux-mmc-owner@xxxxxxxxxxxxxxx [mailto:linux-mmc-owner@xxxxxxxxxxxxxxx] On Behalf Of Adrian Hunter
>> Sent: Wednesday, August 09, 2017 1:58 PM
>> To: Shawn Lin <shawn.lin@xxxxxxxxxxxxxx>; Bough Chen <haibo.chen@xxxxxxx>
>> Cc: Ulf Hansson <ulf.hansson@xxxxxxxxxx>; linux-mmc <linux-mmc@xxxxxxxxxxxxxxx>;
>> Alex Lemberg <alex.lemberg@xxxxxxxxxxx>; Mateusz Nowak <mateusz.nowak@xxxxxxxxx>;
>> Yuliy Izrailov <Yuliy.Izrailov@xxxxxxxxxxx>; Jaehoon Chung <jh80.chung@xxxxxxxxxxx>;
>> Dong Aisheng <dongas86@xxxxxxxxx>; Das Asutosh <asutoshd@xxxxxxxxxxxxxx>;
>> Zhangfei Gao <zhangfei.gao@xxxxxxxxx>; Dorfman Konstantin <kdorfman@xxxxxxxxxxxxxx>;
>> Sahitya Tummala <stummala@xxxxxxxxxxxxxx>; Harjani Ritesh <riteshh@xxxxxxxxxxxxxx>;
>> Venu Byravarasu <vbyravarasu@xxxxxxxxxx>; Linus Walleij <linus.walleij@xxxxxxxxxx>
>> Subject: Re: [PATCH V4 09/11] mmc: block: Add CQE support
>>
>> On 09/08/17 03:55, Shawn Lin wrote:
>>> Hi,
>>>
>>> On 2017/8/8 20:07, Bough Chen wrote:
>>>>> -----Original Message-----
>>>>> From: Adrian Hunter [mailto:adrian.hunter@xxxxxxxxx]
>>>>> Sent: Friday, July 21, 2017 5:50 PM
>>>>> To: Ulf Hansson <ulf.hansson@xxxxxxxxxx>
>>>>> Cc: linux-mmc <linux-mmc@xxxxxxxxxxxxxxx>; Bough Chen <haibo.chen@xxxxxxx>;
>>>>> Alex Lemberg <alex.lemberg@xxxxxxxxxxx>; Mateusz Nowak <mateusz.nowak@xxxxxxxxx>;
>>>>> Yuliy Izrailov <Yuliy.Izrailov@xxxxxxxxxxx>; Jaehoon Chung <jh80.chung@xxxxxxxxxxx>;
>>>>> Dong Aisheng <dongas86@xxxxxxxxx>; Das Asutosh <asutoshd@xxxxxxxxxxxxxx>;
>>>>> Zhangfei Gao <zhangfei.gao@xxxxxxxxx>; Dorfman Konstantin <kdorfman@xxxxxxxxxxxxxx>;
>>>>> David Griego <david.griego@xxxxxxxxxx>; Sahitya Tummala <stummala@xxxxxxxxxxxxxx>;
>>>>> Harjani Ritesh <riteshh@xxxxxxxxxxxxxx>; Venu Byravarasu <vbyravarasu@xxxxxxxxxx>;
>>>>> Linus Walleij <linus.walleij@xxxxxxxxxx>; Shawn Lin <shawn.lin@xxxxxxxxxxxxxx>
>>>>> Subject: [PATCH V4 09/11] mmc: block: Add CQE support
>>>>>
>>>>> Add CQE support to the block driver, including:
>>>>> - optionally using DCMD for flush requests
>>>>> - manually issuing discard requests
>>>>> - issuing read / write requests to the CQE
>>>>> - supporting block-layer timeouts
>>>>> - handling recovery
>>>>> - supporting re-tuning
>>>>>
>>>>> Signed-off-by: Adrian Hunter <adrian.hunter@xxxxxxxxx>
>>>>> ---
>>>>>  drivers/mmc/core/block.c | 195 ++++++++++++++++++++++++++++++++-
>>>>>  drivers/mmc/core/block.h |   7 ++
>>>>>  drivers/mmc/core/queue.c | 273 ++++++++++++++++++++++++++++++++++++++++++++++-
>>>>>  drivers/mmc/core/queue.h |  42 +++++++-
>>>>>  4 files changed, 510 insertions(+), 7 deletions(-)
>>>>>
>>>>> diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
>>>>> index 915290c74363..2d25115637b7 100644
>>>>> --- a/drivers/mmc/core/block.c
>>>>> +++ b/drivers/mmc/core/block.c
>>>>> @@ -109,6 +109,7 @@ struct mmc_blk_data {
>>>>>  #define MMC_BLK_WRITE		BIT(1)
>>>>>  #define MMC_BLK_DISCARD		BIT(2)
>>>>>  #define MMC_BLK_SECDISCARD	BIT(3)
>>>>> +#define MMC_BLK_CQE_RECOVERY	BIT(4)
>>>>>
>>>>>  	/*
>>>>>  	 * Only set in main mmc_blk_data associated
>>>>> @@ -1612,6 +1613,198 @@ static void mmc_blk_data_prep(struct mmc_queue *mq, struct mmc_queue_req *mqrq,
>>>>>  	*do_data_tag_p = do_data_tag;
>>>>>  }
>>>>>
>>>>> +#define MMC_CQE_RETRIES 2
>>>
>>>
>>>>> +	blk_queue_rq_timed_out(mq->queue, mmc_cqe_timed_out);
>>>>> +	blk_queue_rq_timeout(mq->queue, 60 * HZ);
>>>>
>>>
>>> ------8<-------
>>>
>>>> Hi Adrian,
>>>>
>>>> These days I'm running CMDQ stress tests, and I found one issue.
>>>> On our i.MX8QXP-ARM2 board, the RAM is 3GB and the eMMC is 32GB.
>>>> According to 'free -m', the total memory is 2800M and the free
>>>> memory is 2500M.
>>>>
>>>> I used 'mkfs.ext4' to format an ext4 file system on the eMMC in
>>>> HS400ES CMDQ mode, and that works fine.
>>>>
>>>> When I use the following command to stress test CMDQ, it works fine.
>>>> bonnie++ -d /run/media/mmcblk0p1/ -u 0:0 -s 2048 -r 1024
>>>>
>>>> But when I switch to a larger file size for the same stress test,
>>>> using
>>>> bonnie++ -d /run/media/mmcblk0p1/ -u 0:0 -s 4096 -r 2048
>>>> or
>>>> bonnie++ -d /run/media/mmcblk0p1/ -u 0:0 -s 5600
>>>>
>>>> I get the following dump message. According to the log,
>>>> mmc_cqe_timed_out() was triggered.
>>>> It seems mmc was blocked somewhere.
>>>> Then I tried to debug this issue: I enabled MMC_DEBUG in the config
>>>> and ran the same test, which prints detailed command information on
>>>> the console, but with that enabled I could not reproduce the issue.
>>
>> mmc_cqe_timed_out() is a 60 second timeout provided by the block layer.
>> Refer to "blk_queue_rq_timeout(mq->queue, 60 * HZ)" in mmc_init_queue().
>> 60s is quite a long time, so I would first want to determine whether the
>> task was really queued that long. I would instrument cqhci_request() to
>> record the start time on struct mmc_request, and then print the time taken
>> when there is a problem.
>>
>
> Hi Adrian,
>
> Following your suggestion, I added some code to print the time.
> When the issue happens, it seems the request really has been pending for over 60s!
>
> done
> Writing intelligently...[  689.209548] mmc0: cqhci: timeout for tag 9
> [  689.213658] the mrq all use 62123742 us
> [  689.217487] mmc0: cqhci: ============ CQHCI REGISTER DUMP ===========
> [  689.223927] mmc0: cqhci: Caps:      0x0000310a | Version:  0x00000510
> [  689.230363] mmc0: cqhci: Config:    0x00001001 | Control:  0x00000000
> [  689.236800] mmc0: cqhci: Int stat:  0x00000000 | Int enab: 0x00000006
> [  689.243238] mmc0: cqhci: Int sig:   0x00000006 | Int Coal: 0x00000000
> [  689.249675] mmc0: cqhci: TDL base:  0x90079000 | TDL up32: 0x00000000
> [  689.256113] mmc0: cqhci: Doorbell:  0x1fffffff | TCN:      0x00000000
> [  689.262550] mmc0: cqhci: Dev queue: 0x1fffefff | Dev Pend: 0x1fff7fff
> [  689.268988] mmc0: cqhci: Task clr:  0x00000000 | SSC1:     0x00011000
> [  689.275425] mmc0: cqhci: SSC2:      0x00000001 | DCMD rsp: 0x00000800
> [  689.281862] mmc0: cqhci: RED mask:  0xfdf9a080 | TERRI:    0x00000000
> [  689.288300] mmc0: cqhci: Resp idx:  0x0000002f | Resp arg: 0x00000900
> [  689.294737] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
> [  689.301176] mmc0: sdhci: Sys addr:  0xb602f000 | Version:  0x00000002
> [  689.307612] mmc0: sdhci: Blk size:  0x00000200 | Blk cnt:  0x00000400
> [  689.314050] mmc0: sdhci: Argument:  0x000f0400 | Trn mode: 0x00000023
> [  689.320487] mmc0: sdhci: Present:   0x01fd858f | Host ctl: 0x00000030
> [  689.326925] mmc0: sdhci: Power:     0x00000002 | Blk gap:  0x00000080
> [  689.333362] mmc0: sdhci: Wake-up:   0x00000008 | Clock:    0x0000000f
> [  689.339800] mmc0: sdhci: Timeout:   0x0000008f | Int stat: 0x00000000
> [  689.346237] mmc0: sdhci: Int enab:  0x107f4000 | Sig enab: 0x107f4000
> [  689.352674] mmc0: sdhci: AC12 err:  0x00000000 | Slot int: 0x00000502
> [  689.359113] mmc0: sdhci: Caps:      0x07eb0000 | Caps_1:   0x8000b407
> [  689.365549] mmc0: sdhci: Cmd:       0x00002c1a | Max curr: 0x00ffffff
> [  689.371987] mmc0: sdhci: Resp[0]:   0x00000900 | Resp[1]:  0xffffffff
> [  689.378424] mmc0: sdhci: Resp[2]:   0x328f5903 | Resp[3]:  0x00d02700
> [  689.384861] mmc0: sdhci: Host ctl2: 0x00000008
> [  689.389302] mmc0: sdhci: ADMA Err:  0x00000009 | ADMA Ptr: 0x9009a400
> [  689.395737] mmc0: sdhci: ============================================
> [  689.402212] mmc0: running CQE recovery

Tag 9 has been queued (bit set in Dev Pend), which means it is up to the eMMC
to select it for execution. You should dump the times for the other mrqs to
see how long they have been waiting, and try to determine whether anything is
being processed at all.

If the eMMC is just taking a really long time to process tasks, we could
extend the timeout, but it is hard to see how that would be acceptable in a
final product. At this point it looks like the eMMC may have a flaw in the
way it selects tasks for execution.