On 09/08/17 11:16, Adrian Hunter wrote:
> On 09/08/17 10:57, Bough Chen wrote:
>>> -----Original Message-----
>>> From: linux-mmc-owner@xxxxxxxxxxxxxxx [mailto:linux-mmc-owner@xxxxxxxxxxxxxxx]
>>> On Behalf Of Adrian Hunter
>>> Sent: Wednesday, August 09, 2017 1:58 PM
>>> To: Shawn Lin <shawn.lin@xxxxxxxxxxxxxx>; Bough Chen
>>> <haibo.chen@xxxxxxx>
>>> Cc: Ulf Hansson <ulf.hansson@xxxxxxxxxx>; linux-mmc
>>> <linux-mmc@xxxxxxxxxxxxxxx>; Alex Lemberg <alex.lemberg@xxxxxxxxxxx>;
>>> Mateusz Nowak <mateusz.nowak@xxxxxxxxx>; Yuliy Izrailov
>>> <Yuliy.Izrailov@xxxxxxxxxxx>; Jaehoon Chung <jh80.chung@xxxxxxxxxxx>;
>>> Dong Aisheng <dongas86@xxxxxxxxx>; Das Asutosh
>>> <asutoshd@xxxxxxxxxxxxxx>; Zhangfei Gao <zhangfei.gao@xxxxxxxxx>;
>>> Dorfman Konstantin <kdorfman@xxxxxxxxxxxxxx>; Sahitya Tummala
>>> <stummala@xxxxxxxxxxxxxx>; Harjani Ritesh <riteshh@xxxxxxxxxxxxxx>;
>>> Venu Byravarasu <vbyravarasu@xxxxxxxxxx>; Linus Walleij
>>> <linus.walleij@xxxxxxxxxx>
>>> Subject: Re: [PATCH V4 09/11] mmc: block: Add CQE support
>>>
>>> On 09/08/17 03:55, Shawn Lin wrote:
>>>> Hi,
>>>>
>>>> On 2017/8/8 20:07, Bough Chen wrote:
>>>>>> -----Original Message-----
>>>>>> From: Adrian Hunter [mailto:adrian.hunter@xxxxxxxxx]
>>>>>> Sent: Friday, July 21, 2017 5:50 PM
>>>>>> To: Ulf Hansson <ulf.hansson@xxxxxxxxxx>
>>>>>> Cc: linux-mmc <linux-mmc@xxxxxxxxxxxxxxx>; Bough Chen
>>>>>> <haibo.chen@xxxxxxx>; Alex Lemberg <alex.lemberg@xxxxxxxxxxx>;
>>>>>> Mateusz Nowak <mateusz.nowak@xxxxxxxxx>; Yuliy Izrailov
>>>>>> <Yuliy.Izrailov@xxxxxxxxxxx>; Jaehoon Chung
>>>>>> <jh80.chung@xxxxxxxxxxx>; Dong Aisheng <dongas86@xxxxxxxxx>; Das
>>>>>> Asutosh <asutoshd@xxxxxxxxxxxxxx>; Zhangfei Gao
>>>>>> <zhangfei.gao@xxxxxxxxx>; Dorfman Konstantin
>>>>>> <kdorfman@xxxxxxxxxxxxxx>; David Griego <david.griego@xxxxxxxxxx>;
>>>>>> Sahitya Tummala <stummala@xxxxxxxxxxxxxx>; Harjani Ritesh
>>>>>> <riteshh@xxxxxxxxxxxxxx>; Venu Byravarasu <vbyravarasu@xxxxxxxxxx>;
>>>>>> Linus Walleij <linus.walleij@xxxxxxxxxx>; Shawn Lin
>>>>>> <shawn.lin@xxxxxxxxxxxxxx>
>>>>>> Subject: [PATCH V4 09/11] mmc: block: Add CQE support
>>>>>>
>>>>>> Add CQE support to the block driver, including:
>>>>>> - optionally using DCMD for flush requests
>>>>>> - manually issuing discard requests
>>>>>> - issuing read / write requests to the CQE
>>>>>> - supporting block-layer timeouts
>>>>>> - handling recovery
>>>>>> - supporting re-tuning
>>>>>>
>>>>>> Signed-off-by: Adrian Hunter <adrian.hunter@xxxxxxxxx>
>>>>>> ---
>>>>>>  drivers/mmc/core/block.c | 195 ++++++++++++++++++++++++++++++++-
>>>>>>  drivers/mmc/core/block.h |   7 ++
>>>>>>  drivers/mmc/core/queue.c | 273 ++++++++++++++++++++++++++++++++++++++++++++++-
>>>>>>  drivers/mmc/core/queue.h |  42 +++++++-
>>>>>>  4 files changed, 510 insertions(+), 7 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
>>>>>> index 915290c74363..2d25115637b7 100644
>>>>>> --- a/drivers/mmc/core/block.c
>>>>>> +++ b/drivers/mmc/core/block.c
>>>>>> @@ -109,6 +109,7 @@ struct mmc_blk_data {
>>>>>>  #define MMC_BLK_WRITE BIT(1)
>>>>>>  #define MMC_BLK_DISCARD BIT(2)
>>>>>>  #define MMC_BLK_SECDISCARD BIT(3)
>>>>>> +#define MMC_BLK_CQE_RECOVERY BIT(4)
>>>>>>
>>>>>>  /*
>>>>>>   * Only set in main mmc_blk_data associated
>>>>>> @@ -1612,6 +1613,198 @@ static void mmc_blk_data_prep(struct mmc_queue *mq, struct mmc_queue_req *mqrq,
>>>>>>  	*do_data_tag_p = do_data_tag;
>>>>>>  }
>>>>>>
>>>>>> +#define MMC_CQE_RETRIES 2
>>>>
>>>>
>>>>>> +	blk_queue_rq_timed_out(mq->queue, mmc_cqe_timed_out);
>>>>>> +	blk_queue_rq_timeout(mq->queue, 60 * HZ);
>>>>>
>>>>
>>>> ------8<-------
>>>>
>>>>> Hi Adrian,
>>>>>
>>>>> These days I'm doing CMDQ stress tests, and I found one issue.
>>>>> On our i.MX8QXP-ARM2 board, the RAM is 3GB and the eMMC is 32GB.
>>>>> Using the 'free -m' command, I see the total memory is 2800M and the
>>>>> free memory is 2500M.
>>>>>
>>>>> I used 'mkfs.ext4' to create an ext4 file system on the eMMC in
>>>>> HS400ES CMDQ mode, and that works fine.
>>>>>
>>>>> When I use the following command to stress test CMDQ, it works fine.
>>>>> bonnie++ -d /run/media/mmcblk0p1/ -u 0:0 -s 2048 -r 1024
>>>>>
>>>>> But when I use a larger file size for the same stress test, using
>>>>> bonnie++ -d /run/media/mmcblk0p1/ -u 0:0 -s 4096 -r 2048
>>>>> or
>>>>> bonnie++ -d /run/media/mmcblk0p1/ -u 0:0 -s 5600
>>>>>
>>>>> I get the following dump message. According to the log,
>>>>> mmc_cqe_timed_out() was triggered.
>>>>> It seems mmc was blocked somewhere.
>>>>> I then tried to debug this issue: I enabled MMC_DEBUG in the config
>>>>> and ran the same test with detailed command-sending information
>>>>> printed on the console, but then I could not reproduce it.
>>>
>>> mmc_cqe_timed_out() is a 60 second timeout provided by the block layer.
>>> Refer to "blk_queue_rq_timeout(mq->queue, 60 * HZ)" in mmc_init_queue().
>>> 60s is quite a long time so I would first want to determine if the task
>>> was really queued that long. I would instrument some code into
>>> cqhci_request() to record the start time on struct mmc_request, and then
>>> print the time taken when there is a problem.
>>>
>>
>> Hi Adrian,
>>
>> Following your suggestion, I added the following code to print the time.
>> When the issue happens, it seems the request really was pending for over 60s!
>>
>> done
>> Writing intelligently...[ 689.209548] mmc0: cqhci: timeout for tag 9
>> [ 689.213658] the mrq all use 62123742 us
>> [ 689.217487] mmc0: cqhci: ============ CQHCI REGISTER DUMP ===========
>> [ 689.223927] mmc0: cqhci: Caps:      0x0000310a | Version:  0x00000510
>> [ 689.230363] mmc0: cqhci: Config:    0x00001001 | Control:  0x00000000
>> [ 689.236800] mmc0: cqhci: Int stat:  0x00000000 | Int enab: 0x00000006
>> [ 689.243238] mmc0: cqhci: Int sig:   0x00000006 | Int Coal: 0x00000000
>> [ 689.249675] mmc0: cqhci: TDL base:  0x90079000 | TDL up32: 0x00000000
>> [ 689.256113] mmc0: cqhci: Doorbell:  0x1fffffff | TCN:      0x00000000
>> [ 689.262550] mmc0: cqhci: Dev queue: 0x1fffefff | Dev Pend: 0x1fff7fff
>> [ 689.268988] mmc0: cqhci: Task clr:  0x00000000 | SSC1:     0x00011000
>> [ 689.275425] mmc0: cqhci: SSC2:      0x00000001 | DCMD rsp: 0x00000800
>> [ 689.281862] mmc0: cqhci: RED mask:  0xfdf9a080 | TERRI:    0x00000000
>> [ 689.288300] mmc0: cqhci: Resp idx:  0x0000002f | Resp arg: 0x00000900
>> [ 689.294737] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
>> [ 689.301176] mmc0: sdhci: Sys addr:  0xb602f000 | Version:  0x00000002
>> [ 689.307612] mmc0: sdhci: Blk size:  0x00000200 | Blk cnt:  0x00000400
>> [ 689.314050] mmc0: sdhci: Argument:  0x000f0400 | Trn mode: 0x00000023
>> [ 689.320487] mmc0: sdhci: Present:   0x01fd858f | Host ctl: 0x00000030
>> [ 689.326925] mmc0: sdhci: Power:     0x00000002 | Blk gap:  0x00000080
>> [ 689.333362] mmc0: sdhci: Wake-up:   0x00000008 | Clock:    0x0000000f
>> [ 689.339800] mmc0: sdhci: Timeout:   0x0000008f | Int stat: 0x00000000
>> [ 689.346237] mmc0: sdhci: Int enab:  0x107f4000 | Sig enab: 0x107f4000
>> [ 689.352674] mmc0: sdhci: AC12 err:  0x00000000 | Slot int: 0x00000502
>> [ 689.359113] mmc0: sdhci: Caps:      0x07eb0000 | Caps_1:   0x8000b407
>> [ 689.365549] mmc0: sdhci: Cmd:       0x00002c1a | Max curr: 0x00ffffff
>> [ 689.371987] mmc0: sdhci: Resp[0]:   0x00000900 | Resp[1]:  0xffffffff
>> [ 689.378424] mmc0: sdhci: Resp[2]:   0x328f5903 | Resp[3]:  0x00d02700
>> [ 689.384861] mmc0: sdhci: Host ctl2: 0x00000008
>> [ 689.389302] mmc0: sdhci: ADMA Err:  0x00000009 | ADMA Ptr: 0x9009a400
>> [ 689.395737] mmc0: sdhci: ============================================
>> [ 689.402212] mmc0: running CQE recovery
>
> Tag 9 has been queued (bit set in Dev Pend) which means it is up to the eMMC
> to select it for execution. You should dump the times for the other mrq's
> to see how long they have been waiting and try to determine if anything is
> being processed.
>
> If the eMMC is just taking a really long time to process tasks we could
> extend the timeout, but it is hard to see how that is acceptable to a final
> product. At this point it looks like the eMMC may have a flaw in the way it
> selects tasks for execution.

No, that is wrong sorry, the task is in the QSR (Dev queue) so it is the CQE
that has not selected it.
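
For anyone following along, the bit tests discussed in this thread can be
checked by hand or with a small user-space helper. The sketch below is not
kernel code and is not part of the patch. It assumes that bit N of the
Doorbell, Dev queue and Dev Pend values corresponds to tag N, and that the
"the mrq all use ... us" debug line reports elapsed microseconds. The register
values and the 62123742 us figure are copied from the dump quoted in this
thread; the function and macro names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Values copied from the CQHCI register dump quoted in this thread. */
#define DUMP_DOORBELL  0x1fffffffu /* "Doorbell"  */
#define DUMP_DEV_QUEUE 0x1fffefffu /* "Dev queue" */
#define DUMP_DEV_PEND  0x1fff7fffu /* "Dev Pend"  */

/*
 * Hypothetical helper: report whether the bit for a tag is set in a
 * 32-bit per-task bitmask. Assumes bit N corresponds to tag N.
 */
bool tag_bit_set(uint32_t reg, unsigned int tag)
{
	assert(tag < 32);
	return ((reg >> tag) & 1u) != 0;
}

/*
 * Hypothetical helper: true if an elapsed time in microseconds is longer
 * than a timeout given in seconds (the thread uses 60 * HZ, i.e. 60 s).
 */
bool exceeded_timeout_us(uint64_t elapsed_us, unsigned int timeout_s)
{
	return elapsed_us > (uint64_t)timeout_s * 1000000u;
}

/* Print the state of one tag in the three dumped bitmasks. */
void print_tag_state(unsigned int tag)
{
	printf("tag %u: doorbell=%d dev_queue=%d dev_pend=%d\n", tag,
	       tag_bit_set(DUMP_DOORBELL, tag),
	       tag_bit_set(DUMP_DEV_QUEUE, tag),
	       tag_bit_set(DUMP_DEV_PEND, tag));
}
```

Calling print_tag_state(9) from a small main() shows the bit for tag 9 set in
all three values, and exceeded_timeout_us(62123742, 60) confirms that the
reported time is past the 60 second block-layer timeout. What each set bit
implies about who should act next is the question discussed in the replies
above; the helper only reports the raw bits.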