Re: [PATCH] blk-mq: put driver tag when this request is completed

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 02.07.2020 11:23, Ming Lei wrote:
> On Thu, Jul 02, 2020 at 10:04:38AM +0200, Marek Szyprowski wrote:
>> On 02.07.2020 03:22, Ming Lei wrote:
>>> On Wed, Jul 01, 2020 at 04:16:32PM +0200, Marek Szyprowski wrote:
>>>> On 01.07.2020 15:45, Ming Lei wrote:
>>>>> On Wed, Jul 01, 2020 at 03:01:03PM +0200, Marek Szyprowski wrote:
>>>>>> On 29.06.2020 11:47, Ming Lei wrote:
>>>>>>> It is natural to release driver tag when this request is completed by
>>>>>>> LLD or device since its purpose is for LLD use.
>>>>>>>
>>>>>>> One big benefit is that the released tag can be re-used quicker since
>>>>>>> bio_endio() may take too long.
>>>>>>>
>>>>>>> Meantime we don't need to release driver tag for flush request.
>>>>>>>
>>>>>>> Cc: Christoph Hellwig <hch@xxxxxx>
>>>>>>> Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
>>>>>> This patch landed recently in linux-next as commit 36a3df5a4574. Sadly
>>>>>> it causes a regression on one of my test systems (ARM 32bit, Samsung
>>>>>> Exynos5422 SoC based Odroid XU3 board with eMMC). The system boots fine
>>>>>> and then after a few seconds every executed command hangs. No
>>>>>> panic/ops/any other message. I will try to provide more information asap
>>>>>> I find something to share. Simple reverting it in linux-next is not
>>>>>> possible due to dependencies.
>>>>> What is the exact eMMC's driver code(include the host driver)?
>>>> dwmmc-exynos (drivers/mmc/host/dw_mmc-exynos.c)
>>> Hi,
>>>
>>> Just take a quick look at mmc code, there are only two req->tag
>>> consumers:
>>>
>>> 1) cqhci_tag
>>> cqhci_tag
>>> 	cqhci_request
>>> 		host->cqe_ops->cqe_request
>>> 			mmc_cqe_start_req
>>> 	cqhci_timeout
>>>
>>> 2) mmc_hsq_request
>>> mmc_hsq_request
>>> 	host->cqe_ops->cqe_request
>>> 		mmc_cqe_start_req
>>>
>>> mmc_cqe_start_req() is called before issuing this request to hardware,
>>> so completion won't happen when the tag is used in mmc_cqe_start_req().
>>>
>>> cqhci_timeout() may race with normal completion, however looks the
>>> following code can handle the race correctly:
>>>
>>>           spin_lock_irqsave(&cq_host->lock, flags);
>>>           timed_out = slot->mrq == mrq;
>>>
>>> So still no idea why the commit causes the trouble for mmc.
>>>
>>> Do you know it is cqhci or mmc_hsh which works for dw_mmc-exynos?
>>> And can you apply the following patch and see if warning can be
>>> triggered?
>>>
>>> diff --git a/drivers/mmc/host/cqhci.c b/drivers/mmc/host/cqhci.c
>>> index 75934f3c117e..2cb49ecfbf34 100644
>>> --- a/drivers/mmc/host/cqhci.c
>>> +++ b/drivers/mmc/host/cqhci.c
>>> @@ -612,6 +612,7 @@ static int cqhci_request(struct mmc_host *mmc, struct mmc_request *mrq)
>>>    		goto out_unlock;
>>>    	}
>>>    
>>> +	WARN_ON_ONCE(cq_host->slot[tag].mrq);
>>>    	cq_host->slot[tag].mrq = mrq;
>>>    	cq_host->slot[tag].flags = 0;
>>>    
>>> diff --git a/drivers/mmc/host/mmc_hsq.c b/drivers/mmc/host/mmc_hsq.c
>>> index a5e05ed0fda3..11a4c1f3a970 100644
>>> --- a/drivers/mmc/host/mmc_hsq.c
>>> +++ b/drivers/mmc/host/mmc_hsq.c
>>> @@ -227,6 +227,7 @@ static int mmc_hsq_request(struct mmc_host *mmc, struct mmc_request *mrq)
>>>    		return -EBUSY;
>>>    	}
>>>    
>>> +	WARN_ON_ONCE(hsq->slot[tag].mrq);
>>>    	hsq->slot[tag].mrq = mrq;
>>>    
>>>    	/*
>> None of the above is even compiled for my system (I'm using
>> arm/exynos_defconfig), so this must be something else.
> Hello Marek,
>
> Or can you boot the system with one workable disk(usb, nand, ...)?
> then run some IO test on this eMMC, and collect debugfs log via the following
> command after the hang is triggered:
>
> (cd /sys/kernel/debug/block/$MMC && find . -type f -exec grep -aH . {} \;)
>
> $MMC is this mmc disk name.

I have no physical access to this system now, but I managed to run a 
screen session just after the boot to have more than one shell. Here is 
the dump when the other shell locks:

# (cd /sys/kernel/debug/block/mmcblk0 && find . -type f -exec grep -aH . 
{} \;)
./hctx0/cpu7/completed:372 0
./hctx0/cpu7/merged:0
./hctx0/cpu7/dispatched:372 0
./hctx0/cpu6/completed:385 0
./hctx0/cpu6/merged:0
./hctx0/cpu6/dispatched:385 0
./hctx0/cpu5/completed:525 0
./hctx0/cpu5/merged:0
./hctx0/cpu5/dispatched:526 0
./hctx0/cpu4/completed:707 0
./hctx0/cpu4/merged:0
./hctx0/cpu4/dispatched:708 0
./hctx0/cpu3/completed:97 1
./hctx0/cpu3/merged:0
./hctx0/cpu3/dispatched:97 2
./hctx0/cpu2/completed:180 0
./hctx0/cpu2/merged:0
./hctx0/cpu2/dispatched:181 0
./hctx0/cpu1/completed:257 0
./hctx0/cpu1/merged:0
./hctx0/cpu1/dispatched:257 2
./hctx0/cpu0/completed:45 0
./hctx0/cpu0/merged:0
./hctx0/cpu0/dispatched:45 0
./hctx0/type:default
./hctx0/dispatch_busy:0
./hctx0/active:0
./hctx0/run:2126
./hctx0/queued:2575
./hctx0/dispatched:       0     6
./hctx0/dispatched:       1     2050
./hctx0/dispatched:       2     150
./hctx0/dispatched:       4     11
./hctx0/dispatched:       8     1
./hctx0/dispatched:      16     1
./hctx0/dispatched:      32+    1
./hctx0/io_poll:considered=0
./hctx0/io_poll:invoked=0
./hctx0/io_poll:success=0
./hctx0/sched_tags_bitmap:00000000: 0020 0000 1000 0000 0000 7000 0800 0000
./hctx0/sched_tags:nr_tags=128
./hctx0/sched_tags:nr_reserved_tags=0
./hctx0/sched_tags:active_queues=0
./hctx0/sched_tags:bitmap_tags:
./hctx0/sched_tags:depth=128
./hctx0/sched_tags:busy=6
./hctx0/sched_tags:cleared=40
./hctx0/sched_tags:bits_per_word=32
./hctx0/sched_tags:map_nr=4
./hctx0/sched_tags:alloc_hint={126, 78, 98, 35, 12, 83, 11, 21}
./hctx0/sched_tags:wake_batch=8
./hctx0/sched_tags:wake_index=0
./hctx0/sched_tags:ws_active=0
./hctx0/sched_tags:ws={
./hctx0/sched_tags:     {.wait_cnt=8, .wait=inactive},
./hctx0/sched_tags:     {.wait_cnt=8, .wait=inactive},
./hctx0/sched_tags:     {.wait_cnt=8, .wait=inactive},
./hctx0/sched_tags:     {.wait_cnt=8, .wait=inactive},
./hctx0/sched_tags:     {.wait_cnt=8, .wait=inactive},
./hctx0/sched_tags:     {.wait_cnt=8, .wait=inactive},
./hctx0/sched_tags:     {.wait_cnt=8, .wait=inactive},
./hctx0/sched_tags:     {.wait_cnt=8, .wait=inactive},
./hctx0/sched_tags:}
./hctx0/sched_tags:round_robin=0
./hctx0/sched_tags:min_shallow_depth=4294967295
./hctx0/tags_bitmap:00000000: ffff ffff ffff ffff
./hctx0/tags:nr_tags=64
./hctx0/tags:nr_reserved_tags=0
./hctx0/tags:active_queues=0
./hctx0/tags:bitmap_tags:
./hctx0/tags:depth=64
./hctx0/tags:busy=64
./hctx0/tags:cleared=0
./hctx0/tags:bits_per_word=16
./hctx0/tags:map_nr=4
./hctx0/tags:alloc_hint={14, 0, 0, 0, 0, 0, 13, 22}
./hctx0/tags:wake_batch=8
./hctx0/tags:wake_index=0
./hctx0/tags:ws_active=0
./hctx0/tags:ws={
./hctx0/tags:   {.wait_cnt=8, .wait=inactive},
./hctx0/tags:   {.wait_cnt=8, .wait=inactive},
./hctx0/tags:   {.wait_cnt=8, .wait=inactive},
./hctx0/tags:   {.wait_cnt=8, .wait=inactive},
./hctx0/tags:   {.wait_cnt=8, .wait=inactive},
./hctx0/tags:   {.wait_cnt=8, .wait=inactive},
./hctx0/tags:   {.wait_cnt=8, .wait=inactive},
./hctx0/tags:   {.wait_cnt=8, .wait=inactive},
./hctx0/tags:}
./hctx0/tags:round_robin=0
./hctx0/tags:min_shallow_depth=4294967295
./hctx0/ctx_map:00000000: 00
./hctx0/dispatch:35e4bf48 {.op=FLUSH, .cmd_flags=PREFLUSH, 
.rq_flags=FLUSH_SEQ, .state=idle, .tag=-1, .internal_tag=84}
./hctx0/flags:alloc_policy=FIFO SHOULD_MERGE|BLOCKING
./hctx0/state:SCHED_RESTART
./sched/starved:0
./sched/batching:1
./sched/write_fifo_list:ab51b2cc {.op=WRITE, 
.cmd_flags=META|PRIO|BACKGROUND, .rq_flags=ELVPRIV|IO_STAT|HASHED, 
.state=idle, .tag=-1, .internal_tag=85}
./sched/write_fifo_list:ac854e48 {.op=WRITE, 
.cmd_flags=META|PRIO|BACKGROUND, .rq_flags=ELVPRIV|IO_STAT|HASHED, 
.state=idle, .tag=-1, .internal_tag=86}
./sched/write_fifo_list:b12e714a {.op=WRITE, .cmd_flags=, 
.rq_flags=ELVPRIV|IO_STAT|HASHED, .state=idle, .tag=-1, .internal_tag=36}
./sched/write_fifo_list:99383a87 {.op=WRITE, .cmd_flags=SYNC, 
.rq_flags=ELVPRIV|IO_STAT|HASHED, .state=idle, .tag=-1, .internal_tag=99}
./sched/read_fifo_list:29365a9b {.op=READ, 
.cmd_flags=FAILFAST_DEV|FAILFAST_TRAag=-1, .internal_tag=13}AHEAD, 
.rq_flags=ELVPRIV|IO_STAT|HASHED, .state=idle, .ta
./write_hints:hint0: 0
./write_hints:hint1: 0
./write_hints:hint2: 0
./write_hints:hint3: 0
./write_hints:hint4: 0
./state:SAME_COMP|NONROT|IO_STAT|DISCARD|SECERASE|INIT_DONE|WC|FUA|REGISTERED
./pm_only:0
./poll_stat:read  (512 Bytes): samples=0
./poll_stat:write (512 Bytes): samples=0
./poll_stat:read  (1024 Bytes): samples=0
./poll_stat:write (1024 Bytes): samples=0
./poll_stat:read  (2048 Bytes): samples=0
./poll_stat:write (2048 Bytes): samples=0
./poll_stat:read  (4096 Bytes): samples=0
./poll_stat:write (4096 Bytes): samples=0
./poll_stat:read  (8192 Bytes): samples=0
./poll_stat:write (8192 Bytes): samples=0
./poll_stat:read  (16384 Bytes): samples=0
./poll_stat:write (16384 Bytes): samples=0
./poll_stat:read  (32768 Bytes): samples=0
./poll_stat:write (32768 Bytes): samples=0
./poll_stat:read  (65536 Bytes): samples=0
./poll_stat:write (65536 Bytes): samples=0
root@target:~#

I hope it helps.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland




[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux