Re: for-next branch and blktests/srp

On 12/6/18 1:56 PM, Bart Van Assche wrote:
> On Thu, 2018-12-06 at 08:47 -0800, Bart Van Assche wrote:
>> If I merge Jens' for-next branch with Linus' master branch, boot the
>> resulting kernel in a VM and run blktests/tests/srp/002 then that test
>> never finishes. The same test passes against Linus' master branch. I
>> think this is a regression. The following appears in the system log if
>> I run that test:
>>
>> Call Trace:
>> INFO: task kworker/0:1:12 blocked for more than 120 seconds.
>> Call Trace:
>> INFO: task ext4lazyinit:2079 blocked for more than 120 seconds.
>> Call Trace:
>> INFO: task fio:2151 blocked for more than 120 seconds.
>> Call Trace:
>> INFO: task fio:2154 blocked for more than 120 seconds.
> 
> Hi Jens,
> 
> My test results so far are as follows:
> * With kernel v4.20-rc5 test srp/002 passes.
> * With your for-next branch test srp/002 reports the symptoms reported in my e-mail.
> * With Linus' master branch from this morning test srp/002 fails in the same way as
>   your for-next branch.
> * Also with Linus' master branch, test srp/002 passes if I revert the following commit:
>   ffe81d45322c ("blk-mq: fix corruption with direct issue"). So it seems like that
>   commit fixed one regression but introduced another regression.

Yes, I'm on the same page; I've been able to reproduce it. It seems to be
related to dm and bypass insert, which is somewhat odd. If I just do:

diff --git a/block/blk-core.c b/block/blk-core.c
index deb56932f8c4..4c44e6fa0d08 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2637,7 +2637,8 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request *
 		 * bypass a potential scheduler on the bottom device for
 		 * insert.
 		 */
-		return blk_mq_request_issue_directly(rq);
+		blk_mq_request_bypass_insert(rq, true);
+		return BLK_STS_OK;
 	}
 
 	spin_lock_irqsave(q->queue_lock, flags);

it works fine. Well, at least this regression is less serious. I'll bang
out a fix for it and ensure we make -rc6. I'm guessing it's the bypassing
of non-read/write requests, which your top of dispatch also shows to be a
non-read/write. But there should be no new failure case here that wasn't
possible before; it's just easier to hit now.
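
For readers following along, the difference between the two paths in the
diff above can be sketched with a toy userspace model. This is not kernel
code: every toy_* name, the status enum and the driver_busy flag are
hypothetical stand-ins, and list ordering and locking are not modeled. It
only shows that direct issue can report a busy status back to the caller,
while bypass insert always parks the request on the dispatch list and the
caller reports success.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy status codes; stand-ins for the kernel's blk_status_t values. */
enum toy_status { TOY_STS_OK, TOY_STS_RESOURCE };

struct toy_request {
	int tag;
	struct toy_request *next;
};

struct toy_queue {
	bool driver_busy;             /* assumption: driver cannot take work now */
	int issued;                   /* requests handed straight to the driver */
	int queue_runs;               /* times a queue run was requested */
	int dispatch_len;             /* requests parked on the dispatch list */
	struct toy_request *dispatch; /* head of the dispatch list */
};

/* Direct issue: hand the request to the driver, or report busy to the caller. */
enum toy_status toy_issue_directly(struct toy_queue *q, struct toy_request *rq)
{
	assert(q != NULL && rq != NULL);
	if (q->driver_busy)
		return TOY_STS_RESOURCE;
	q->issued++;
	return TOY_STS_OK;
}

/* Bypass insert: always park the request; optionally ask for a queue run. */
void toy_bypass_insert(struct toy_queue *q, struct toy_request *rq,
		       bool run_queue)
{
	assert(q != NULL && rq != NULL);
	rq->next = q->dispatch;	/* head insert; ordering is not modeled */
	q->dispatch = rq;
	q->dispatch_len++;
	if (run_queue)
		q->queue_runs++;
}

/* Same shape as the patched branch: insert, then report success. */
enum toy_status toy_insert_cloned(struct toy_queue *q, struct toy_request *rq)
{
	toy_bypass_insert(q, rq, true);
	return TOY_STS_OK;
}
```

With driver_busy set, toy_issue_directly() returns the busy status and
leaves the request with the caller, whereas toy_insert_cloned() queues it
and returns OK regardless of driver state.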

-- 
Jens Axboe



