On Thu, 2018-01-18 at 17:01 -0500, Mike Snitzer wrote:
> And yet Laurence cannot reproduce any such lockups with your test...

Hmm ... maybe I misunderstood Laurence but I don't think that Laurence has
already succeeded at running an unmodified version of my tests. In one of the
e-mails Laurence sent me this morning I read that he modified these scripts
to get past a kernel module unload failure that was reported while starting
these tests. So the next step is to check which changes were made to the test
scripts and also whether the test results are still valid.

> Are you absolutely certain this patch doesn't help you?
> https://patchwork.kernel.org/patch/10174037/
>
> If it doesn't then that is actually very useful to know.

The first thing I tried this morning is to run the srp-test software against
a merge of Jens' for-next branch and your dm-4.16 branch. Since I noticed that
the dm queue locked up I reinserted a blk_mq_delay_run_hw_queue() call in the
dm code. Since even that was not sufficient I tried to kick the queues via
debugfs (for s in /sys/kernel/debug/block/*/state; do echo kick >$s; done).
Since that was not sufficient to resolve the queue stall I reverted the
following three patches that are in Jens' tree:
* "blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request
  feedback"
* "blk-mq-sched: remove unused 'can_block' arg from
  blk_mq_sched_insert_request"
* "blk-mq: don't dispatch request in blk_mq_request_direct_issue if queue is
  busy"

Only after I had done this the srp-test software ran again without triggering
dm queue lockups. Sorry but I have not yet had the time to test patch "[RFC]
blk-mq: fixup RESTART when queue becomes idle".

> Please just focus on helping Laurence get his very capable testbed to
> reproduce this issue. Once we can reproduce these "unkillable" "stalls"
> in-house it'll be _much_ easier to analyze and fix.

OK, I will work with Laurence on this.
Maybe Laurence and I should work on this before analyzing the lockup that was
mentioned above further?

Bart.