Re: [PATCH 8/9] dm: Fix two race conditions related to stopping and starting queues

Bart Van Assche <bart.vanassche@xxxxxxxxxxx> · Thu, 1 Sep 2016 13:15:17 -0700

On 09/01/2016 12:05 PM, Mike Snitzer wrote:
> On Thu, Sep 01 2016 at  1:59pm -0400,
> Bart Van Assche <bart.vanassche@xxxxxxxxxxx> wrote:
>
>> On 09/01/2016 09:12 AM, Mike Snitzer wrote:
>>> Please see/test the dm-4.8 and dm-4.9 branches (dm-4.9 being rebased
>>> on top of dm-4.8):
>>> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.8
>>> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.9
>>
>> Hello Mike,
>>
>> The result of my tests of the dm-4.9 branch is as follows:
>> * With patch "dm mpath: check if path's request_queue is dying in
>> activate_path()" I still see every now and then that CPU usage of
>> one of the kworker threads jumps to 100%.
>
> So you're saying that the dying queue check is still needed in the path
> selector?  Would be useful to know why the 100% is occurring.  Can you
> get a stack trace during this time?

Hello Mike,

A few days ago I already tried to obtain a stack trace with perf, but
the information perf reported wasn't entirely accurate. What I know
about that 100% CPU usage is as follows:
* "dmsetup table" showed three SRP SCSI device nodes but these SRP SCSI
  device nodes were not visible in /sys/block. This means that
  scsi_remove_host() had already removed these from sysfs.
* hctx->run_work kept being requeued over and over on the kernel
  thread named "kworker/3:1H". I assume this means that
  blk_mq_run_hw_queue() was called with its second argument (async)
  set to true. That in turn probably means that the following dm-rq
  code path was being hit:

	if (map_request(tio, rq, md) == DM_MAPIO_REQUEUE) {
		/* Undo dm_start_request() before requeuing */
		rq_end_stats(md, rq);
		rq_completed(md, rq_data_dir(rq), false);
		return BLK_MQ_RQ_QUEUE_BUSY;
	}

Bart.

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel