Re: v4.8 dm-mpath

Mike Snitzer <snitzer@xxxxxxxxxx> · Wed, 17 Aug 2016 21:54:34 -0400

On Wed, Aug 17 2016 at  8:29pm -0400,
Bart Van Assche <bart.vanassche@xxxxxxxxxxx> wrote:

> On 08/16/2016 07:43 PM, Mike Snitzer wrote:
> >On Tue, Aug 16 2016 at  3:12pm -0400, Mike Snitzer <snitzer@xxxxxxxxxx> wrote:
> >>On Tue, Aug 16 2016 at  1:32pm -0400, Bart Van Assche <bart.vanassche@xxxxxxxxxxx> wrote:
> >>>Can you have a look at this?
> >>
> >>I'll get linux-dm.git's 'dm-4.8' branch (v4.8-rc2) loaded on one of my
> >>testbed systems to run the mptest testsuite.  It should provide coverage
> >>for simple failover and failback.
> >
> >I successfully ran mptest against that v4.8-rc2 based kernel.  Any
> >chance you could try mptest to see if it also passes for you?  Or if it
> >is able to trigger the issue for you.
> >
> >mptest's runtest script defaults to setting up scsi-mq and dm-mq.
> 
> Hello Mike,
> 
> I will run mptest as soon as I have the time.

OK, thanks.

> Earlier today I came up with three patches that avoid the hang in
> truncate_inode_pages_range() that I had reported before. It would be
> appreciated if you could have a look at these patches. Please keep
> in mind that I'm not a dm expert.

I don't recall where you mentioned a hang in
truncate_inode_pages_range().. ah I see it here:
https://patchwork.kernel.org/patch/9202331/

Comments inlined below.

> From 0e8e9f7d7489f5e3b0bf9e0c59257277d08a2ec0 Mon Sep 17 00:00:00 2001
> From: Bart Van Assche <bart.vanassche@xxxxxxxxxxx>
> Date: Wed, 17 Aug 2016 14:18:36 -0700
> Subject: [PATCH 1/3] block: Avoid that requests get stuck when stopping and
>  restarting a queue
> 
> If a driver calls blk_start_queue() after a queue has been marked
> "dead", avoid that requests get stuck by invoking __blk_run_queue()
> directly if QUEUE_FLAG_DEAD has been set.
> 
> While testing the dm-mpath driver I noticed that sporadically a
> readahead request was generated by the filesystem on top of the dm
> device but that that I/O request was never processed. This happened
> only while the dm core was suspending and resuming I/O.s
> 
> The following backtrace (triggered by a WARN_ON_ONCE(true) statement
> that I had inserted just before the __blk_run_queue(q) call) shows
> that this code path is relevant:
> 
> WARNING: CPU: 7 PID: 4707 at drivers/md/dm.c:2035 map_request+0x1d5/0x340 [dm_mod]
> Aug 17 15:47:35 ion-dev-ib-ini kernel: CPU: 7 PID: 4707 Comm: kdmwork-254:0 Tainted: G        W       4.7.0-dbg+ #2
> Call Trace:
>  [<ffffffff81322ee7>] dump_stack+0x68/0xa1
>  [<ffffffff81061e06>] __warn+0xc6/0xe0
>  [<ffffffff81061ed8>] warn_slowpath_null+0x18/0x20
>  [<ffffffffa04282d5>] map_request+0x1d5/0x340 [dm_mod]
>  [<ffffffffa042845d>] map_tio_request+0x1d/0x40 [dm_mod]
>  [<ffffffff81085fe1>] kthread_worker_fn+0xd1/0x1b0
>  [<ffffffff81085e9a>] kthread+0xea/0x100
>  [<ffffffff8162857f>] ret_from_fork+0x1f/0x40
> 
> References: commit 704605711ef0 ("block: Avoid scheduling delayed work on a dead queue")
> Signed-off-by: Bart Van Assche <bart.vanassche@xxxxxxxxxxx>
> Cc: Christoph Hellwig <hch@xxxxxx>
> Cc: Mike Snitzer <snitzer@xxxxxxxxxx>
> Cc: stable <stable@xxxxxxxxxxxxxxx>
> ---
>  block/blk-core.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 0d9638e..5f75709 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -355,6 +355,8 @@ void blk_run_queue_async(struct request_queue *q)
>  {
>  	if (likely(!blk_queue_stopped(q) && !blk_queue_dead(q)))
>  		mod_delayed_work(kblockd_workqueue, &q->delay_work, 0);
> +	else
> +		__blk_run_queue(q);
>  }
>  EXPORT_SYMBOL(blk_run_queue_async);

Your else clause would catch the case when the queue is stopped but
not dead... __blk_run_queue() will handle this cleanly but I think the
logic could be: else if (blk_queue_dead(q))

Also, strikes me as odd to have an _async interface resort to a
synchronous call.  Why is that needed?  Are you sure it is safe for all
callers?

> From 9b103cfb2d9b4cdec73c7d09188b4d42b041d49e Mon Sep 17 00:00:00 2001
> From: Bart Van Assche <bart.vanassche@xxxxxxxxxxx>
> Date: Wed, 17 Aug 2016 16:26:02 -0700
> Subject: [PATCH 2/3] dm: Change return type of __dm_internal_suspend() to int
> 
> ---
>  drivers/md/dm.c | 15 ++++++++++-----
>  1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 3689b7f..6eccdf3 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -3379,16 +3379,18 @@ out:
>   * It may be used only from the kernel.
>   */
>  
> -static void __dm_internal_suspend(struct mapped_device *md, unsigned suspend_flags)
> +static int __dm_internal_suspend(struct mapped_device *md,
> +				 unsigned suspend_flags)
>  {
>  	struct dm_table *map = NULL;
> +	int ret = 0;
>  
>  	if (md->internal_suspend_count++)
> -		return; /* nested internal suspend */
> +		goto out; /* nested internal suspend */
>  
>  	if (dm_suspended_md(md)) {
>  		set_bit(DMF_SUSPENDED_INTERNALLY, &md->flags);
> -		return; /* nest suspend */
> +		goto out; /* nest suspend */
>  	}
>  
>  	map = rcu_dereference_protected(md->map, lockdep_is_held(&md->suspend_lock));
> @@ -3399,10 +3401,13 @@ static void __dm_internal_suspend(struct mapped_device *md, unsigned suspend_fla
>  	 * would require changing .presuspend to return an error -- avoid this
>  	 * until there is a need for more elaborate variants of internal suspend.
>  	 */
> -	(void) __dm_suspend(md, map, suspend_flags, TASK_UNINTERRUPTIBLE,
> -			    DMF_SUSPENDED_INTERNALLY);
> +	ret = __dm_suspend(md, map, suspend_flags, TASK_UNINTERRUPTIBLE,
> +			   DMF_SUSPENDED_INTERNALLY);
>  
>  	dm_table_postsuspend_targets(map);
> +
> +out:
> +	return ret;
>  }
>  
>  static void __dm_internal_resume(struct mapped_device *md)

So you're only wanting to have your new __dm_internal_suspend() caller,
__dm_destroy, issue a WARN_ON if __dm_suspend() fails.

Is this _really_ important or did you just do this for debugging
purposes?

> From f6e58e71eec584b3f66fe8367340ca896d29408e Mon Sep 17 00:00:00 2001
> From: Bart Van Assche <bart.vanassche@xxxxxxxxxxx>
> Date: Wed, 17 Aug 2016 15:26:07 -0700
> Subject: [PATCH 3/3] dm: Avoid that requests get stuck while destroying a dm
>  device
> 
> ---
>  drivers/md/dm.c | 25 ++++++++++++++-----------
>  1 file changed, 14 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 6eccdf3..ae4bfa0 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -2958,10 +2958,13 @@ const char *dm_device_name(struct mapped_device *md)
>  }
>  EXPORT_SYMBOL_GPL(dm_device_name);
>  
> +static int __dm_internal_suspend(struct mapped_device *md,
> +				 unsigned suspend_flags);
> +
>  static void __dm_destroy(struct mapped_device *md, bool wait)
>  {
> -	struct dm_table *map;
> -	int srcu_idx;
> +	struct request_queue *q = dm_get_md_queue(md);
> +	int res;
>  
>  	might_sleep();
>  
> @@ -2970,21 +2973,21 @@ static void __dm_destroy(struct mapped_device *md, bool wait)
>  	set_bit(DMF_FREEING, &md->flags);
>  	spin_unlock(&_minor_lock);
>  
> -	if (dm_request_based(md) && md->kworker_task)
> -		flush_kthread_worker(&md->kworker);

Header needs detail.  E.g. __dm_suspend() via, __dm_internal_suspend(),
will call flush_kthread_worker().  But that is a nit in the grand scheme
of things.  Bigger concerns below.

> +	/*
> +	 * Avoid that dm_make_request() gets called after
> +	 * __dm_internal_suspend() finished.
> +	 */
> +	spin_lock_irq(q->queue_lock);
> +	queue_flag_set(QUEUE_FLAG_DYING, q);
> +	spin_unlock_irq(q->queue_lock);

Feels like the case for marking the queue as DYING can be made
independent of changes to the mechanism for flushing IO here.

>  
>  	/*
>  	 * Take suspend_lock so that presuspend and postsuspend methods
>  	 * do not race with internal suspend.
>  	 */
>  	mutex_lock(&md->suspend_lock);
> -	map = dm_get_live_table(md, &srcu_idx);
> -	if (!dm_suspended_md(md)) {
> -		dm_table_presuspend_targets(map);
> -		dm_table_postsuspend_targets(map);
> -	}
> -	/* dm_put_live_table must be before msleep, otherwise deadlock is possible */
> -	dm_put_live_table(md, srcu_idx);
> +	res = __dm_internal_suspend(md, DM_SUSPEND_NOFLUSH_FLAG);
> +	WARN_ONCE(res, "__dm_internal_suspend() returned %d\n", res);
>  	mutex_unlock(&md->suspend_lock);
>  
>  	/*

Please be explicit about why switching to using __dm_internal_suspend()
is important.  Are you just going for code reuse or is there a more
important reason?

I can infer you probably want the call to dm_stop_queue()...

But I need way more convincing on why __dm_internal_suspend() is needed.
As you can see from the header for commit ffcc3936416, the internal
suspend code is relatively complex.  Increasing its use, especially in a
method such as __dm_destroy, needs a good reason.

But that aside, what raises serious concerns for me is that you're using
DM_SUSPEND_NOFLUSH_FLAG where we absolutely _need_ to flush IO (given
we're in __dm_destroy).

At a minimum DM_SUSPEND_NOFLUSH_FLAG must not be used.

Mike

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel