Re: [PATCH] md: Call blk_queue_flush() to establish flush/fua support

On Mon, 22 Nov 2010 15:22:08 -0800
"Darrick J. Wong" <djwong@xxxxxxxxxx> wrote:

> Before 2.6.37, the md layer had a mechanism for catching I/Os with the barrier
> flag set, and translating the barrier into barriers for all the underlying
> devices.  With 2.6.37, I/O barriers have become plain old flushes, and the md
> code was updated to reflect this.  However, one piece was left out -- the md
> layer does not tell the block layer that it supports flushes or FUA access at
> all, which results in md silently dropping flush requests.
> 
> Since the support already seems to be there, just add this one piece of
> bookkeeping to restore the ability to flush writes through md.

I would rather just unconditionally call
   blk_queue_flush(mddev->queue, REQ_FLUSH | REQ_FUA);

I don't think there is much to be gained by trying to track exactly what the
underlying devices support, and as the devices can change, that is racy
anyway.
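
For concreteness, the unconditional variant could look something like
this (an untested sketch; the exact call site, e.g. md_alloc() or
do_md_run(), is my assumption, not part of the posted patch):

```c
/* Sketch only: advertise flush/FUA support once, when the array's
 * request queue is set up, instead of recomputing it as member
 * devices come and go.  Lower-level queues that do not advertise
 * flush/FUA should have those bits filtered out by the block layer,
 * so over-advertising here is harmless. */
blk_queue_flush(mddev->queue, REQ_FLUSH | REQ_FUA);
```

That would make evaluate_flush_capability() and both of its call
sites unnecessary.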

Thoughts?

NeilBrown




> 
> Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> ---
> 
>  drivers/md/md.c |   25 ++++++++++++++++++++++++-
>  1 files changed, 24 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 324a366..a52d7be 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -356,6 +356,21 @@ EXPORT_SYMBOL(mddev_congested);
>  /*
>   * Generic flush handling for md
>   */
> +static void evaluate_flush_capability(mddev_t *mddev)
> +{
> +	mdk_rdev_t *rdev;
> +	unsigned int flush = REQ_FLUSH | REQ_FUA;
> +
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(rdev, &mddev->disks, same_set) {
> +		if (rdev->raid_disk < 0)
> +			continue;
> +		flush &= rdev->bdev->bd_disk->queue->flush_flags;
> +	}
> +	rcu_read_unlock();
> +
> +	blk_queue_flush(mddev->queue, flush);
> +}
>  
>  static void md_end_flush(struct bio *bio, int err)
>  {
> @@ -1885,6 +1900,8 @@ static int bind_rdev_to_array(mdk_rdev_t * rdev, mddev_t * mddev)
>  	/* May as well allow recovery to be retried once */
>  	mddev->recovery_disabled = 0;
>  
> +	evaluate_flush_capability(mddev);
> +
>  	return 0;
>  
>   fail:
> @@ -1903,17 +1920,23 @@ static void md_delayed_delete(struct work_struct *ws)
>  static void unbind_rdev_from_array(mdk_rdev_t * rdev)
>  {
>  	char b[BDEVNAME_SIZE];
> +	mddev_t *mddev;
> +
>  	if (!rdev->mddev) {
>  		MD_BUG();
>  		return;
>  	}
> -	bd_release_from_disk(rdev->bdev, rdev->mddev->gendisk);
> +	mddev = rdev->mddev;
> +	bd_release_from_disk(rdev->bdev, mddev->gendisk);
>  	list_del_rcu(&rdev->same_set);
>  	printk(KERN_INFO "md: unbind<%s>\n", bdevname(rdev->bdev,b));
>  	rdev->mddev = NULL;
>  	sysfs_remove_link(&rdev->kobj, "block");
>  	sysfs_put(rdev->sysfs_state);
>  	rdev->sysfs_state = NULL;
> +
> +	evaluate_flush_capability(mddev);
> +
>  	/* We need to delay this, otherwise we can deadlock when
>  	 * writing to 'remove' to "dev/state".  We also need
>  	 * to delay it due to rcu usage.


