On Mon, 22 Nov 2010 15:22:08 -0800 "Darrick J. Wong" <djwong@xxxxxxxxxx>
wrote:

> Before 2.6.37, the md layer had a mechanism for catching I/Os with the barrier
> flag set, and translating the barrier into barriers for all the underlying
> devices.  With 2.6.37, I/O barriers have become plain old flushes, and the md
> code was updated to reflect this.  However, one piece was left out -- the md
> layer does not tell the block layer that it supports flushes or FUA access at
> all, which results in md silently dropping flush requests.
>
> Since the support already seems there, just add this one piece of bookkeeping
> to restore the ability to flush writes through md.

I would rather just unconditionally call
   blk_queue_flush(mddev->queue, REQ_FLUSH | REQ_FUA);

I don't think there is much to be gained by trying to track exactly what the
underlying devices support, and as the devices can change, that is racy
anyway.

Thoughts?

NeilBrown

>
> Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> ---
>
>  drivers/md/md.c |   25 ++++++++++++++++++++++++-
>  1 files changed, 24 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 324a366..a52d7be 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -356,6 +356,21 @@ EXPORT_SYMBOL(mddev_congested);
>  /*
>   * Generic flush handling for md
>   */
> +static void evaluate_flush_capability(mddev_t *mddev)
> +{
> +	mdk_rdev_t *rdev;
> +	unsigned int flush = REQ_FLUSH | REQ_FUA;
> +
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(rdev, &mddev->disks, same_set) {
> +		if (rdev->raid_disk < 0)
> +			continue;
> +		flush &= rdev->bdev->bd_disk->queue->flush_flags;
> +	}
> +	rcu_read_unlock();
> +
> +	blk_queue_flush(mddev->queue, flush);
> +}
>
>  static void md_end_flush(struct bio *bio, int err)
>  {
> @@ -1885,6 +1900,8 @@ static int bind_rdev_to_array(mdk_rdev_t * rdev, mddev_t * mddev)
>  	/* May as well allow recovery to be retried once */
>  	mddev->recovery_disabled = 0;
>
> +	evaluate_flush_capability(mddev);
> +
>  	return 0;
>
>   fail:
> @@ -1903,17 +1920,23 @@ static void md_delayed_delete(struct work_struct *ws)
>  static void unbind_rdev_from_array(mdk_rdev_t * rdev)
>  {
>  	char b[BDEVNAME_SIZE];
> +	mddev_t *mddev;
> +
>  	if (!rdev->mddev) {
>  		MD_BUG();
>  		return;
>  	}
> -	bd_release_from_disk(rdev->bdev, rdev->mddev->gendisk);
> +	mddev = rdev->mddev;
> +	bd_release_from_disk(rdev->bdev, mddev->gendisk);
>  	list_del_rcu(&rdev->same_set);
>  	printk(KERN_INFO "md: unbind<%s>\n", bdevname(rdev->bdev,b));
>  	rdev->mddev = NULL;
>  	sysfs_remove_link(&rdev->kobj, "block");
>  	sysfs_put(rdev->sysfs_state);
>  	rdev->sysfs_state = NULL;
> +
> +	evaluate_flush_capability(mddev);
> +
>  	/* We need to delay this, otherwise we can deadlock when
>  	 * writing to 'remove' to "dev/state".  We also need
>  	 * to delay it due to rcu usage.
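
For illustration only, here is a small userspace C sketch of the two options
discussed above: intersecting the flush flags of the active members (what the
patch does) versus always advertising both capabilities (what the reply
suggests). It is not kernel code. The SKETCH_REQ_* values, struct
sketch_member, and both function names are hypothetical stand-ins, and it does
not model the locking or the race with devices changing.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's REQ_FLUSH / REQ_FUA bits. */
#define SKETCH_REQ_FLUSH (1u << 0)
#define SKETCH_REQ_FUA   (1u << 1)

/* Hypothetical model of one member device of the array. */
struct sketch_member {
	int raid_disk;            /* < 0: not an active member, skipped */
	unsigned int flush_flags; /* what this member's queue advertises */
};

/* Patch's approach: AND together the flags of all active members. */
static unsigned int flush_caps_intersect(const struct sketch_member *m,
					 size_t n)
{
	unsigned int flush = SKETCH_REQ_FLUSH | SKETCH_REQ_FUA;
	size_t i;

	for (i = 0; i < n; i++) {
		if (m[i].raid_disk < 0)
			continue;
		flush &= m[i].flush_flags;
	}
	return flush;
}

/* Reply's suggestion: always advertise both, whatever the members say. */
static unsigned int flush_caps_unconditional(void)
{
	return SKETCH_REQ_FLUSH | SKETCH_REQ_FUA;
}
```

With one member that advertises only flush, the intersecting version drops FUA
for the whole array, while the unconditional version keeps advertising both and
never needs recomputing when members are added or removed.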