Re: [PATCH 3/3] raid5-cache: IO error handling

Neil Brown <neilb@xxxxxxx> · Thu, 01 Oct 2015 14:50:20 +1000

Shaohua Li <shli@xxxxxx> writes:

> There are 3 places where raid5-cache dispatches IO. Errors on discard IO
> don't matter, so we ignore them. Superblock write IO errors can be
> handled in the MD core. The remaining places are the log write and the
> flush. When an IO error happens there, we simply fail all raid disks and
> continue the stripe state machine. The MD/raid5 core can handle it (for
> example, by marking all disks faulty, reporting bio errors and so on).
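The policy above maps each kind of raid5-cache IO to an error action. A minimal userspace sketch of that mapping follows; `enum log_io_kind`, `enum log_err_action` and `log_io_error_action()` are hypothetical names for illustration, not raid5-cache functions.

```c
#include <assert.h>

/* Hypothetical classification of the IO that raid5-cache dispatches. */
enum log_io_kind {
	LOG_IO_DISCARD,
	LOG_IO_SUPERBLOCK,
	LOG_IO_WRITE,
	LOG_IO_FLUSH,
};

/* Hypothetical outcome of an IO error, following the policy described above. */
enum log_err_action {
	ACT_IGNORE,         /* discard failures do not matter */
	ACT_DEFER_TO_MD,    /* superblock write errors are left to the MD core */
	ACT_FAIL_ALL_DISKS, /* log write/flush errors fail every raid disk */
};

static enum log_err_action log_io_error_action(enum log_io_kind kind)
{
	switch (kind) {
	case LOG_IO_DISCARD:
		return ACT_IGNORE;
	case LOG_IO_SUPERBLOCK:
		return ACT_DEFER_TO_MD;
	case LOG_IO_WRITE:
	case LOG_IO_FLUSH:
		return ACT_FAIL_ALL_DISKS;
	}
	/* Unreachable for valid input; treat anything unknown as fatal. */
	assert(!"unknown log IO kind");
	return ACT_FAIL_ALL_DISKS;
}
```

For example, `log_io_error_action(LOG_IO_FLUSH)` yields `ACT_FAIL_ALL_DISKS`, which is the branch the review below questions.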
>
> Signed-off-by: Shaohua Li <shli@xxxxxx>
> ---
>  drivers/md/raid5-cache.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
> index afc3b6b..430ce5c 100644
> --- a/drivers/md/raid5-cache.c
> +++ b/drivers/md/raid5-cache.c
> @@ -223,7 +223,16 @@ static void __r5l_set_io_unit_state(struct r5l_io_unit *io,
>  	io->state = state;
>  }
>  
> -/* XXX: totally ignores I/O errors */
> +static void r5l_log_io_error(struct r5l_log *log)
> +{
> +	struct md_rdev *rdev;
> +
> +	rcu_read_lock();
> +	rdev_for_each_rcu(rdev, log->rdev->mddev)
> +		md_error(log->rdev->mddev, rdev);
> +	rcu_read_unlock();
> +}

This fails spare devices too, which seems a bit heavy-handed.

If the journal device fails, we should still be able to read from the
array, just not write to it.

So can we just enhance the
	if (s.failed > conf->max_degraded) {
test in handle_stripe(), and probably improve has_failed() too?
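One way to picture this suggestion, as a minimal userspace sketch: the per-stripe failure test also consults a journal-failed flag, so stripes that only need reads carry on while stripes that need writes take the failure path. `struct model_conf`, `struct model_stripe`, `journal_failed` and `model_stripe_failed()` are hypothetical stand-ins, not the real r5conf or stripe_head_state; has_failed() would need an analogous change.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for the array configuration. */
struct model_conf {
	int max_degraded;    /* 1 for RAID5, 2 for RAID6 */
	bool journal_failed; /* hypothetical flag: a log write or flush failed */
};

/* Hypothetical, heavily simplified stand-in for per-stripe state. */
struct model_stripe {
	int failed;       /* number of failed member devices for this stripe */
	bool needs_write; /* stripe has dirty data that must be written */
};

/*
 * Broadened version of "s.failed > conf->max_degraded": a stripe is
 * treated as failed if too many members are gone, or if it needs a
 * write while the journal is unusable. Read-only stripes still proceed.
 */
static bool model_stripe_failed(const struct model_conf *conf,
				const struct model_stripe *s)
{
	assert(conf && s);
	if (s->failed > conf->max_degraded)
		return true;
	return conf->journal_failed && s->needs_write;
}
```

With `journal_failed` set, a read-only stripe is not failed but a stripe with dirty data is; no member device, spare or active, has to be marked faulty.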

Thanks,
NeilBrown

> +
>  static void r5l_log_endio(struct bio *bio)
>  {
>  	struct r5l_io_unit *io = bio->bi_private;
> @@ -232,6 +241,9 @@ static void r5l_log_endio(struct bio *bio)
>  
>  	bio_put(bio);
>  
> +	if (bio->bi_error)
> +		r5l_log_io_error(log);
> +
>  	if (!atomic_dec_and_test(&io->pending_io))
>  		return;
>  
> @@ -594,6 +606,9 @@ static void r5l_log_flush_endio(struct bio *bio)
>  	struct r5l_io_unit *io;
>  	struct stripe_head *sh;
>  
> +	if (bio->bi_error)
> +		r5l_log_io_error(log);
> +
>  	spin_lock_irqsave(&log->io_list_lock, flags);
>  	list_for_each_entry(io, &log->flushing_ios, log_sibling) {
>  		while (!list_empty(&io->stripe_list)) {
> @@ -681,6 +696,7 @@ static void r5l_write_super_and_discard_space(struct r5l_log *log,
>  			   !test_bit(MD_CHANGE_PENDING, &mddev->flags));
>  	}
>  
> +	/* discard IO error really doesn't matter, ignore it */
>  	if (log->last_checkpoint < end) {
>  		blkdev_issue_discard(bdev,
>  				log->last_checkpoint + log->rdev->data_offset,
> -- 
> 2.4.6