Re: [PATCH 4/4] md/r5cache: shift complex rmw from read path to write path

Shaohua Li <shli@xxxxxxxxxx> · Sat, 21 Jan 2017 10:06:41 -0800

On Thu, Jan 12, 2017 at 05:22:43PM -0800, Song Liu wrote:
> Write back cache requires a complex RMW mechanism, where old data is
> read into dev->orig_page for prexor, and then xor is done with
> dev->page. This logic is already implemented in the write path.
> 
> However, current read path is not awared of this requirement. When
> the array is optimal, the RMW is not required, as the data are
> read from raid disks. However, when the target stripe is degraded,
> complex RMW is required to generate right data.
> 
> To keep read path as clean as possible, we handle read path by
> flushing degraded, in-journal stripes before processing reads to
> missing dev.
> 
> Specifically, when there is read requests to a degraded stripe
> with data in journal, handle_stripe_fill() calls
> r5c_make_stripe_write_out() and exits. Then handle_stripe_dirtying()
> will do the complex RMW and flush the stripe to RAID disks. After
> that, read requests are handled.
> 
> There is one more corner case when there is non-overwrite bio for
> the missing (or out of sync) dev. handle_stripe_dirtying() will not
> be able to process the non-overwrite bios without constructing the
> data in handle_stripe_fill(). This is fixed by delaying non-overwrite
> bios in handle_stripe_dirtying(). So handle_stripe_fill() works on
> these bios after the stripe is flushed to raid disks.

This patch looks good and I think it should be applied to 4.10. Some minor
issues.

> Signed-off-by: Song Liu <songliubraving@xxxxxx>
> ---
>  drivers/md/raid5.c | 48 ++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 44 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index d07a319..193acd3 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -2938,6 +2938,30 @@ sector_t raid5_compute_blocknr(struct stripe_head *sh, int i, int previous)
>  	return r_sector;
>  }
>  
> +/*
> + * There are cases where we want handle_stripe_dirtying() and
> + * schedule_reconstruction() to delay towrite to some dev of a stripe.
> + *
> + * This function checks whether we want to delay the towrite. Specifically,
> + * we delay the towrite when:
> + *   1. degraded stripe has a non-overwrite to the missing dev, AND this
> + *      stripe has data in journal (for other devices).
> + *
> + *      In this case, when reading data for the non-overwrite dev, it is
> + *      necessary to handle complex rmw of write back cache (prexor with
> + *      orig_page, and xor with page). To keep read path simple, we would
> + *      like to flush data in journal to RAID disks first, so complex rmw
> + *      is handled in the write patch (handle_stripe_dirtying).
> + *
> + *   2. to be added

what does this mean?

> + */
> +static inline bool delay_towrite(struct r5dev *dev,
> +				   struct stripe_head_state *s)
> +{
> +	return dev->towrite && !test_bit(R5_OVERWRITE, &dev->flags) &&
> +		!test_bit(R5_Insync, &dev->flags) && s->injournal;
> +}

this is always called with dev->towrite true, so please remove it.

Thanks,
Shaohua
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html