On Thu, Jan 12, 2017 at 05:22:43PM -0800, Song Liu wrote: > Write back cache requires a complex RMW mechanism, where old data is > read into dev->orig_page for prexor, and then xor is done with > dev->page. This logic is already implemented in the write path. > > However, current read path is not awared of this requirement. When > the array is optimal, the RMW is not required, as the data are > read from raid disks. However, when the target stripe is degraded, > complex RMW is required to generate right data. > > To keep read path as clean as possible, we handle read path by > flushing degraded, in-journal stripes before processing reads to > missing dev. > > Specifically, when there is read requests to a degraded stripe > with data in journal, handle_stripe_fill() calls > r5c_make_stripe_write_out() and exits. Then handle_stripe_dirtying() > will do the complex RMW and flush the stripe to RAID disks. After > that, read requests are handled. > > There is one more corner case when there is non-overwrite bio for > the missing (or out of sync) dev. handle_stripe_dirtying() will not > be able to process the non-overwrite bios without constructing the > data in handle_stripe_fill(). This is fixed by delaying non-overwrite > bios in handle_stripe_dirtying(). So handle_stripe_fill() works on > these bios after the stripe is flushed to raid disks. This patch looks good and I think it should be applied to 4.10. Some minor issues. > Signed-off-by: Song Liu <songliubraving@xxxxxx> > --- > drivers/md/raid5.c | 48 ++++++++++++++++++++++++++++++++++++++++++++---- > 1 file changed, 44 insertions(+), 4 deletions(-) > > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c > index d07a319..193acd3 100644 > --- a/drivers/md/raid5.c > +++ b/drivers/md/raid5.c > @@ -2938,6 +2938,30 @@ sector_t raid5_compute_blocknr(struct stripe_head *sh, int i, int previous) > return r_sector; > } > > +/* > + * There are cases where we want handle_stripe_dirtying() and > + * schedule_reconstruction() to delay towrite to some dev of a stripe. > + * > + * This function checks whether we want to delay the towrite. Specifically, > + * we delay the towrite when: > + * 1. degraded stripe has a non-overwrite to the missing dev, AND this > + * stripe has data in journal (for other devices). > + * > + * In this case, when reading data for the non-overwrite dev, it is > + * necessary to handle complex rmw of write back cache (prexor with > + * orig_page, and xor with page). To keep read path simple, we would > + * like to flush data in journal to RAID disks first, so complex rmw > + * is handled in the write patch (handle_stripe_dirtying). > + * > + * 2. to be added what does this mean? > + */ > +static inline bool delay_towrite(struct r5dev *dev, > + struct stripe_head_state *s) > +{ > + return dev->towrite && !test_bit(R5_OVERWRITE, &dev->flags) && > + !test_bit(R5_Insync, &dev->flags) && s->injournal; > +} this is always called with dev->towrite true, so please remove it. Thanks, Shaohua -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html