Re: [PATCH] md/r5cache: flush data in memory during journal device failure

Shaohua Li <shli@xxxxxxxxxx> · Wed, 15 Mar 2017 15:48:31 -0700

On Tue, Mar 14, 2017 at 10:40:14PM +0000, Song Liu wrote:
> 
> > On Mar 14, 2017, at 10:50 AM, Shaohua Li <shli@xxxxxxxxxx> wrote:
> > 
> > On Mon, Mar 13, 2017 at 04:36:26PM -0700, Song Liu wrote:
> >> For raid456 with a writeback cache, when the journal device fails during
> >> normal operation, it is still possible to persist all data, because all
> >> pending data is still in the stripe cache. However, the stripe will be
> >> marked as failed via s.log_failed, so the write-out from the stripe cache
> >> cannot make progress.
> >> 
> >> To unblock the write-out on journal failure, this patch allows stripes
> >> with data in the journal (injournal > 0) to make progress.
> > 
> > What about the parity part? If the log has failed, we should skip journaling the parity.
> > 
> > Thanks,
> > Shaohua
> > 
> 
> For stripes with data in the journal (not flushed yet), the state machine
> can flush them out. The behavior is the same as when there is no journal
> at all.

Can you explain this in more detail? I didn't find any place where we check the
failure bit and skip journaling the parity. Please also include the explanation
in the changelog.

> On the other hand, other writes will be gated by the log_failed flag,
> so the array appears read-only to upper layers.
> 
> Thanks,
> Song
> 
> >> The array should be read-only after a journal failure. Therefore, pending
> >> writes (in dev->towrite) are excluded from this write-out (in delay_towrite).
> >> 
> >> Signed-off-by: Song Liu <songliubraving@xxxxxx>
> >> ---
> >> drivers/md/raid5.c | 10 +++++++++-
> >> 1 file changed, 9 insertions(+), 1 deletion(-)
> >> 
> >> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> >> index 3233975..447d9dd 100644
> >> --- a/drivers/md/raid5.c
> >> +++ b/drivers/md/raid5.c
> >> @@ -3069,6 +3069,10 @@ sector_t raid5_compute_blocknr(struct stripe_head *sh, int i, int previous)
> >>  *      When LOG_CRITICAL, stripes with injournal == 0 will be sent to
> >>  *      no_space_stripes list.
> >>  *
> >> + *   3. during journal failure
> >> + *      In journal failure, we try to flush all cached data to raid disks
> >> + *      based on data in stripe cache. The array is read-only to upper
> >> + *      layers, so we would skip all pending writes.
> >>  */
> >> static inline bool delay_towrite(struct r5conf *conf,
> >> 				 struct r5dev *dev,
> >> @@ -3082,6 +3086,9 @@ static inline bool delay_towrite(struct r5conf *conf,
> >> 	if (test_bit(R5C_LOG_CRITICAL, &conf->cache_state) &&
> >> 	    s->injournal > 0)
> >> 		return true;
> >> +	/* case 3 above */
> >> +	if (s->log_failed && s->injournal)
> >> +		return true;
> >> 	return false;
> >> }
> >> 
> >> @@ -4721,7 +4728,8 @@ static void handle_stripe(struct stripe_head *sh)
> >> 	/* check if the array has lost more than max_degraded devices and,
> >> 	 * if so, some requests might need to be failed.
> >> 	 */
> >> -	if (s.failed > conf->max_degraded || s.log_failed) {
> >> +	if (s.failed > conf->max_degraded ||
> >> +	    (s.log_failed && s.injournal == 0)) {
> >> 		sh->check_state = 0;
> >> 		sh->reconstruct_state = 0;
> >> 		break_stripe_batch_list(sh, 0);
> >> -- 
> >> 2.9.3
> >> 
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> >> the body of a message to majordomo@xxxxxxxxxxxxxxx
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 