> On May 10, 2017, at 10:01 AM, Shaohua Li <shli@xxxxxxxxxx> wrote:
>
> On Mon, May 08, 2017 at 05:39:25PM -0700, Song Liu wrote:
>> For the raid456 with writeback cache, when journal device failed during
>> normal operation, it is still possible to persist all data, as all
>> pending data is still in stripe cache. However, it is necessary to handle
>> journal failure gracefully.
>>
>> During journal failures, this patch makes the follow changes to land data
>> in cache to raid disks gracefully:
>>
>> 1. In handle_stripe(), allow stripes with data in journal (s.injournal > 0)
>>    to make progress;
>> 2. In delay_towrite(), only process data in the cache (skip dev->towrite);
>> 3. In __get_priority_stripe(), set try_loprio to true, so no stripe stuck
>>    in loprio_list
>
> Applied the first patch. For this patch, I don't have a clear picture about
> what you are trying to do. Please describe the steps we are doing to do after
> journal failure.

I will add more description to the next version.

>
>> Signed-off-by: Song Liu <songliubraving@xxxxxx>
>> ---
>>  drivers/md/raid5-cache.c | 13 ++++++++++---
>>  drivers/md/raid5-log.h   |  3 ++-
>>  drivers/md/raid5.c       | 29 +++++++++++++++++++++++------
>>  3 files changed, 35 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
>> index dc1dba6..e6032f6 100644
>> --- a/drivers/md/raid5-cache.c
>> +++ b/drivers/md/raid5-cache.c
>> @@ -24,6 +24,7 @@
>>  #include "md.h"
>>  #include "raid5.h"
>>  #include "bitmap.h"
>> +#include "raid5-log.h"
>>
>>  /*
>>   * metadata/data stored in disk with 4k size unit (a block) regardless
>> @@ -679,6 +680,7 @@ static void r5c_disable_writeback_async(struct work_struct *work)
>>  		return;
>>  	pr_info("md/raid:%s: Disabling writeback cache for degraded array.\n",
>>  		mdname(mddev));
>> +	md_update_sb(mddev, 1);
>
> Why this? And md_update_sb must be called within mddev->reconfig_mutex locked.

This is to keep handle_stripe() from bailing out early while
MD_SB_CHANGE_PENDING is set:

	if (s.handle_bad_blocks ||
	    test_bit(MD_SB_CHANGE_PENDING, &conf->mddev->sb_flags)) {
		set_bit(STRIPE_HANDLE, &sh->state);
		goto finish;
	}

I haven't got a better idea than calling md_update_sb() somewhere. It is also
tricky to lock mddev->reconfig_mutex here, due to potential deadlocking with
mddev->open_mutex. Do you have suggestions on this?

>>  	mddev_suspend(mddev);
>>  	log->r5c_journal_mode = R5C_JOURNAL_MODE_WRITE_THROUGH;
>>  	mddev_resume(mddev);
>> @@ -1557,6 +1559,8 @@ void r5l_wake_reclaim(struct r5l_log *log, sector_t space)
>>  void r5l_quiesce(struct r5l_log *log, int state)
>>  {
>>  	struct mddev *mddev;
>> +	struct r5conf *conf;
>> +
>>  	if (!log || state == 2)
>>  		return;
>>  	if (state == 0)
>> @@ -1564,10 +1568,12 @@ void r5l_quiesce(struct r5l_log *log, int state)
>>  	else if (state == 1) {
>>  		/* make sure r5l_write_super_and_discard_space exits */
>>  		mddev = log->rdev->mddev;
>> +		conf = mddev->private;
>>  		wake_up(&mddev->sb_wait);
>>  		kthread_park(log->reclaim_thread->tsk);
>>  		r5l_wake_reclaim(log, MaxSector);
>> -		r5l_do_reclaim(log);
>> +		if (!r5l_log_disk_error(conf))
>> +			r5l_do_reclaim(log);
>
> I think r5c_disable_writeback_async() will call into this, so we flush all
> stripe cache out to raid disks, why skip the reclaim?
>

r5l_do_reclaim() reclaims log space in two steps:

1. clear all io_unit lists (flushing_ios, etc.) by waking up mddev->thread.
2. update log_tail in the journal device, and issue discard to journal device.

When we are handling log failures in r5c_disable_writeback_async(), we are
already flushing the cache, so it is not necessary to wake up mddev->thread.
Also, with a log device error, it is not necessary to update log_tail or issue
a discard, since both only touch the failed journal device. Therefore,
r5l_do_reclaim() is not needed when the log disk has an error.

Thanks,
Song