On Wed, May 10, 2017 at 09:45:05PM +0000, Song Liu wrote: > > > On May 10, 2017, at 10:01 AM, Shaohua Li <shli@xxxxxxxxxx> wrote: > > > > On Mon, May 08, 2017 at 05:39:25PM -0700, Song Liu wrote: > >> For the raid456 with writeback cache, when journal device failed during > >> normal operation, it is still possible to persist all data, as all > >> pending data is still in stripe cache. However, it is necessary to handle > >> journal failure gracefully. > >> > >> During journal failures, this patch makes the follow changes to land data > >> in cache to raid disks gracefully: > >> > >> 1. In handle_stripe(), allow stripes with data in journal (s.injournal > 0) > >> to make progress; > >> 2. In delay_towrite(), only process data in the cache (skip dev->towrite); > >> 3. In __get_priority_stripe(), set try_loprio to true, so no stripe stuck > >> in loprio_list > > > > Applied the first patch. For this patch, I don't have a clear picture about > > what you are trying to do. Please describe the steps we are doing to do after > > journal failure. > > I will add more description to the next version. > > > > >> Signed-off-by: Song Liu <songliubraving@xxxxxx> > >> --- > >> drivers/md/raid5-cache.c | 13 ++++++++++--- > >> drivers/md/raid5-log.h | 3 ++- > >> drivers/md/raid5.c | 29 +++++++++++++++++++++++------ > >> 3 files changed, 35 insertions(+), 10 deletions(-) > >> > >> diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c > >> index dc1dba6..e6032f6 100644 > >> --- a/drivers/md/raid5-cache.c > >> +++ b/drivers/md/raid5-cache.c > >> @@ -24,6 +24,7 @@ > >> #include "md.h" > >> #include "raid5.h" > >> #include "bitmap.h" > >> +#include "raid5-log.h" > >> > >> /* > >> * metadata/data stored in disk with 4k size unit (a block) regardless > >> @@ -679,6 +680,7 @@ static void r5c_disable_writeback_async(struct work_struct *work) > >> return; > >> pr_info("md/raid:%s: Disabling writeback cache for degraded array.\n", > >> mdname(mddev)); > >> + md_update_sb(mddev, 1); > > > > Why this? And md_update_sb must be called within mddev->reconfig_mutex locked. > > This is to avoid skipping in handle_stripe(): > > if (s.handle_bad_blocks || > test_bit(MD_SB_CHANGE_PENDING, &conf->mddev->sb_flags)) { > set_bit(STRIPE_HANDLE, &sh->state); > goto finish; > } > > I haven't got a better idea than calling md_update_sb() somewhere. It is also tricky > to lock mddev->reconfigured_mutex here, due to potential deadlocking with > mddev->open_mutex. > > Do you have suggestions on this? This sounds not necessary then. The mddev->thread will call md_update_sb and clear the MD_SB_CHANGE_PENDING bit. After that, the stripes will be handled. This is running in a workqueue, I'm wondering what kind of deadlock issue there is. > > >> mddev_suspend(mddev); > >> log->r5c_journal_mode = R5C_JOURNAL_MODE_WRITE_THROUGH; > >> mddev_resume(mddev); > >> @@ -1557,6 +1559,8 @@ void r5l_wake_reclaim(struct r5l_log *log, sector_t space) > >> void r5l_quiesce(struct r5l_log *log, int state) > >> { > >> struct mddev *mddev; > >> + struct r5conf *conf; > >> + > >> if (!log || state == 2) > >> return; > >> if (state == 0) > >> @@ -1564,10 +1568,12 @@ void r5l_quiesce(struct r5l_log *log, int state) > >> else if (state == 1) { > >> /* make sure r5l_write_super_and_discard_space exits */ > >> mddev = log->rdev->mddev; > >> + conf = mddev->private; > >> wake_up(&mddev->sb_wait); > >> kthread_park(log->reclaim_thread->tsk); > >> r5l_wake_reclaim(log, MaxSector); > >> - r5l_do_reclaim(log); > >> + if (!r5l_log_disk_error(conf)) > >> + r5l_do_reclaim(log); > > > > I think r5c_disable_writeback_async() will call into this, so we flush all > > stripe cache out to raid disks, why skip the reclaim? > > > > r5l_do_reclaim() reclaims log space with 2 steps: > 1. clear all io_unit lists (flushing_ios, etc.) by waking up mddev->thread. > 2. update log_tail in the journal device, and issue discard to journal device. > > When we are handling log failures in r5c_disable_writeback_async(), we are > flushing the cache, so it is not necessary to wake up mddev->thread. Also, > with log device error, it is not necessary update log_tail or issue discard. > > Therefore, r5l_do_reclaim is not necessary in log disk errors. Ok, there is no side effect, right? let's call it anyway. Thanks, Shaohua -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html