On Fri, Mar 30, 2018 at 01:26:09PM -0700, Song Liu wrote:
> On Tue, Mar 27, 2018 at 4:54 PM, Alexei Naberezhnov <anaberezhnov@xxxxxx> wrote:
> > This fixes the case when md array assembly fails because raid cache
> > recovery is unable to allocate a stripe, despite attempts to replay
> > stripes and increase the cache size. This happens because stripes
> > released by r5c_recovery_replay_stripes and raid5_set_cache_size don't
> > become available for allocation immediately. Released stripes are first
> > placed on the conf->released_stripes list and require the md thread to
> > merge them onto conf->inactive_list before they can be allocated.
> >
> > The patch allows the final allocation attempt during cache recovery to
> > wait for new stripes to become available for allocation.
> >
> > CC: linux-raid@xxxxxxxxxxxxxxx
> > CC: Shaohua Li <shli@xxxxxxxxxx>
> > Signed-off-by: Alexei Naberezhnov <anaberezhnov@xxxxxx>
> > ---
> >  drivers/md/raid5-cache.c | 33 ++++++++++++++++++++++-----------
> >  drivers/md/raid5.c       |  8 ++++++--
> >  2 files changed, 28 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
> > index 3c65f52..bb1ebbf 100644
> > --- a/drivers/md/raid5-cache.c
> > +++ b/drivers/md/raid5-cache.c
> > @@ -1937,12 +1937,14 @@ r5l_recovery_replay_one_stripe(struct r5conf *conf,
> >  }
> >
> >  static struct stripe_head *
> > -r5c_recovery_alloc_stripe(struct r5conf *conf,
> > -			  sector_t stripe_sect)
> > +r5c_recovery_alloc_stripe(
> > +		struct r5conf *conf,
> > +		sector_t stripe_sect,
> > +		int noblock)
> >  {
> >  	struct stripe_head *sh;
> >
> > -	sh = raid5_get_active_stripe(conf, stripe_sect, 0, 1, 0);
> > +	sh = raid5_get_active_stripe(conf, stripe_sect, 0, noblock, 0);
> >  	if (!sh)
> >  		return NULL;  /* no more stripe available */
> >
> > @@ -2152,7 +2154,7 @@ r5c_recovery_analyze_meta_block(struct r5l_log *log,
> >  						stripe_sect);
> >
> >  		if (!sh) {
> > -			sh = r5c_recovery_alloc_stripe(conf, stripe_sect);
> > +			sh = r5c_recovery_alloc_stripe(conf, stripe_sect, 1);
> >  			/*
> >  			 * cannot get stripe from raid5_get_active_stripe
> >  			 * try replay some stripes
> > @@ -2161,20 +2163,29 @@ r5c_recovery_analyze_meta_block(struct r5l_log *log,
> >  				r5c_recovery_replay_stripes(
> >  					cached_stripe_list, ctx);
> >  				sh = r5c_recovery_alloc_stripe(
> > -						conf, stripe_sect);
> > +						conf, stripe_sect, 1);
> >  			}
> >  			if (!sh) {
> > +				int new_size = conf->min_nr_stripes * 2;
> >  				pr_debug("md/raid:%s: Increasing stripe cache size to %d to recovery data on journal.\n",
> >  					mdname(mddev),
> > -					conf->min_nr_stripes * 2);
> > -				raid5_set_cache_size(mddev,
> > -						     conf->min_nr_stripes * 2);
> > -				sh = r5c_recovery_alloc_stripe(conf,
> > -							       stripe_sect);
> > +					new_size);
> > +				ret = raid5_set_cache_size(mddev, new_size);
> > +				if (conf->min_nr_stripes <= new_size / 2) {
> > +					pr_err("md/raid:%s: Cannot increase cache size, ret=%d, new_size=%d, min_nr_stripes=%d, max_nr_stripes=%d\n",
> > +						mdname(mddev),
> > +						ret,
> > +						new_size,
> > +						conf->min_nr_stripes,
> > +						conf->max_nr_stripes);
> > +					return -ENOMEM;
> > +				}
> > +				sh = r5c_recovery_alloc_stripe(
> > +						conf, stripe_sect, 0);
> >  			}
> >  			if (!sh) {
> >  				pr_err("md/raid:%s: Cannot get enough stripes due to memory pressure. Recovery failed.\n",
> > -				       mdname(mddev));
> > +					mdname(mddev));
> >  				return -ENOMEM;
> >  			}
> >  			list_add_tail(&sh->lru, cached_stripe_list);
> > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> > index b5d2601..76c6720 100644
> > --- a/drivers/md/raid5.c
> > +++ b/drivers/md/raid5.c
> > @@ -6347,6 +6347,7 @@ raid5_show_stripe_cache_size(struct mddev *mddev, char *page)
> >  int
> >  raid5_set_cache_size(struct mddev *mddev, int size)
> >  {
> > +	int result = 0;
> >  	struct r5conf *conf = mddev->private;
> >
> >  	if (size <= 16 || size > 32768)
> > @@ -6363,11 +6364,14 @@ raid5_set_cache_size(struct mddev *mddev, int size)
> >
> >  	mutex_lock(&conf->cache_size_mutex);
> >  	while (size > conf->max_nr_stripes)
> > -		if (!grow_one_stripe(conf, GFP_KERNEL))
> > +		if (!grow_one_stripe(conf, GFP_KERNEL)) {
> > +			conf->min_nr_stripes = conf->max_nr_stripes;
> > +			result = -ENOMEM;
> >  			break;
> > +		}
> >  	mutex_unlock(&conf->cache_size_mutex);
> >
> > -	return 0;
> > +	return result;
> >  }
> >  EXPORT_SYMBOL(raid5_set_cache_size);
> >
> > --
> > 2.9.5
> >
>
> This fix looks good to me, as a fix. However, we still need to wake up
> the md_thread during recovery. A better solution would be a new way to
> allocate stripes that doesn't require the md_thread. I think we can ship
> this fix first and introduce the new stripe allocation later.
>
> What do you think, Shaohua?

I'm thinking of a more generic method for this problem. Could we make
->max_free_space adaptive to the system memory size? It could be
min(RECLAIM_MAX_FREE_SPACE, 3/4 * total_memory_size), for example. This
way, we never make the stripe cache too big. This has a penalty for
small-memory systems, though.

Thanks,
Shaohua
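
As a rough illustration of the adaptive limit suggested above, here is a
minimal sketch. It assumes a 4.16-era kernel where totalram_pages is still a
plain variable, reuses the existing RECLAIM_MAX_FREE_SPACE constant from
raid5-cache.c, and the helper name r5l_adjust_max_free_space and its
device_free_space parameter are invented for the example; none of this is
part of the posted patch.

#include <linux/kernel.h>	/* min, min_t */
#include <linux/mm.h>		/* totalram_pages (variable in 4.16-era kernels) */
#include <linux/types.h>	/* sector_t */

/* Existing cap from raid5-cache.c: 10MiB expressed in 512-byte sectors. */
#define RECLAIM_MAX_FREE_SPACE (10 * 1024 * 1024 * 2)

/*
 * Hypothetical helper: clamp the journal's max_free_space to the smaller of
 * the fixed cap and 3/4 of total system memory, so recovery is never pushed
 * into growing the stripe cache beyond what the machine can hold.
 */
static sector_t r5l_adjust_max_free_space(sector_t device_free_space)
{
	/* total RAM converted from pages to 512-byte sectors */
	sector_t mem_sectors = (sector_t)totalram_pages * (PAGE_SIZE >> 9);
	sector_t mem_limit = mem_sectors / 4 * 3;	/* 3/4 of system memory */
	sector_t cap = min_t(sector_t, RECLAIM_MAX_FREE_SPACE, mem_limit);

	return min(device_free_space, cap);
}

A caller such as r5l_init_log() could then derive log->max_free_space from
this helper instead of clamping against RECLAIM_MAX_FREE_SPACE alone.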