On Sat, 4 Dec 2010 07:26:36 +0300 "Majed B." <majedb@xxxxxxxxx> wrote:

> You have a degraded array now with 1 disk down. If you proceed, more
> disks might pop out due to errors.
>
> It's best to backup your data, run a check on the array, fix it then
> try to resume the reshape.

Backups are always a good idea, but are sometimes impractical.

I don't think running a 'check' would help at all.  A 'reshape' will do much
the same sort of work, and more.

It isn't strictly true that the array is '1 disk down'.  Parts of it are 1
disk down, parts are 2 disks down.  As the reshape progresses more and more
will be 2 disks down.  We don't really want that.

This case isn't really handled well at present.  You want to do a 'recovery'
and a 'reshape' at the same time.  This is quite possible, but doesn't
currently happen when you restart a reshape in the middle (added to my todo
list).

I suggest you:
 - apply the patch below to mdadm.
 - assemble the array with --update=revert-reshape.  You should give it a
   --backup-file too.
 - let the reshape complete so you are back to 13 devices.
 - add a spare and let it recover.
 - then add another spare and reshape the array.

Of course you need to be running a new enough kernel to be able to decrease
the number of devices in a raid5.

NeilBrown

>
> On Sat, Dec 4, 2010 at 5:42 AM, Leslie Rhorer <lrhorer@xxxxxxxxxxx> wrote:
> >
> > Hello everyone.
> >
> > I was just growing one of my RAID6 arrays from 13 to 14
> > members.  The array growth had passed its critical stage and had been
> > growing for several minutes when the system came to a screeching halt.  I
> > hit the big red switch, and when the system rebooted, the array assembled,
> > but two members are missing.  One of the members is the new drive and the
> > other is the 13th drive in the RAID set.  Of course, the array can run well
> > enough with only 12 members, but it's definitely not the best situation,
> > especially since the re-shape will take another day and a half.
> > Is it best I go ahead and leave the array in its current state until the
> > re-shape is done, or should I go ahead and add back the two failed drives?
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
>
> --
> Majed B.

commit 12bab17f765a4130c7bd133a0bbb3b83f3f492b0
Author: NeilBrown <neilb@xxxxxxx>
Date:   Sat Dec 4 17:37:14 2010 +1100

    Support reverting of reshape.

    Allow --update=revert-reshape to do what you would expect.

    FIXME
    needs review.  Think about interface and use cases.
    Document.

diff --git a/Assemble.c b/Assemble.c
index afd4e60..c034e37 100644
--- a/Assemble.c
+++ b/Assemble.c
@@ -592,6 +592,12 @@ int Assemble(struct supertype *st, char *mddev,
 	/* Ok, no bad inconsistancy, we can try updating etc */
 	bitmap_done = 0;
 	content->update_private = NULL;
+	if (update && strcmp(update, "revert-reshape") == 0 &&
+	    (content->reshape_active == 0 || content->delta_disks <= 0)) {
+		fprintf(stderr, Name ": Cannot revert-reshape on this array\n");
+		close(mdfd);
+		return 1;
+	}
 	for (tmpdev = devlist; tmpdev; tmpdev=tmpdev->next) if (tmpdev->used == 1) {
 		char *devname = tmpdev->devname;
 		struct stat stb;
diff --git a/mdadm.c b/mdadm.c
index 08e8ea4..7cf51b5 100644
--- a/mdadm.c
+++ b/mdadm.c
@@ -662,6 +662,8 @@ int main(int argc, char *argv[])
 				continue;
 			if (strcmp(update, "devicesize")==0)
 				continue;
+			if (strcmp(update, "revert-reshape")==0)
+				continue;
 			if (strcmp(update, "byteorder")==0) {
 				if (ss) {
 					fprintf(stderr, Name ": must not set metadata type with --update=byteorder.\n");
@@ -688,7 +690,8 @@ int main(int argc, char *argv[])
 			}
 			fprintf(outf, "Valid --update options are:\n"
 		" 'sparc2.2', 'super-minor', 'uuid', 'name', 'resync',\n"
-		" 'summaries', 'homehost', 'byteorder', 'devicesize'.\n");
+		" 'summaries', 'homehost', 'byteorder', 'devicesize',\n"
+		" 'revert-reshape'.\n");
 			exit(outf == stdout ? 0 : 2);
 
 		case O(INCREMENTAL,NoDegraded):
diff --git a/super0.c b/super0.c
index ae3e885..01d5cfa 100644
--- a/super0.c
+++ b/super0.c
@@ -545,6 +545,19 @@ static int update_super0(struct supertype *st, struct mdinfo *info,
 	}
 	if (strcmp(update, "_reshape_progress")==0)
 		sb->reshape_position = info->reshape_progress;
+	if (strcmp(update, "revert-reshape") == 0 &&
+	    sb->minor_version > 90 && sb->delta_disks != 0) {
+		int tmp;
+		sb->raid_disks -= sb->delta_disks;
+		sb->delta_disks = - sb->delta_disks;
+		tmp = sb->new_layout;
+		sb->new_layout = sb->layout;
+		sb->layout = tmp;
+
+		tmp = sb->new_chunk;
+		sb->new_chunk = sb->chunk_size;
+		sb->chunk_size = tmp;
+	}
 
 	sb->sb_csum = calc_sb0_csum(sb);
 	return rv;
diff --git a/super1.c b/super1.c
index 0eb0323..805777e 100644
--- a/super1.c
+++ b/super1.c
@@ -781,6 +781,19 @@ static int update_super1(struct supertype *st, struct mdinfo *info,
 	}
 	if (strcmp(update, "_reshape_progress")==0)
 		sb->reshape_position = __cpu_to_le64(info->reshape_progress);
+	if (strcmp(update, "revert-reshape") == 0 && sb->delta_disks) {
+		__u32 temp;
+		sb->raid_disks = __cpu_to_le32(__le32_to_cpu(sb->raid_disks) + __le32_to_cpu(sb->delta_disks));
+		sb->delta_disks = __cpu_to_le32(-__le32_to_cpu(sb->delta_disks));
+		printf("REverted to %d\n", (int)__le32_to_cpu(sb->delta_disks));
+		temp = sb->new_layout;
+		sb->new_layout = sb->layout;
+		sb->layout = temp;
+
+		temp = sb->new_chunk;
+		sb->new_chunk = sb->chunksize;
+		sb->chunksize = temp;
+	}
 
 	sb->sb_csum = calc_sb_1_csum(sb);
 	return rv;
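
For anyone who wants to check the field arithmetic without touching a real array, here is a rough standalone sketch of what the super0.c hunk does to the superblock.  The struct reshape_fields and the function revert_reshape() are made up for illustration and are not part of mdadm; the -1 return for "no reshape recorded" is also my own addition.  It assumes that during a reshape raid_disks already holds the target device count and delta_disks the change, which is my understanding of the 0.90 format rather than something stated above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for the reshape-related fields of a
 * 0.90 superblock.  The real struct in mdadm has many more members. */
struct reshape_fields {
	int raid_disks;   /* assumed: target device count during a reshape */
	int delta_disks;  /* assumed: change in device count (+1 for 13 -> 14) */
	int layout;
	int new_layout;
	int chunk_size;
	int new_chunk;
};

/* Mirror of the arithmetic in the super0.c hunk: undo the device-count
 * change, negate the delta, and swap the old/new layout and chunk size.
 * Returns 0 on success, -1 if no device-count change is recorded. */
static int revert_reshape(struct reshape_fields *sb)
{
	int tmp;

	assert(sb != NULL);
	if (sb->delta_disks == 0)
		return -1;

	sb->raid_disks -= sb->delta_disks;
	sb->delta_disks = -sb->delta_disks;

	tmp = sb->new_layout;
	sb->new_layout = sb->layout;
	sb->layout = tmp;

	tmp = sb->new_chunk;
	sb->new_chunk = sb->chunk_size;
	sb->chunk_size = tmp;

	return 0;
}
```

Under those assumptions, a 13 -> 14 grow recorded as raid_disks = 14, delta_disks = 1 comes out as raid_disks = 13, delta_disks = -1, i.e. it now looks like a shrink back to 13 devices.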