I'll start by saying that no data is in jeopardy, but I would like to track down the cause of this problem and fix it. When this happened to me last time, I thought it must have been due to an incorrect backup-file size on a RAID array that had been shrunk to smaller than its final size, but that was not the case this time.

I initiated a shrink from a 4-drive RAID5 to a 3-drive RAID5. The shrink itself had no problems, except that a drive failed right at the end of the reshape. The reshape then hung at 99.9%, and I cannot remove the failed drive from the array because it is "rebuilding". I am not sure whether the drive failed at the very end or only after the reshape had already reached 99.9%, because it ran overnight and I didn't see this until the next morning.

Sam

root@fs:/var/log# uname -a
Linux fs 2.6.32-5-686 #1 SMP Mon Jan 16 16:04:25 UTC 2012 i686 GNU/Linux

Apr 17 22:37:41 fs kernel: [25860779.639762] md1: detected capacity change from 749122093056 to 499414728704
Apr 17 22:38:40 fs kernel: [25860837.912441] md: reshape of RAID array md1
Apr 17 22:38:40 fs kernel: [25860837.912447] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Apr 17 22:38:40 fs kernel: [25860837.912452] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
Apr 17 22:38:40 fs kernel: [25860837.912459] md: using 128k window, over a total of 243854848 blocks.
Apr 18 07:51:09 fs kernel: [25893987.273813] raid5: Disk failure on sda2, disabling device.
Apr 18 07:51:09 fs kernel: [25893987.273815] raid5: Operation continuing on 2 devices.
Apr 18 07:51:09 fs kernel: [25893987.287168] md: super_written gets error=-5, uptodate=0
Apr 18 07:51:10 fs kernel: [25893987.657039] md: md1: reshape done.
Apr 18 07:51:10 fs kernel: [25893987.781599] md: reshape of RAID array md1
Apr 18 07:51:10 fs kernel: [25893987.781607] md: minimum _guaranteed_ speed: 100 KB/sec/disk.
Apr 18 07:51:10 fs kernel: [25893987.781613] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
Apr 18 07:51:10 fs kernel: [25893987.781620] md: using 128k window, over a total of 243854848 blocks.

md1 : active raid5 sdd2[3] sda2[0](F) sdc2[2] sdb2[4]
      487709696 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [_UU]
      [===================>.]  reshape = 99.9% (243853824/243854848) finish=343.6min speed=0K/sec

root@fs:/# mdadm --remove /dev/md1 /dev/sda2
mdadm: hot remove failed for /dev/sda2: Device or resource busy
root@fs:/# mdadm --manage /dev/md1 --force --remove /dev/sda2
mdadm: hot remove failed for /dev/sda2: Device or resource busy

root@fs:/var/log# ls -l /boot/backup.md
-rw------- 1 root root 3146240 Apr 17 22:38 /boot/backup.md
root@fs:/var/log# hexdump /boot/backup.md
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0300200

root@fs:/# mdadm --detail /dev/md1
/dev/md1:
        Version : 1.2
  Creation Time : Fri Feb 10 21:45:46 2012
     Raid Level : raid5
     Array Size : 487709696 (465.12 GiB 499.41 GB)
  Used Dev Size : 243854848 (232.56 GiB 249.71 GB)
   Raid Devices : 3
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Thu Apr 18 21:37:48 2013
          State : clean, degraded, recovering
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

 Reshape Status : 99% complete
  Delta Devices : -1, (3->2)

           Name : fs:1 (local to host fs)
           UUID : 9d7e8a08:030af4f8:e653c46c:af2c84fe
         Events : 33773764

    Number   Major   Minor   RaidDevice State
       0       8        2        0      faulty spare rebuilding   /dev/sda2
       4       8       18        1      active sync   /dev/sdb2
       2       8       34        2      active sync   /dev/sdc2
       3       8       50        3      active sync   /dev/sdd2

/dev/sdd2:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : 9d7e8a08:030af4f8:e653c46c:af2c84fe
           Name : fs:1 (local to host fs)
  Creation Time : Fri Feb 10 21:45:46 2012
     Raid Level : raid5
   Raid Devices : 3

 Avail Dev Size : 487710720 (232.56 GiB 249.71 GB)
     Array Size : 975419392 (465.12 GiB 499.41 GB)
  Used Dev Size : 487709696 (232.56 GiB 249.71 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 13cefd7d:7bb42450:c229d326:a41b9ba7

  Reshape pos'n : 2048
  Delta Devices : -1 (4->3)

    Update Time : Fri Apr 19 04:22:40 2013
       Checksum : 2f033b35 - correct
         Events : 33786736

         Layout : left-symmetric
     Chunk Size : 512K

    Device Role : Active device 3
    Array State : .AAA ('A' == active, '.' == missing)
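
For what it's worth, the numbers in the output above can be cross-checked with a little arithmetic. The sketch below is only a sanity check, not a diagnosis. It assumes the "blocks" figures in the kernel log and mdstat output are 1 KiB units and that the hexdump offsets are hexadecimal (the hexdump default). The helper name raid5_capacity_bytes is made up for illustration.

```python
# Sanity-check the sizes reported in the log, mdstat and ls output.
# Assumption: "blocks" are 1 KiB units; hexdump offsets are hexadecimal.
KIB = 1024

per_device_blocks = 243854848      # "over a total of 243854848 blocks"
old_capacity = 749122093056        # bytes, from "detected capacity change"
new_capacity = 499414728704        # bytes, from "detected capacity change"


def raid5_capacity_bytes(n_devices, per_device_kib):
    """Usable RAID5 capacity: one device's worth of space holds parity."""
    return (n_devices - 1) * per_device_kib * KIB


cap4 = raid5_capacity_bytes(4, per_device_blocks)   # before the shrink
cap3 = raid5_capacity_bytes(3, per_device_blocks)   # after the shrink

# Reshape progress from the mdstat line: (done/total) in 1 KiB blocks.
done, total = 243853824, 243854848
remaining_kib = total - done
remaining_chunks = remaining_kib // 512              # 512K chunk size

# Backup file: size from ls versus the final offset printed by hexdump.
backup_size = 3146240
hexdump_end = int("0300200", 16)

print("4-drive capacity matches log:", cap4 == old_capacity)
print("3-drive capacity matches log:", cap3 == new_capacity)
print("reshape remaining: %d KiB (%d chunks)" % (remaining_kib, remaining_chunks))
print("backup file size matches hexdump:", backup_size == hexdump_end)
print("backup file = 3 MiB + %d bytes" % (backup_size - 3 * KIB * KIB))
```

So the capacity change in the log is consistent with going from 3 to 2 data drives' worth of space, the reshape stopped 1024 KiB short of the end, and the backup file is 3 MiB plus 512 bytes of zeros.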