Re: Reshape Shrink Hung Again

On Fri, 19 Apr 2013 08:29:37 +0000 Sam Bingner <sam@xxxxxxxxxxx> wrote:

> I'll start by saying that no data is in jeopardy, but I would like to track down the cause of this problem and fix it.  When it happened to me last time, I thought it must have been due to the incorrect backup-file size with a RAID array shrunk to smaller than the final size, but this time that was not the case.
> 
> I initiated a shrink from a 4-drive RAID5 to a 3-drive RAID5.  The shrink had no problems except that a drive failed right at the end of the reshape.  The reshape then hung at 99.9%, and md does not allow me to remove the failed drive from the array because it is "rebuilding".  I am not sure whether the drive failed at the very end or after the reshape had reached 99.9%, because it ran overnight and I didn't see this until the next morning.
> 
> Sam
> 
> root@fs:/var/log# uname -a
> Linux fs 2.6.32-5-686 #1 SMP Mon Jan 16 16:04:25 UTC 2012 i686 GNU/Linux
> 
> Apr 17 22:37:41 fs kernel: [25860779.639762] md1: detected capacity change from 749122093056 to 499414728704
> Apr 17 22:38:40 fs kernel: [25860837.912441] md: reshape of RAID array md1
> Apr 17 22:38:40 fs kernel: [25860837.912447] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> Apr 17 22:38:40 fs kernel: [25860837.912452] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
> Apr 17 22:38:40 fs kernel: [25860837.912459] md: using 128k window, over a total of 243854848 blocks.
> Apr 18 07:51:09 fs kernel: [25893987.273813] raid5: Disk failure on sda2, disabling device.
> Apr 18 07:51:09 fs kernel: [25893987.273815] raid5: Operation continuing on 2 devices.
> Apr 18 07:51:09 fs kernel: [25893987.287168] md: super_written gets error=-5, uptodate=0
> Apr 18 07:51:10 fs kernel: [25893987.657039] md: md1: reshape done.
> Apr 18 07:51:10 fs kernel: [25893987.781599] md: reshape of RAID array md1
> Apr 18 07:51:10 fs kernel: [25893987.781607] md: minimum _guaranteed_  speed: 100 KB/sec/disk.
> Apr 18 07:51:10 fs kernel: [25893987.781613] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
> Apr 18 07:51:10 fs kernel: [25893987.781620] md: using 128k window, over a total of 243854848 blocks.
> 
> 
> md1 : active raid5 sdd2[3] sda2[0](F) sdc2[2] sdb2[4]
>       487709696 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [_UU]
>       [===================>.]  reshape = 99.9% (243853824/243854848) finish=343.6min speed=0K/sec
> 

Looks like a bug - probably in mdadm.
mdadm needs to help the reshape over the last little bit, and md is probably
waiting for it to do that.  This will be the only time in the whole process
when the backup file is used.
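The numbers in the report can be cross-checked with a little arithmetic.  The sketch below assumes that the kernel log and /proc/mdstat count in 1 KiB blocks (the capacity check in the code is consistent with that) and that the usable size of a RAID5 array is (devices - 1) times the per-device size.  It shows that the 4-to-3 shrink matches the logged capacity change, and that the reshape stopped with 1024 KiB, i.e. two 512 KiB chunks, left to process per device.  The helper names are illustrative only.

```python
import re

# Values copied from the report (kernel log and /proc/mdstat).
CHUNK_KIB = 512                # "512k chunk"
PER_DEVICE_KIB = 243854848     # "over a total of 243854848 blocks" (assumed 1 KiB blocks)
OLD_BYTES = 749122093056       # capacity before the shrink
NEW_BYTES = 499414728704       # capacity after the shrink

MDSTAT_LINE = ("[===================>.]  reshape = 99.9% "
               "(243853824/243854848) finish=343.6min speed=0K/sec")


def array_bytes(n_devices: int, per_device_kib: int) -> int:
    """Usable RAID5 capacity: one device's worth of space holds parity."""
    return (n_devices - 1) * per_device_kib * 1024


def reshape_remaining_kib(line: str) -> int:
    """Return how many 1 KiB blocks the reshape still has to process."""
    m = re.search(r"reshape\s*=\s*[\d.]+%\s*\((\d+)/(\d+)\)", line)
    if m is None:
        raise ValueError("no reshape progress found in line")
    done, total = int(m.group(1)), int(m.group(2))
    return total - done


remaining = reshape_remaining_kib(MDSTAT_LINE)
print(array_bytes(4, PER_DEVICE_KIB) == OLD_BYTES)  # 4 drives -> 3 data devices
print(array_bytes(3, PER_DEVICE_KIB) == NEW_BYTES)  # 3 drives -> 2 data devices
print(remaining, remaining // CHUNK_KIB)            # KiB left, chunks left
```

Running it prints `True`, `True` and `1024 2`.  The same per-device size also explains the 487709696-block array size shown in /proc/mdstat (2 x 243854848).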

I would try stopping the array and re-assembling it.  That might require a
reboot.  If that doesn't fix it, let me know and I'll prioritise this.
Otherwise - I've put it on my to-do list.  I'll try to reproduce and fix it
in due course.
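A minimal sketch of the suggested stop-and-reassemble step, wrapped in Python so the commands can be reviewed before anything is run.  It assumes the backup file passed to `mdadm --grow` for the shrink still exists, since `mdadm --assemble` needs `--backup-file` again to resume an interrupted reshape; the backup path is hypothetical, and the member list is taken from the surviving devices in /proc/mdstat.  If `--stop` fails because the array is busy (for example a mounted filesystem), it must be released first, or the machine rebooted as suggested above.

```python
import shlex
import subprocess

# Hypothetical values: adjust to the real array, members and backup file.
ARRAY = "/dev/md1"
MEMBERS = ["/dev/sdb2", "/dev/sdc2", "/dev/sdd2"]  # surviving members per /proc/mdstat
BACKUP_FILE = "/root/md1-shrink.backup"             # hypothetical path


def restart_commands(array, members, backup_file):
    """Build the mdadm invocations that stop and re-assemble the array."""
    return [
        ["mdadm", "--stop", array],
        ["mdadm", "--assemble", array, "--backup-file=" + backup_file, *members],
    ]


def run(commands, dry_run=True):
    """Print each command; only execute it when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failure


run(restart_commands(ARRAY, MEMBERS, BACKUP_FILE))  # dry run: prints only
```

Defaulting to a dry run keeps the sketch harmless; pass `dry_run=False` only after checking the printed commands against the actual system.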

Thanks for the report,
NeilBrown
