Re: Failed, but "md: cannot remove active disk..."

On Mon, 14 May 2012 12:53:00 +0200 Michał Sawicz <michal@xxxxxxxxxx> wrote:

> On Mon, 2012-05-14 at 20:22 +1000, NeilBrown wrote:
> > On Sun, 13 May 2012 20:21:48 +0200 Michał Sawicz <michal@xxxxxxxxxx> wrote:
> > 
> > > Hey,
> > > 
> > > I've a weird issue with a RAID6 setup, /proc/mdstat says:
> > > 
> > > > md126 : active raid6 sda1[3] sdh1[6] sdg1[0](F) sdf1[5] sdi1[1] sdc[8] sdb[7]
> > > >       9767559680 blocks super 1.2 level 6, 512k chunk, algorithm 2 [7/6] [_UUUUUU]
> > > 
> > > So sdg1 is (F)ailed, yet `mdadm --remove` yields:
> > > 
> > > > md: cannot remove active disk sdg1 from md126 ...
> > 
> > There is a period of time between when a device fails and when the raid456
> > module finally lets go of it so it can be removed.  You seem to be in this
> > period of time.
> > Normally it is very short.  It needs to wait for any requests that have
> > already been sent to the device to complete (probably with failure) and
> > very shortly after that it should be released.  So this is normally much less
> > than one second but could be several seconds if some excessive retry is
> > happening.
> > 
> > But I'm guessing you have waited more than a few seconds.
> 
> Yup :)
> 
> > I vaguely recall a bug in the not too distant past whereby RAID456 wouldn't
> > let go of a device quite as soon as it should.  Unfortunately I don't
> > remember the details.  You might be able to trigger it to release the drive
> > by adding a spare - if you have one - or maybe by just
> >   echo sync > /sys/block/md126/md/sync_action
> > it won't actually do a sync, but it might check things enough to make
> > progress.
> 
> # echo sync > /sys/block/md126/md/sync_action
> -bash: echo: write error: Device or resource busy

Hmmm....

Looks like MD_RECOVERY_NEEDED is already set.
But remove_and_add_spares() isn't removing the failed device
from the array.

I cannot find anything since 2.6.38 that looks like your symptoms.

Is the array still functioning?
Are there any interesting messages appearing in the kernel logs?

What does
  grep . /sys/block/md126/md/dev*/*
show?
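
If the full grep output is too noisy, a narrower variant might look like the
sketch below. It assumes the usual md sysfs layout, where each member device
has a dev-<name> directory containing "state" and "slot" attributes. The
function name md_member_states is made up for illustration and is not part of
mdadm or the kernel.

```shell
#!/bin/sh
# Hypothetical helper: print the state and slot of each member of an md array.
# Argument: the array's md sysfs directory, e.g. /sys/block/md126/md
md_member_states() {
    md_dir=$1
    for dev in "$md_dir"/dev-*; do
        # Skip the unexpanded glob when no member directories exist.
        [ -d "$dev" ] || continue
        # "state" lists flags such as in_sync or faulty;
        # "slot" is the role number in the array, or "none".
        printf '%s state=%s slot=%s\n' \
            "${dev##*/}" \
            "$(cat "$dev/state" 2>/dev/null)" \
            "$(cat "$dev/slot" 2>/dev/null)"
    done
}
```

It would be called as "md_member_states /sys/block/md126/md". A member that
reports faulty but still holds a numbered slot would be consistent with the
device not having been released yet.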

NeilBrown


> 
> eh?
> 
> > What kernel are you using?
> 
> # uname -a
> Linux media 2.6.38-gentoo-r6 #2 SMP Tue Sep 13 19:13:42 CEST 2011 x86_64
> AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ AuthenticAMD GNU/Linux
> 
> Thanks,


