Re: sdc1 does not have a valid v0.90 superblock, not importing!

Jon Hardcastle <jd_hardcastle@xxxxxxxxx> · Wed, 11 Aug 2010 05:29:59 -0700 (PDT)

--- On Wed, 11/8/10, Neil Brown <neilb@xxxxxxx> wrote:

> From: Neil Brown <neilb@xxxxxxx>
> Subject: Re:  sdc1 does not have a valid v0.90 superblock, not importing!
> To: Jon@xxxxxxxxxxxxxxx
> Cc: jd_hardcastle@xxxxxxxxx, linux-raid@xxxxxxxxxxxxxxx
> Date: Wednesday, 11 August, 2010, 12:34
> On Wed, 11 Aug 2010 04:19:07 -0700 (PDT)
> Jon Hardcastle <jd_hardcastle@xxxxxxxxx> wrote:
> 
> > 
> > --- On Wed, 11/8/10, Neil Brown <neilb@xxxxxxx> wrote:
> > 
> > > From: Neil Brown <neilb@xxxxxxx>
> > > Subject: Re:  sdc1 does not have a valid v0.90 superblock, not importing!
> > > To: Jon@xxxxxxxxxxxxxxx
> > > Cc: jd_hardcastle@xxxxxxxxx, linux-raid@xxxxxxxxxxxxxxx
> > > Date: Wednesday, 11 August, 2010, 12:06
> > > On Wed, 11 Aug 2010 02:55:44 -0700 (PDT)
> > > Jon Hardcastle <jd_hardcastle@xxxxxxxxx> wrote:
> > > 
> > > > (my first attempt appears to have been bounced as the
> > > > spam checker thought it had HTML in it?!)
> > > 
> > > odd... came through ok for me the first time.
> > > 
> > > > 
> > > > Help!
> > > > 
> > > > Long story short - I was watching a movie off my RAID6
> > > > array. Got a smart error warning
> > > > 
> > > > Aug 10 22:00:07 mangalore kernel: raid5: cannot start dirty degraded array for md4
> > > 
> > > This is the current problem.  The array is dirty and degraded so there could
> > > theoretically be undetectable corruption.  Chance is quite low but it is
> > > there so md won't start without you acknowledging the risk by giving the
> > > --force flag to mdadm --assemble.
> > > Only do that if you are confident that your hardware is working correctly.
> > 
> > Well I am reasonably sure the controller came adrift the first time.. when I reseated it I stopped getting 100's of errors.. and it has survived 1.5 badblocks checks. It is being held in place by one of those bars you press down (does all the expansion cards in 1 go) except I don't think it is very good. I will screw it down.
> > 
> > > 
> > > > It appears sdc has an invalid superblock?
> > > > 
> > > > This is the 'examine' from sdc1 (note the checksum)
> > > > 
> > > > /dev/sdc1:
> > > .....
> > > >       Checksum : b335b4e3 - expected b735b4e3
> > > 
> > > Single bit error.  That isn't good as it means some bit of memory or some bit
> > > on some bus somewhere cannot be trusted.
> > > It could be a transient thing and will never happen again.  Or maybe not.
> > > Given the smart errors and the fact that you have had problems with the drive
> > > before it seems very likely that the problem is in that drive.  I suggest
> > > unplugging it and leaving it unplugged.  Some memory buffer in the drive is
> > > probably marginal.  I don't think they use ECC memory.
> > 
> > Could this be a result of me forcing a power off when the drive was causing problems?
> 
> Probably not.  Forcing a power off may well have left the array 'dirty' so
> that it wouldn't assemble, but is fairly unlikely to corrupt data within a
> block.
> 
> > 
> > What are the dangers of removing it, zeroing the superblock and re-adding? Is it MORE dangerous than leaving a RAID 6 degraded for a few days?
> 
> In general, I would say the chance of a known-bad drive causing problems is
> greater than the chance of fewer known-good drives causing problems.
> But then you seem to think it isn't the drive, it was the controller and that
> is fixed...
> 
> This is really about your level of trust in the hardware.
> If you trust sdc as much as the others, include it in the
> array.
> If you don't, then don't.
> 
> NeilBrown
> 
> 
> 
> > 
> > > 
> > > > 
> > > > Anyways... I am ASSUMING mdadm has not assembled the
> > > > array to be on the safe side? i have not done anything.. no
> > > > force... no assume clean.. I wanted to be sure?
> > > 
> > > You assume correctly.
> > > 
> > > > 
> > > > Should i remove sdc1 from the array? It should then
> > > > assemble? I have 2 spare drives that I am getting around to
> > > > using to replace this drive and the other 500GB.. so should
> > > > I remove sdc1... and try and re-add or just put the new
> > > > drive in?
> > > > 
> > > > atm I have 'stop'ped the array and got badblocks
> > > > running....
> > > > 
> > > 
> > > Remove sdc and assemble the array with --force, and get a new device to
> > > replace /dev/sdc as soon as possible.
> > 
> > Thanks Neil - I panic'd as previously it has mounted the array in a degraded state... but previously the drive has disappeared completely... whereas in this case it is present... but wrong!
> > 
> > > 
> > > NeilBrown

Hmmm ok. It isn't worth the risk. I can thrash the drive after I have replaced it.

OK, so now I want to mark the drive as 'removed', but that is proving problematic because the array is not active:

# mdadm /dev/md4 --fail /dev/sdc1
mdadm: cannot get array info for /dev/md4

# mdadm --detail /dev/md4
mdadm: md device /dev/md4 does not appear to be active.

# mdadm --assemble /dev/md4
mdadm: failed to add /dev/sdc1 to /dev/md4: Invalid argument
mdadm: /dev/md4 assembled from 6 drives - not enough to start the array while not clean - consider --force.

I really wanted to fail it before trying to assemble the rest. Is that possible?
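
For reference, the "single bit error" diagnosis quoted above can be checked by XOR-ing the two checksum values from the 'examine' output for sdc1. This is only a minimal sketch in Python; the variable names are illustrative and are not anything mdadm itself uses.

```python
# Checksum values copied from the quoted 'examine' output for /dev/sdc1.
reported = 0xB335B4E3   # value shown in the "Checksum :" field
expected = 0xB735B4E3   # value shown after "expected"

# XOR leaves a 1 in every bit position where the two values differ.
diff = reported ^ expected
differing_bits = bin(diff).count("1")

print(f"xor            = 0x{diff:08x}")
print(f"differing bits = {differing_bits}")
if differing_bits == 1:
    # bit_length() - 1 is the index of the highest (here the only) set bit.
    print(f"flipped bit    = {diff.bit_length() - 1}")
```

The two values differ in exactly one bit position, which is what the diagnosis of a single flipped bit refers to.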
