--- On Wed, 11/8/10, Neil Brown <neilb@xxxxxxx> wrote:

> From: Neil Brown <neilb@xxxxxxx>
> Subject: Re: sdc1 does not have a valid v0.90 superblock, not importing!
> To: Jon@xxxxxxxxxxxxxxx
> Cc: jd_hardcastle@xxxxxxxxx, linux-raid@xxxxxxxxxxxxxxx
> Date: Wednesday, 11 August, 2010, 12:34
> On Wed, 11 Aug 2010 04:19:07 -0700 (PDT)
> Jon Hardcastle <jd_hardcastle@xxxxxxxxx> wrote:
>
> > --- On Wed, 11/8/10, Neil Brown <neilb@xxxxxxx> wrote:
> >
> > > From: Neil Brown <neilb@xxxxxxx>
> > > Subject: Re: sdc1 does not have a valid v0.90 superblock, not importing!
> > > To: Jon@xxxxxxxxxxxxxxx
> > > Cc: jd_hardcastle@xxxxxxxxx, linux-raid@xxxxxxxxxxxxxxx
> > > Date: Wednesday, 11 August, 2010, 12:06
> > > On Wed, 11 Aug 2010 02:55:44 -0700 (PDT)
> > > Jon Hardcastle <jd_hardcastle@xxxxxxxxx> wrote:
> > >
> > > > (my first attempt appears to have been bounced as the
> > > > spam checker thought it had HTML in it?!)
> > >
> > > odd... came through ok for me the first time.
> > >
> > > > Help!
> > > >
> > > > Long story short - I was watching a movie off my RAID6
> > > > array. Got a smart error warning
> > >
> > > > Aug 10 22:00:07 mangalore kernel: raid5: cannot start
> > > > dirty degraded array for md4
> > >
> > > This is the current problem. The array is dirty and degraded so there could
> > > theoretically be undetectable corruption. Chance is quite low but it is
> > > there so md won't start without you acknowledging the risk by giving the
> > > --force flag to mdadm --assemble.
> > > Only do that if you are confident that your hardware is
> > > working correctly.
> >
> > Well I am reasonably sure the controller came adrift the first time.. when I
> > reseated it I stopped getting 100's of errors.. and it has survived 1.5
> > badblocks checks. It is being held in place by one of those bars you press
> > down (does all the expansion cards in 1 go) except I don't think it is very
> > good. I will screw it down.
> >
> > >
> > > > It appears sdc has an invalid superblock?
> > > >
> > > > This is the 'examine' from sdc1 (note the checksum)
> > > >
> > > > /dev/sdc1:
> > > .....
> > > > Checksum : b335b4e3 - expected b735b4e3
> > >
> > > Single bit error. That isn't good as it means some bit of memory or some bit
> > > on some bus somewhere cannot be trusted.
> > > It could be a transient thing and will never happen again. Or maybe not.
> > > Given the smart errors and the fact that you have had problems with the drive
> > > before it seems very likely that the problem is in that drive. I suggest
> > > unplugging it and leaving it unplugged. Some memory buffer in the drive is
> > > probably marginal. I don't think they use ECC memory.
> >
> > Could this be a result of me forcing a power off when the drive was causing
> > problems?
>
> Probably not. Forcing a power off may well have left the array 'dirty' so
> that it wouldn't assemble, but is fairly unlikely to corrupt data within a
> block.
>
> > What are the dangers to removing it, zeroing the superblock and readding?
> > is it MORE dangerous than leaving a raid 6 degraded for a few days?
>
> In general, I would say the chance of a known-bad drive causing problems is
> greater than the chance of fewer known-good drives causing problems.
> But then you seem to think it isn't the drive, it was the controller and that
> is fixed...
>
> This is really about your level of trust in the hardware.
> If you trust sdc as much as the others, include it in the array.
> If you don't, then don't.
>
> NeilBrown
>
> > > > Anyways... I am ASSUMING mdadm has not assembled the
> > > > array to be on the safe side? I have not done anything.. no
> > > > force... no assume clean.. I wanted to be sure?
> > >
> > > You assume correctly.
> > >
> > > > Should I remove sdc1 from the array? It should then assemble?
> > > > I have 2 spare drives that I am getting around to
> > > > using to replace this drive and the other 500GB.. so should
> > > > I remove sdc1... and try and re-add or just put the new
> > > > drive in?
> > > >
> > > > atm I have 'stop'ped the array and got badblocks
> > > > running....
> > >
> > > Remove sdc and assemble the array with --force, and get a new device to
> > > replace /dev/sdc as soon as possible.
> >
> > Thanks Neil - I panic'd as previously it has mounted the array in a degraded
> > state... but previously the drive has disappeared completely... whereas in
> > this case it is present... but wrong!
> >
> > >
> > > NeilBrown

Hmmm ok. It isn't worth the risk. I can thrash the drive after I have replaced it.

OK so now I want to mark the drive as 'removed' but it is proving problematic as the array is not active?

# mdadm /dev/md4 --fail /dev/sdc1
mdadm: cannot get array info for /dev/md4

# mdadm --detail /dev/md4
mdadm: md device /dev/md4 does not appear to be active.

# mdadm --assemble /dev/md4
mdadm: failed to add /dev/sdc1 to /dev/md4: Invalid argument
mdadm: /dev/md4 assembled from 6 drives - not enough to start the array while not clean - consider --force.

I really wanted to fail it before trying to assemble the rest?
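
For reference, the "single bit error" reading of the checksum line quoted earlier in this thread can be checked by XOR-ing the stored and expected values from the --examine output. A minimal Python sketch, not anything mdadm itself runs; the helper name differing_bits is only illustrative, and the two hex values are copied from the quoted output:

```python
# Values copied from the quoted `mdadm --examine` output for /dev/sdc1.
stored = 0xb335b4e3
expected = 0xb735b4e3


def differing_bits(a: int, b: int) -> list[int]:
    """Return the bit positions (0 = least significant) where a and b differ."""
    diff = a ^ b  # a set bit in the XOR marks a position where the inputs disagree
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


bits = differing_bits(stored, expected)
print(f"xor = {stored ^ expected:#010x}, differing bit positions: {bits}")
```

Exactly one position is reported, which fits a single flipped bit somewhere between the platter and the checksum comparison. It does not say which component flipped it.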