Re: Fw: Why does one get mismatches?

Jon Hardcastle <jd_hardcastle@xxxxxxxxx> · Sun, 24 Jan 2010 09:40:42 -0800 (PST)

--- On Fri, 22/1/10, Goswin von Brederlow <goswin-v-b@xxxxxx> wrote:

> From: Goswin von Brederlow <goswin-v-b@xxxxxx>
> Subject: Re: Fw: Why does one get mismatches?
> To: Jon@xxxxxxxxxxxxxxx
> Cc: linux-raid@xxxxxxxxxxxxxxx
> Date: Friday, 22 January, 2010, 18:13
> Jon Hardcastle <jd_hardcastle@xxxxxxxxx> writes:
> 
> > --- On Tue, 19/1/10, Jon Hardcastle <jd_hardcastle@xxxxxxxxx> wrote:
> >
> >> From: Jon Hardcastle <jd_hardcastle@xxxxxxxxx>
> >> Subject: Why does one get mismatches?
> >> To: linux-raid@xxxxxxxxxxxxxxx
> >> Date: Tuesday, 19 January, 2010, 10:04
> >> Hi,
> >> 
> >> I kicked off a check/repair cycle on my machine after I moved the
> >> physical ordering of my drives around. I am now on my second
> >> check/repair cycle and it keeps finding mismatches.
> >> 
> >> Is it correct that the mismatch value after a repair was needed
> >> should equal the value present after a check? What if it doesn't?
> >> What does it mean if another check STILL reveals mismatches?
> >> 
> >> I had something similar after I reshaped from RAID 5 to RAID 6: I had
> >> to run check/repair/check/repair several times before I got my 0.
> >> 
> >> 
> >
> > Guys,
> >
> > Anyone got any suggestions here? I am now on my ~5th check/repair, and
> > after a reboot the first check is still returning 8.
> >
> > All I have done is move the drives around. It is the same
> > controllers/cables/etc.
> >
> > I really don't like the seemingly random nature of what can/does/has
> > caused the mismatches.
> 
> There is some unknown corruption going on with raid1 that causes
> mismatches, but it is believed that it will never occur on any used
> block. Swapping is a likely cause.
> 
> Any swap device on the raid? Try turning that off.
> If that doesn't help, try unmounting filesystems or remounting RO.
> 
> MfG
>         Goswin

Hello, my usual savior Goswin!

The deal is that it is a 7-drive RAID 6 array. It has LVM on it and is not used for swap. I have unmounted all the LVs and still got mismatches. I ran smartctl --test=long on all drives: nothing. I have now dismantled the array and am three quarters of the way through 'badblocks -svn' on each of the component drives. I have a hunch that it may be a dodgy SATA cable, but I have no evidence. No errors in the logs, nothing in dmesg.

Is there any way to get more information? I am starting to think this has been happening more since I changed from RAID 5 to RAID 6, which I did less than a month ago.

The only lead I have is that while running badblocks, one drive ran at ~10-15MB/s whereas the rest are going at ~30MB/s. I have another drive of the identical model coming up, so I will see if that one is slow too. But the lack of logging info is unhelpful, and the prospect of silent corruption is a big worry!
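
For anyone following along, the figure being compared after each check is what md exposes in sysfs as mismatch_cnt. Below is a minimal sketch, assuming the usual /sys/block/<array>/md/mismatch_cnt layout, of how one might log that value after each pass so runs can be compared. The helper name read_mismatch_counts and its sys_block parameter are made up for illustration and are not part of mdadm or any other tool.

```python
from pathlib import Path


def read_mismatch_counts(sys_block="/sys/block"):
    """Return {md_device_name: mismatch_cnt} for every md array found.

    Hypothetical helper: it only reads the sysfs counter, it does not
    start a check or repair itself.
    """
    counts = {}
    for counter in sorted(Path(sys_block).glob("md*/md/mismatch_cnt")):
        # counter looks like <sys_block>/<device>/md/mismatch_cnt
        device = counter.parent.parent.name
        counts[device] = int(counter.read_text().strip())
    return counts


if __name__ == "__main__":
    for device, count in read_mismatch_counts().items():
        print(f"{device}: mismatch_cnt={count}")
```

Running this after each check or repair finishes and keeping the output would at least show whether the count is stable between passes or moving around. It does not say which stripes or which drive are involved.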
