Re: raid5:md3: read error corrected , followed by , Machine Check

Mr. James W. Laferriere wrote:

One more thought below...
On Mon, 23 Jul 2007, Bill Davidsen wrote:
Mr. James W. Laferriere wrote:
    Hello Andrew ,

On Tue, 17 Jul 2007, Andrew Burgess wrote:
The MCEs have been ongoing for some time . I have replaced every item in the system except the chassis & SCSI backplane & power supply (750 Watts) .
    Everything .  MB, cpu, memory, SCSI controllers, ...
These MCEs only happen when I am trying to build or bonnie++ test the
md3 .  It consists of (now 7+1 spare) 146GB drives in the SuperMicro
SYS-6035B-8B's backplane attached to an LSI22320 .

Probably every old timer has a story about chasing a hardware problem
where changing the power supply finally fixed it. I keep spares now.

If an MCE (which means bad cpu) doesn't go away after changing the cpu
it would either have to be temperature, power or a bug in the MCE code.
What else could it be?

Thank you for the idea of 'changing out the PS' . So I did it a bit differently . I disconnected the system PS from the raid backplane & dropped in a known good PS of proper wattage & re-tested , but left the system's PS attached to only the MB & fans . It doesn't appear to be power load related . I tried rebuilding my 7 disk raid6 array & I got the same thing , an MCE . Now the raid backplane is still in the air stream in front of the cpus and memory slots . So it could be a marginal cpu or memory stick .

But here's the clincher : when I don't use the two drives in front of the PS & cpu & memory slots , the array completes its resync . So I'm back to testing memory (again) . If that passes then I'll try the new cpu(s) route .

It does sound like a cooling problem, which does not have to imply the overheated parts are bad, although that may be true.
Fyi , memtest86+ @ 19 passes (~ 52hours) on 8GB of memory , no errors .

Could be the total number of i/o in flight, etc.
    Hmmm ,  I didn't think of this one .

Those are a PITA to find if that's it. It doesn't sound likely to be the power supply, but as an unlikely but cheap test, have you reseated the p/s to backplane connectors? Oh, and checked that the system board is grounded to the case?
Have you tried dropping two other drives?
Well , no . I dropped those two in front of the CPU as a test while working my way up the SCSI backplane (BP) trying to find a point that worked , & the last two drives in the BP just happened to be in front of the cpu/memory air path . The minute I put those in the MD build tree , within the usual time frame I get an MCE . What I haven't tried is what you are probably suggesting : make sure it is the drives in the air path by putting them in the MD build and leaving another two out . I'll try that as well .

Can you put in a bit more fan?
    Nope ,  It's maxed out .  Sounds like a 747 on take off as it is .
It's a SuperMicro SYS-6035B-8B if you have the time to go look at the specs & pics .

What I was thinking is that some of my cases actually have room to install fans in front of the drives, allowing push as well as pull. Haven't had to do it in several years, but looking at my tall tower cases, I believe I could.
Read the system board and CPU temps with the "sensors" package?
    Not yet ,  I am building the needed items into the kernel now .
    Will report back (hopefully) sometime this weekend .
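
For anyone following along: once the sensor drivers are in the kernel, the readings that the "sensors" tool prints can usually also be read straight from sysfs, which is handy for logging temperatures while the array resyncs. What follows is only a minimal sketch, not something from this thread. It assumes a kernel that exposes hwmon chips as /sys/class/hwmon/hwmon*/temp*_input in millidegrees Celsius (older kernels may place these files elsewhere), and the function name read_hwmon_temps is made up for illustration.

```python
#!/usr/bin/env python3
"""Print temperature readings exposed by the kernel hwmon sysfs interface."""
import glob
import os


def _read_text(path):
    """Return the stripped contents of path, or None if it cannot be read."""
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return None


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return a list of (chip, label, degrees_c) tuples found under root.

    Assumes the layout <root>/hwmon*/temp*_input with values in
    millidegrees Celsius; unreadable or non-numeric files are skipped.
    """
    readings = []
    for chip_dir in sorted(glob.glob(os.path.join(root, "hwmon*"))):
        chip = _read_text(os.path.join(chip_dir, "name")) or os.path.basename(chip_dir)
        for input_path in sorted(glob.glob(os.path.join(chip_dir, "temp*_input"))):
            raw = _read_text(input_path)
            if raw is None:
                continue
            try:
                millideg = int(raw)
            except ValueError:
                continue
            # temp1_input -> temp1; use temp1_label when the driver provides one
            stem = input_path[: -len("_input")]
            label = _read_text(stem + "_label") or os.path.basename(stem)
            readings.append((chip, label, millideg / 1000.0))
    return readings


if __name__ == "__main__":
    for chip, label, degrees_c in read_hwmon_temps():
        print(f"{chip:12s} {label:12s} {degrees_c:6.1f} C")
```

Run in a loop during a resync, with the output appended to a file, this would show whether temperatures climb before the MCE hits. Which chip and label correspond to the CPU or the board depends on the driver, so the names need to be checked against the "sensors" output.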

Keep us posted. You have picked the low-hanging fruit; when you find out what causes this I'm sure it will be something interesting.

--
bill davidsen <davidsen@xxxxxxx>
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979
