On Tuesday 22 June 2010, John R Pierce wrote:
> On 06/22/10 12:21 AM, Peter Kjellstrom wrote:
> > On Tuesday 22 June 2010, Eric Deis wrote:
> >> I have recently upgraded to 2.6.18-194.3.1.el5 and within several days
> >> the machine crashed with the following error (repeating in mcelog):
> >
> > I'm guessing the old kernel just didn't notice.
> >
> > The below MCEs indicate bad hardware. Since the DIMMs are a lot easier to
> > debug I'd suggest you start there (but it could be the systemboard too).
> > Try running with half your DIMMs then the other half.
>
> and on nehalem (xeon 5500, 5600), the memory controller is in the CPUs,
> so they are suspect too.

In theory, yes. But while we've replaced many DIMMs and some system boards,
I don't think we've replaced a single (Nehalem-type) CPU (observed over
~10000 CPU-months).

/Peter
_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos