On November 19, 2014 3:40:38 PM EST, Mark Lee <mark@xxxxxxxxxxxx> wrote:
>-----BEGIN PGP SIGNED MESSAGE-----
>Hash: SHA256
>
>On 11/19/2014 12:15 PM, Rasmus Liland wrote:
>> On 2014-11-17 00:19, Rasmus Liland wrote:
>>> On 2014-11-15 18:28, Mark Lee wrote:
>>>> On 11/15/2014 12:20 PM, Rasmus Liland wrote:
>>>>> On 2014-11-15 15:21, LoneVVolf wrote:
>>>>>> On 15-11-14 06:57, Rasmus Liland wrote:
>>>>>>> On 2014-11-15 06:10, Mark Lee wrote:
>>>>>>>> On 11/14/2014 10:29 PM, Rasmus Liland wrote:
>>>>>>>>> On 2014-11-15 04:01, Mark Lee wrote:
>>>>>>>>>> Are you booting with the new intel u-code?
>>>>>>>>> Are you fairly sure this is a Intel microcode issue?
>>>>>>>> I'm not completely certain; but it would make sense.
>>>>>>>> I'd test it out.
>>>>>>> Thank you for your help thus far. I'll examine this
>>>>>>> further tomorrow, g'night.
>>>>>> From rasmus first post:
>>>>>>> I'm experiencing machine check exceptions since every
>>>>>>> kernel after package linux-3.11.5-1 (Oct 14 2013)
>>>>>> New intel microcode was only introduced with kernel 3.17
>>>>>> ... It's unlikely to have to do with this issue.
>>>>>>
>>>>>> install mcelog, run it as the log tells you and post the
>>>>>> result.
>>>>> [ ... output, see previous messages ... ] I never did use the
>>>>> mcelog tool before, but to me it looks like not much of an
>>>>> analysis, perhaps I'm doing it wrong.
>>>> Looks like a microcode error, please try to add the intel-ucode
>>>> to your kernel cmdline.
>>> Bah, just as I was finished enabling syslinux using
>>> syslinux-install_update and rebooted, the system did not respond,
>>> just a blank screen and lighting shutting off, then rebooting
>>> again.
>>>
>>> Thus, this system needs an overhaul -- apparently some difficulty
>>> with the bootcode or the MBR, though I am able to mount the old
>>> partitions and chroot into them using arch-chroot.
>>>
>>> I tried installing grub using the standard method grub-install
>>> according to the wiki, with little success -- some good news at
>>> least relevant to previous topic in this thread is that grub
>>> recognized and added the intel-ucode file I had copied to the
>>> /boot directory, when running grub-mkconfig.
>>>
>>> The plan forward is to forget about generating new mbr using
>>> gpart and install Debian at the end of the disk to, hopefully,
>>> restore some boot related stuff that might have come crashing
>>> down after meddling with syslinux.
>>
>> A breakthrough in this thread has happened.
>>
>> I ended up taking a backup of the disk to an external hdd using
>>
>>> # dd if=/dev/sda of=/mnt/angrist-sda-18nov14.img
>>
>> then I booted FreeBSD 10.1 memstick, entered shell and entered some
>> commands:
>>
>>> # gpart delete -i 1 ada0
>>> # gpart delete -i 2 ada0
>>> # gpart delete -i 3 ada0
>>> # gpart destroy ada0
>>> # gpart create -s mbr ada0
>>> # gpart add -s 20g -t linux-data ada0
>>> # gpart add -t linux-data ada0
>>
>> Then I rebooted into ArchLinux iso memstick to install Arch on the
>> 20G partition and using the other one as /home. So now Syslinux
>> works, unfortunately I don't know why. And I was able to install
>> all new packages including linux 3.17.3-1 and intel-ucode
>> 20140913-1, loading it in Syslinux according to the wiki.
>>
>> I got a new mce after exactly three hours:
>>
>>> [10827.051523] mce: [Hardware Error]: CPU 1: Machine Check Exception: 5 Bank 4: b200000000100402 Increasing limit for this warning to that value arg
>>> [10827.051632] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff81321387> {intel_idle+0xe7/0x180}
>>> [10827.055440] mce: [Hardware Error]: TSC 2238c73db17
>>> [10827.059291] mce: [Hardware Error]: PROCESSOR 0:306a9 TIME 1416411506 SOCKET 0 APIC 1 microcode 1b
>>> [10827.063192] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
>>> [10827.067078] mce: [Hardware Error]: CPU 3: Machine Check Exception: 5 Bank 4: b200000000100402
>>> [10827.070986] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff81321387> {intel_idle+0xe7/0x180}
>>> [10827.074899] mce: [Hardware Error]: TSC 2238c73db43
>>> [10827.078769] mce: [Hardware Error]: PROCESSOR 0:306a9 TIME 1416411506 SOCKET 0 APIC 3 microcode 1b
>>> [10827.082673] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
>>> [10827.086569] mce: [Hardware Error]: CPU 2: Machine Check Exception: 5 Bank 4: b200000000100402
>>> [10827.090503] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff812ab186> {intel_sqrt+0x36/0x50}
>>> [10827.094415] mce: [Hardware Error]: TSC 2238c73db28
>>> [10827.098299] mce: [Hardware Error]: PROCESSOR 0:306a9 TIME 1416411506 SOCKET 0 APIC 2 microcode 1b
>>> [10827.102242] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
>>> [10827.106182] mce: [Hardware Error]: CPU 0: Machine Check Exception: 5 Bank 4: b200000000100402
>>> [10827.110177] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff81321387> {intel_idle+0xe7/0x180}
>>> [10827.114143] mce: [Hardware Error]: TSC 2238c73db06
>>> [10827.118038] mce: [Hardware Error]: PROCESSOR 0:306a9 TIME 1416411506 SOCKET 0 APIC 0 microcode 1b
>>> [10827.122028] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
>>> [10827.126037] mce: [Hardware Error]: Machine check: Processor context corrupt
>>> [10827.130076] Kernel panic - not syncing: Fatal Machine check
>>> [10827.134149] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
>>> [10827.136647] drm_kms_helper: panic occured, switching back to text console
>>> [10827.163009] Rebooting in 30 seconds..
>>> [10857.234707] ACPI MEMORY or I/O RESET_REG.
>>
>> I am also making this output an attachment. There is a lot of more
>> information in this new mce compared to the other one I sent.
>>
>> Perhaps some of you got some new suggestions.
>>
>> Meanwhile, I am downgrading back to 3.11.5-1.
>>
>
>To Rasmus,
>
>Can you run the parts where it says "run the abvoe through mcelog
>- --ascii" and post the contents?
>
>Regards,
>Mark
>-----BEGIN PGP SIGNATURE-----
>Version: GnuPG v2
>
>iF4EAREIAAYFAlRtAEYACgkQZ/Z80n6+J/bSDAD/QULX/4mYDEVfTsiXn2p1PBwx
>kGcvdIgfTiSwYRMbrz4A/20NYjKeQ6EJPUpdXODgl8kp03CVAVeQknkzxtmZrnlL
>=mDQq
>-----END PGP SIGNATURE-----

I may have had a similar error, but I can't remember the details. Have you checked if your hardware clock is synchronized? It fixed my kernel panic issues.
-- 
vixsomnis
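Until mcelog output is available, the architectural bits of the status word the kernel printed for every CPU ("Bank 4: b200000000100402") can be read by hand. Below is a minimal POSIX sh sketch, assuming the Intel SDM layout of IA32_MCi_STATUS (VAL at bit 63 down to PCC at bit 57, MCA error code in bits 15:0, model-specific code in bits 31:16). The names status_hi, status_lo and hibit are illustrative only and are not part of mcelog or the kernel.

```shell
#!/bin/sh
# Sketch: decode the architectural fields of the IA32_MCi_STATUS value the
# kernel printed as "Bank 4: b200000000100402". Bit positions assume the
# Intel SDM IA32_MCi_STATUS layout; the model-specific meaning of the error
# codes is NOT decoded here (that is what 'mcelog --ascii' is for).

# The 64-bit value is split into 32-bit halves so shell arithmetic (signed,
# typically 64-bit) never has to hold a value with bit 63 set.
status_hi=0xb2000000   # bits 63:32
status_lo=0x00100402   # bits 31:0

# Print bit N (valid for N = 32..63) of the status register.
hibit() { echo $(( (status_hi >> ($1 - 32)) & 1 )); }

echo "VAL=$(hibit 63) OVER=$(hibit 62) UC=$(hibit 61) EN=$(hibit 60)"
echo "MISCV=$(hibit 59) ADDRV=$(hibit 58) PCC=$(hibit 57)"
printf 'MCA error code (bits 15:0): 0x%04x\n' $(( status_lo & 0xffff ))
printf 'Model-specific code (bits 31:16): 0x%04x\n' $(( (status_lo >> 16) & 0xffff ))
```

For this value the sketch reports VAL, UC, EN and PCC set, with OVER, MISCV and ADDRV clear, an MCA error code of 0x0402 and a model-specific code of 0x0010. PCC=1 is consistent with the kernel's "Processor context corrupt" line; what 0x0402 means on this particular CPU still needs mcelog or the SDM's error-code tables.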