Re: linux 3.1-4 - two i686 lockups after ~ 5 hours of operations. two x86_64 seem OK




On 11/10/2011 12:56 PM, David J. Haines wrote:
On Thu, Nov 10, 2011 at 1:44 PM, Richard Schütz <r.schtz@xxxxxxxxxxx> wrote:
On 10.11.2011 18:47, David C. Rankin wrote:

tpowa,

Upgraded 5 i686 boxes and 2 x86_64 boxes to linux 3.1-4 last night.
This morning, one i686 server is dead, and another i686 box responded
to xterm (return input) and then locked (the ssh connection had been
left up after login to confirm the reboot). Two other i686 boxes (under
no load) are still running. The boxes are remote. I'll pull the logs
when I get to the site and send them. Is anybody else seeing this with
linux 3.1-4?


I had lockups on my notebook [1] and netbook [2] during normal usage. Both
have an Intel processor. The AMD-based desktop machine has had no problems
so far. All systems are running linux 3.1-4 x86_64.

[1] http://pastebin.com/VAnTLKtP
[2] http://pastebin.com/64QKSJTN

--
Regards,
Richard Schütz


I'm getting lockups on an i5 box with Intel graphics running x86_64
while I'm using it. This has been happening since 3.0.7-1; 3.0.6-2,
however, seemed perfectly fine.

David J. Haines
dhaines@xxxxxxxxx


Hmm... The logs on the box that locked are absolutely no help:

Nov 10 03:20:04 phoenix -- MARK --
Nov 10 03:25:34 phoenix dhcpd: DHCPREQUEST for 192.168.7.124 from 00:11:43:22:50:08 via eth0
Nov 10 03:25:34 phoenix dhcpd: DHCPACK on 192.168.7.124 to 00:11:43:22:50:08 via eth0
Nov 10 12:44:33 phoenix kernel: [    0.000000] Initializing cgroup subsys cpuset
Nov 10 12:44:33 phoenix kernel: [    0.000000] Initializing cgroup subsys cpu

Obviously something happened after 03:25:34, but there is no indication of what. The second box, which I thought had locked, hadn't; by uncanny coincidence I tried it during one of its spontaneous reboots due to hwclock drift (I'll create a cron job to keep the hardware clock updated). The boxes are on the same LAN subnet. The only SWAG I have is that once the box with the drifting clock got far enough out of time, any network communication with the box that locked may have caused it to panic over the time-sync issue.

(But that is wrong, because once the system is running, the system clock is the only clock that matters, right? Yet it can't be entirely wrong, otherwise there is no explanation for the spontaneous reboots due to clock drift. A digital paradox, so to speak :)

Richard, David: check your hardware clock with "# hwclock -r" and compare it to the time returned by "# date". If they are hours apart, make sure your system clock is correct, then set the hardware clock from it with "# hwclock -w". It's worth checking regardless. I know the hardware clock used to be synced at boot or shutdown, and I don't know why it isn't anymore. I'll do some more digging.
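The cron job mentioned above could be as simple as a root crontab entry that runs the same "hwclock -w" step periodically. This is a minimal sketch, assuming the system clock itself is kept correct (for example by NTP) and that hwclock is installed at /sbin/hwclock; the path and the hourly schedule are assumptions, so check the path with "which hwclock" on your own boxes:

```crontab
# min hour day-of-month month day-of-week  command
# Every hour, copy the (assumed correct) system clock to the hardware clock.
0 * * * * /sbin/hwclock -w
```

Install it with "crontab -e" as root. Writing the hardware clock from a system clock that is itself wrong would only copy the error, so this only makes sense alongside a reliable time source.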

--
David C. Rankin, J.D.,P.E.

