On Sat, 19 May 2007, William L. Maltby wrote:
On Sat, 2007-05-19 at 17:54 +0300, Itay wrote:
[snip]
I tried the padding technique - the media errors were gone;
kernel panic - stayed. I hope that you or others may help me
with this.
Most likely it will be others. My ignorance is boundless and,
fortunately, my ego is inversely proportional to that! :-)
I'm glad the media errors are gone.
Which leaves me with the more difficult alternatives. Arrrgh.
[snip]
3 I tried several things, each one of them ended in *kernel panic*
either before logging in as root, or some minutes after. The panic
appeared after idling the machine for some time.
*sniff* Smells hardware-related. But whether it's bad hardware or kernel
handling of it, I'm too ignorant to hazard a guess. I googled and found
your original post (BTW, don't hijack threads, even your own. It made
it more difficult to find your brief originally-posted hardware ref). :-O
I thought (and still do) that the two issues were related, and
therefore modifying the subject line and including a [was:...]
clause are sufficient. Sorry for the extra work.
I was going to ask about x586 or C5 processors, but I did manage to find
your OP and saw AMD 4200+, IIRC. So we don't have to worry about that.
:-)
4 A couple of strange things
+ I have found out that the default run level was set to 3.
When, as root, I tried 'telinit 5', the machine responded with a
blank screen. I had to reset.
Have you tried a <CTRL>-<ALT>-<F1> when this happens? Since desktop is
being started on tty7, if it fails and seems blank, maybe switching to
virtual console 1 will work, if the machine is still alive. If so, maybe
some answers are there (view /var/log/messages, the X log, etc.).
Wasn't able to switch to virtual consoles. (I begin to suspect
that there are some problems with the keyboard as well, though.)
No clues in /var/log/messages.
And no X.log at all!
+ Rebooting the machine was accompanied by messages regarding
ntp/clock skew. Later I found out that I had lost the
network connection, probably while playing with the
installation, so this probably explains the clock skew.
I am not sure whether this has any relevance.
+ At no point was I prompted to set up a non-root user.
IIRC, when I did my C5 install, I got that prompt. If that's normal, it
may mean that the problem actually bit you during the install phase and
not everything got done correctly.
Possibly. But there were no hints for that in install.log and
anaconda.*log*
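A note on the X log mentioned earlier: on an X.org-based install it is normally written to /var/log/Xorg.0.log rather than X.log, and error lines in it start with "(EE)". The following is a minimal sketch for pulling those lines out, assuming that default path; the function name x_log_errors is made up for illustration.

```shell
#!/bin/bash
# Print the error lines from an X server log, or report that the log is missing.
# /var/log/Xorg.0.log is the usual location for display :0 (an assumption here);
# pass another path as the first argument if yours differs.
x_log_errors() {
    local xlog=${1:-/var/log/Xorg.0.log}
    if [ ! -r "$xlog" ]; then
        echo "no readable X log at $xlog (X may never have started)"
        return 1
    fi
    # X.org marks errors with "(EE)" at the start of the line.
    grep -E '^\(EE\)' "$xlog"
}
```

If the file does not exist at all, that by itself suggests the X server was never launched, which would point at the init/runlevel side rather than at the video driver.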
5 For each crash / kernel panic I got a screen-load of trace and other
cryptic output. Each time, so it seems, the output was different.
*Q* Is there a way to dump those messages into a file?
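One possible answer to the question above is the kernel's netconsole module, which sends kernel messages over UDP to a second machine where they can be saved to a file. The sketch below only builds the module parameter string; the helper name netconsole_param and every address in the example are placeholders, and it assumes the kernel in use ships netconsole and that the network driver supports it.

```shell
#!/bin/bash
# Build the netconsole= module parameter in the form
#   src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac
# All six arguments are supplied by the caller; nothing is auto-detected.
netconsole_param() {
    local src_port=$1 src_ip=$2 dev=$3 tgt_port=$4 tgt_ip=$5 tgt_mac=$6
    echo "netconsole=${src_port}@${src_ip}/${dev},${tgt_port}@${tgt_ip}/${tgt_mac}"
}

# Example with made-up addresses; run as root on the crashing box:
#   modprobe netconsole "$(netconsole_param 6665 192.168.1.10 eth0 6666 192.168.1.20 00:11:22:33:44:55)"
```

On the receiving machine, any UDP listener on the target port (netcat or a syslog daemon, for instance) can write what arrives to a file. A serial console is another route if both machines have a serial port.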
I'm too ignorant to answer that. But if you do get up and running for a
few minutes in a text console, clues may be lying around
in /var/log/messages. Search backwards for "restart" (twice) or some
other word, like "panic", and read around there.
No hints except for what I have mentioned below.
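The backwards search suggested above can also be done non-interactively. This is a rough sketch, assuming the syslog file is /var/log/messages as in this thread; the function name find_crash_clues and the defaults are illustrative only.

```shell
#!/bin/bash
# Show the last few log lines that mention a panic or a restart,
# each with two lines of context and its line number.
# Usage: find_crash_clues [logfile] [max_output_lines]
find_crash_clues() {
    local logfile=${1:-/var/log/messages}
    local count=${2:-40}
    grep -i -n -C 2 -E 'panic|restart' "$logfile" | tail -n "$count"
}
```

Keep in mind that a panic often stops the machine before syslog can write anything, so an empty result does not rule out a kernel problem.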
6 Only suspicious thing I have found in /var/log/messages was lines
like this
May 19 11:27:36 bilbo kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
May 19 11:27:36 bilbo kernel: ata1.00: tag 0 cmd 0xb0 Emask 0x1 stat 0x51 err 0x4 (device error)
May 19 11:27:36 bilbo kernel: ata1: EH complete
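A side note on the three log lines above: in the ATA command set, opcode 0xb0 is SMART, and "stat" and "err" are the drive's status and error registers. The sketch below decodes those two values into the classic register bit names; the function names are made up for illustration, and the result is a hint, not a diagnosis.

```shell
#!/bin/bash
# Decode the 'stat' and 'err' values from a libata log line into bit names.

decode_bits() {
    # $1 = register value (e.g. 0x51); remaining args = "mask:name" pairs
    local value=$(( $1 )) names="" pair
    shift
    for pair in "$@"; do
        if (( value & ${pair%%:*} )); then
            names="$names ${pair#*:}"
        fi
    done
    echo "${names# }"
}

decode_ata_status() {
    decode_bits "$1" 0x80:BSY 0x40:DRDY 0x20:DF 0x10:DSC 0x08:DRQ 0x04:CORR 0x02:IDX 0x01:ERR
}

decode_ata_error() {
    decode_bits "$1" 0x80:ICRC 0x40:UNC 0x20:MC 0x10:IDNF 0x08:MCR 0x04:ABRT 0x02:NM 0x01:AMNF
}

decode_ata_status 0x51   # prints: DRDY DSC ERR
decode_ata_error 0x4     # prints: ABRT
```

Read that way, the lines look like the drive aborting a SMART request (for example from a monitoring daemon) rather than a failed read or write, which would make them a less likely cause of the panics. That interpretation is an inference from the register values, not something confirmed in this thread.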
7 Also, /var/log/secure had these errors - I believe for every reboot.
...
May 19 11:25:30 bilbo login: ROOT LOGIN ON tty1
May 19 11:26:06 bilbo login: pam_unix(login:session): session closed for user root
May 19 11:26:09 bilbo sshd[2677]: Received signal 15; terminating.
May 19 11:27:26 bilbo sshd[2687]: Server listening on :: port 22.
May 19 11:27:26 bilbo sshd[2687]: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.
May 19 11:29:51 bilbo login: pam_unix(login:session): session opened for user root by LOGIN(uid=0)
May 19 11:29:51 bilbo login: pam_selinux(login:session): Warning! Could not get new context for /dev/tty1, not relabeling: Invalid argument
May 19 11:29:51 bilbo login: pam_selinux(login:session): usercon=(null), prev_context=system_u:object_r:tty_device_t
May 19 11:29:51 bilbo login: ROOT LOGIN ON tty1
I'm too ignorant to answer authoritatively.
My *guess* is that the application related errors you reported may be a
result of certain installation steps terminating early due to the false
I/O errors reported by the kernel/driver(s).
HTH
--
Bill
Any recommendation how to proceed?
(The most pressing question: is it the hardware? Should I take
the box back to the seller?)
If the panics are random, IIRC, could be memory, could be ... But a good
run of memtest86 from the install CD should help determine that. Also,
it is not uncommon for new hardware to have the occasional loose
connector or PCI card. Maybe too small power supply. Maybe CPU fan not
spinning. Maybe ambient temperature of the room is too high and internal
box temperature excessive.
Running memtest now for the night (runs for 2 hours already).
If it was a question of excess heat I would expect to have
trouble during memtest run as well; no?
If you suspect hardware, check all connectors. Make sure memory, power
supply connectors and PCI cards are firmly seated. Make sure your power
supply is adequate (my EPOX board needed much more than the PS for the
ACER box, into which the EPOX was originally installed, could supply.
Had random panics, usually near startup times, sometimes a few minutes
after. That's natural because the ACER had an integrated SiS chip set
which needs much less power than the Via-based EPOX).
Make sure the CPU fan is seated and working.
Is your AC power from the electric company reliable? Fluctuations of 20%
are not uncommon here. Battery backup with power conditioning helps a
lot.
Actually, the power supply is not stable enough. But there
were no fluctuations that I could notice during my attempts
this morning. We'll keep this in mind, though.
Since you mentioned a delay sometimes (IIRC), heat sounds like a
possible culprit. If the room is cool, take the covers off and see if it
runs longer. If it stays up long enough, do
Again: memtest'ing for a few hours should produce a similar
challenge, I should think.
I could try running knoppix 5 for a while and straining
the CPU somehow.
# cat /proc/acpi/thermal_zone/THRM/temperature
temperature: 36 C
Make sure it's in the range for the AMD you have. BTW, mine is lower
than it used to be. I added an expensive Zalman FHS a few months back.
May try overclocking someday if I get enough interest.
We'll check tomorrow when attempting to reboot into centos.
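Since the box may panic before anyone gets to look at the temperature, it can help to log it periodically so that the last reading before a crash survives. A minimal sketch follows, assuming the thermal zone is called THRM as in the command above (the name varies by board); the function names, log path and interval are placeholders.

```shell
#!/bin/bash
# Read the temperature in whole degrees C from an ACPI thermal zone file.
# The file holds one line such as "temperature:             36 C".
read_acpi_temp() {
    local zone_file=${1:-/proc/acpi/thermal_zone/THRM/temperature}
    awk '/^temperature:/ { print $2 }' "$zone_file"
}

# Append a timestamped reading to a log every INTERVAL seconds until interrupted.
# Usage: log_acpi_temp [logfile] [interval_seconds]
log_acpi_temp() {
    local logfile=${1:-/root/temperature.log}
    local interval=${2:-30}
    while true; do
        echo "$(date '+%b %d %H:%M:%S') $(read_acpi_temp) C" >> "$logfile"
        sync    # flush to disk so the last reading survives a panic
        sleep "$interval"
    done
}
```

Starting log_acpi_temp in the background from a text console right after boot, and reading the log after the next crash, should show whether the temperature was climbing beforehand.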
Use google with "site:centos.org" added, e.g. like this
screen blanks after initial setup site:centos.org
in advanced search fields (I had site:... in the "all of the words"
field and "screen blanks after initial setup" in the "exact phrase"
field). You'll find lots of instances of kernel panics discussed on the
list and some suggestions, in some cases, for "noapic" and similar boot-
time parameters.
Yup. I noticed that some of them were related to nVidia
hardware. Well, my box has a few nVidia's, so maybe...
Thanks.
--
Itay Furman <centos@xxxxxxxxxxxxxx>