Shreyansh Jain wrote:
Dear All,
I have an Intel desktop machine with a P4 processor and 512MB RAM. It
runs a custom 2.6.25 kernel (custom because the config was changed to
deselect unnecessary options and select required ones before
compilation) on top of the SLES10 distro.
I have noticed that this machine tends to hang after running
uninterrupted for a number of days. There is no fixed pattern (no
fixed number of days): hangs may occur as often as every 2-3 days or
as rarely as every 7 days.
I have noticed this happening for no apparent reason. This machine is
being used as an ssh box containing a repository of kernel sources -
that's it. There is no configured web server or background application
running on it.
Problem:
1. The hang is such that there is nothing on the display, so I
cannot see what state the machine is in (not that I expect that
would help in such a case).
2. There is nothing unusual in /var/log/messages, /var/log/warn,
/var/log/mcelog ... and many other log files.
3. There is no crash dump either, even though I have configured
kexec/kdump on this machine. It works, because I tested it by
triggering a crash using sysrq.
4. There are no kernel messages about any failed device or similar
things in the logs from before the hang (checked after rebooting).
Output of /var/log/messages from one of the most recent stalls:
---8<----
Jun 9 04:25:35 DogMatix syslog-ng[3516]: STATS: dropped 0
Jun 9 05:25:36 DogMatix syslog-ng[3516]: STATS: dropped 0
Jun 9 06:25:36 DogMatix syslog-ng[3516]: STATS: dropped 0
Jun 9 07:25:36 DogMatix syslog-ng[3516]: STATS: dropped 0
Jun 9 08:25:36 DogMatix syslog-ng[3516]: STATS: dropped 0
Jun 9 09:25:36 DogMatix syslog-ng[3516]: STATS: dropped 0
Jun 9 12:44:12 DogMatix syslog-ng[2732]: syslog-ng version 1.6.8 starting
---8<----
Notice that syslog prints something every hour, and then there is a
stall after 09:25. The last line is the startup message after
hard-rebooting the machine.
Dogmatix is the name of the machine.
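The gap in the excerpt above can also be located mechanically, which
helps when comparing several stalls. What follows is only a minimal
sketch: the function name find_gaps, the 65-minute threshold and the
placeholder year are assumptions made for illustration (classic syslog
timestamps carry no year), not something taken from the machine's
setup.

```python
from datetime import datetime, timedelta


def find_gaps(lines, max_gap=timedelta(minutes=65), year=2000):
    """Return (previous, current) timestamp pairs further apart than max_gap.

    `year` is a placeholder: classic syslog lines do not record one.
    A leap year is used so that "Feb 29" still parses.
    """
    stamps = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue
        try:
            # Expected prefix: "Jun 9 09:25:36 host program[pid]: ..."
            stamp = datetime.strptime(
                f"{year} {parts[0]} {parts[1]} {parts[2]}",
                "%Y %b %d %H:%M:%S",
            )
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        stamps.append(stamp)
    # Compare each timestamp with the next one and keep the large jumps.
    return [(a, b) for a, b in zip(stamps, stamps[1:]) if b - a > max_gap]
```

Fed the lines of the excerpt, this should report a single gap, between
09:25:36 and 12:44:12. It only says when logging stopped and resumed,
not why.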
Question:
What should be done in such situations? What is a reliable method
to find the real reason behind such stalls?
Any ideas/hints/suggestions are most welcome. I would like to solve
this mystery rather than live with it.
--
Shreyansh
--
To unsubscribe from this list: send an email with
"unsubscribe kernelnewbies" to ecartis@xxxxxxxxxxxx
Please read the FAQ at http://kernelnewbies.org/FAQ
You should check the most common causes of this:
a) run memtest;
b) check the temperature of the processor, disks and any other sensor
available;
c) check for bad PCI devices like network cards. This is the hard part
IMO because there's no log.
In a nutshell, you must rule out every piece of hardware, one by one.
Then you'll find the culprit.
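For item b), one way to catch overheating despite the missing logs is
to record temperatures periodically and force each record to disk, so
the last reading before a hang survives the hard reboot. This is only
a rough sketch under assumptions: it assumes the kernel exposes
sensors as /sys/class/thermal/thermal_zone*/temp in millidegrees
Celsius (whether this particular 2.6.25 build does is not known), and
the function names and log format are made up for illustration.

```python
import glob
import os
import time


def read_temps(pattern="/sys/class/thermal/thermal_zone*/temp"):
    """Return {sensor file path: degrees Celsius} for each readable sensor.

    The default pattern is an assumption about where the kernel
    exposes temperatures; adjust it to whatever the machine offers.
    """
    temps = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as sensor:
                # Values are assumed to be in millidegrees Celsius.
                temps[path] = int(sensor.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # unreadable or non-numeric sensor: skip it
    return temps


def append_reading(log_path, temps, now=None):
    """Append one timestamped line and fsync it before returning."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    fields = " ".join(
        f"{os.path.basename(os.path.dirname(path))}={value:.1f}"
        for path, value in temps.items()
    )
    with open(log_path, "a") as log:
        log.write(f"{stamp} {fields}\n")
        log.flush()
        os.fsync(log.fileno())  # make sure the line reaches the disk


def run(log_path, interval=60, iterations=None):
    """Log readings every `interval` seconds; run forever if iterations is None."""
    count = 0
    while iterations is None or count < iterations:
        append_reading(log_path, read_temps())
        count += 1
        time.sleep(interval)
```

Calling run() with a log file path of your choice from a root shell or
an init script would leave a trail of readings; if the last lines
before a stall show a steadily rising temperature, cooling becomes the
prime suspect.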
--
Best Regards
Alan Menegotto