Re: Machine hang - how to know what happens?

Shreyansh Jain wrote:
> Dear All,
> 
> I have an Intel desktop machine with a P4 processor and 512MB RAM. It
> has a custom 2.6.25 (custom because config has been changed to
> de-select/select unnecessary/required things before compilation)
> running over SLES10 distro.
So, you are running SLES10 with the newest vanilla kernel?
> 
> I have noticed that this machine tends to hang after running
> uninterrupted for a certain number of days. There is no fixed pattern
> (no fixed number of days); hangs might occur as frequently as every
> 2-3 days or as rarely as every 7 days.
> 
> I have noticed this happening for no apparent reason. This machine is
> being used as an ssh box containing a repository of kernel sources -
> that's it. There is no configured web-server or background application
> running on this.
> 
> Problem:
> 1. The hang is such that there is nothing on the display and hence I
> cannot see what state the machine is in (not that I am expecting that
> would help in such a case).
Did you attach a serial console to the host and try to capture the
messages remotely? (Of course you need to add something like
"console=tty0 console=ttyS0,115200" to the kernel command line.)
> 2. There is nothing unusual in /var/log/messages, /var/log/warn,
> /var/log/mcelog ... and many other log files.
That only holds if the kernel had the opportunity to flush the buffer
before the hang.
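Since the last messages before a hang may never reach the disk, one
option is to record the serial console on a second machine. A minimal
sketch in Python, assuming the second machine's port is /dev/ttyS0 and
has already been set to 115200 baud (for example with stty). The
function name capture_console is made up for illustration:

```python
import time


def capture_console(source, sink, clock=time.time):
    """Copy each line from source to sink with a timestamp prefix.

    source: any iterable of text lines (assumed: the opened serial port)
    sink:   any writable text stream (assumed: a log file)
    Returns the number of lines copied.
    """
    copied = 0
    for line in source:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(clock()))
        sink.write("[{}] {}\n".format(stamp, line.rstrip("\r\n")))
        # Flush after every line so the log survives if this side dies too.
        sink.flush()
        copied += 1
    return copied
```

On the second machine this could be called with
open("/dev/ttyS0", errors="replace") as the source and a log file
opened in append mode as the sink. The timestamps then show how long
before the hang the last kernel message arrived.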
> 3. There is no crash dump either, even though I have configured
> kexec/kdump on this. It works, because I tested it by triggering a
> dump using sysrq.
You successfully triggered a sysrq, fine. What did it say?
If you used kdump, you triggered a dump. What did the analysis say, i.e.
what was the last instruction the original kernel executed (I am
thinking of a soft lockup)?
> 4. There are no kernel messages about any failed device or similar
> things in past logs (once I have rebooted).
> 
> Output of /var/log/messages from one of the most recent stalls is:
> 
> Question:
> What should be done in such situations? What can be a reliable method
> to know the real reason behind such stalls?
> Any ideas/hints/suggestions are most welcome. I would like to solve
> this mystery rather than live with it.
As mentioned before, check the hardware. Possibly, and this is only a
guess, some background process triggers the hang; perhaps you can run
"ps auxwf" and send the output to the serial console or over the network.
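
One way to act on this is to take the "ps auxwf" snapshot at regular
intervals, so that the last one before a hang survives on the other
end. A rough sketch in Python; the function names, the interval and the
choice of output stream are my assumptions, nothing that ships with
SLES:

```python
import subprocess
import time


def take_snapshot(cmd=("ps", "auxwf")):
    """Run cmd and return its output below a timestamped header line."""
    result = subprocess.run(list(cmd), stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True, check=False)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return "=== {} {} ===\n{}\n".format(stamp, " ".join(cmd), result.stdout)


def log_snapshots(stream, count, interval=60.0, cmd=("ps", "auxwf")):
    """Write `count` snapshots to an open text stream.

    The stream is flushed after each snapshot so that as little as
    possible is still buffered on this machine when it hangs.
    """
    for i in range(count):
        stream.write(take_snapshot(cmd))
        stream.flush()
        if i + 1 < count:
            time.sleep(interval)
```

For the serial console, the stream could be /dev/ttyS0 opened for
writing (this needs write permission on the port); for the network
variant, any file-like object wrapping a socket would do.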

> Shreyansh
Regards,
-- 
Patrick Kirsch - Quality Assurance Department
SUSE Linux Products GmbH GF: Markus Rex, HRB 16746 (AG Nuernberg)
