Shreyansh Jain wrote:
> Dear All,
>
> I have an Intel desktop machine having P4 processor and 512MB RAM. It
> has a custom 2.6.25 (custom because config has been changed to
> de-select/select unnecessary/required things before compilation)
> running over SLES10 distro.

So you are running SLES10 with the newest vanilla kernel?

> I have noticed that this machine tends to hang after running
> un-interrupted for a certain number of days. There is no fixed pattern
> that happens (no fixed number of days), and hangs might occur as
> frequent as 2-3 days and as delayed as 7 days.
>
> I have noticed this happening for no apparent reason. This machine is
> being used as a ssh box containing a repository of kernel sources -
> that's it. There is no configured web-server or background application
> running on this.
>
> Problem:
> 1. The hang is such that there is nothing on the display and hence I
> cannot see what state the machine is (not that I am expecting that
> would help in such case).

Did you attach a serial console to the host and try to capture the kernel messages remotely? (Of course you need to add something like "console=tty0 console=ttyS0,115200" to the kernel command line.)

> 2. There is nothing unusual in /var/log/messages, /var/log/warn,
> /var/log/mcelog ... and many other log files.

The logs would only show something if the kernel had the opportunity to flush its buffers before the hang.

> 3. There is no crash dump either, even when I have configured
> kexec/kdump on this. It works, because I tested it by triggering using
> sysrq.

You successfully triggered a sysrq, fine; what did it say? If you used kdump, you triggered a dump. What did the analysis say: what was the last instruction the original kernel was executing? (I am thinking of a soft lockup.)

> 4. There are no kernel messages about any failed device or similar
> things in past logs (once I have rebooted).
>
> Output of /var/log/messages from one of the most recent stall is:
>
> Question:
> What should be done in such situations?
> What can be a reliable method
> to know the real reason behind such stalls?
> Any ideas/hints/suggestions are most welcome. I would like to solve
> this mystery rather than live with it.

As mentioned before, check the hardware. I am only guessing, but some background process may be triggering the hang; perhaps you can run "ps auxwf" and send the output to the serial console or over the network.

> Shreyansh

Regards,
--
Patrick Kirsch - Quality Assurance Department
SUSE Linux Products GmbH
GF: Markus Rex, HRB 16746 (AG Nuernberg)
--
To unsubscribe from this list: send an email with
"unsubscribe kernelnewbies" to ecartis@xxxxxxxxxxxx
Please read the FAQ at http://kernelnewbies.org/FAQ
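
The "ps auxwf" suggestion above could be sketched roughly as follows. This is only a minimal sketch, assuming Python 3.7 or later is available on the machine and that the serial console is /dev/ttyS0 as in the example kernel command line. The function names snapshot and log_snapshot are made up for illustration and are not part of any existing tool.

```python
import subprocess
import time


def snapshot(cmd=("ps", "auxwf")):
    """Run cmd and return its output prefixed with a timestamp line."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return f"=== {stamp} ===\n{result.stdout}"


def log_snapshot(path, cmd=("ps", "auxwf")):
    """Append one snapshot to path and flush it right away.

    path is an assumption: /dev/ttyS0 for the serial console, or a
    file on a network mount exported by another machine.
    """
    with open(path, "a") as sink:
        sink.write(snapshot(cmd))
        sink.flush()
```

Called as root in a loop, for example log_snapshot("/dev/ttyS0") followed by time.sleep(60), the last snapshot received on the other end of the serial line shows which processes existed shortly before the hang. The 60 second interval is an arbitrary choice.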