Peter J. Stieber wrote:
> I noticed it a few times before the latest kernel, but now it is more
> frequent.

'more frequent' may well be an indication of a system component breaking
down. that is to say, if it is a 'bad cap', the capacitor is starting to
have more internal leakage.

other things that can cause 'more frequent' are cooling problems: dirty
cooling fan blades, the fan itself slowing as its lubricant dries out, or a
dirty heat sink on the cpu or another high-heat vlsi chip, including some
gpus [graphics processing units]. also, some heat transfer paste will dry
out within a couple of years and cause a loss of cooling.

i have never checked, but if there is a command line tool for checking
system temp, fan speed, and voltages, running it from cron at 10 to 15
minute intervals will give a good idea of what is happening.

> I'm starting to notice a pattern that makes me think I should look at
> cron entries. Here is the frequency of reboot from a previous post...

if it is a cron job running, this could mean an increase in cpu usage and
with it an increase in heat. therefore, knowing what is happening temp and
fan wise would help.

> The only recent hardware change was the addition of a Belkin OmniView
> PRO2 4-Port KVM switch (F1DA104T).

i have the same kvm and have not had any problems with it. also, there
would have to be something very weird going on with it to cause a problem,
something like a short that would cause a drop in voltage.

> The top command indicates ld was running. This was the case for 3 other
> reboots (see my prior posts)...

'ld' could be a cause, as it would put a load on the cpu. therefore you
would need to look for other system loads. and again, watching the output
of 'sensors' will show just how much load you are getting.
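as a rough sketch of the cron idea above: the script below prints one timestamped snapshot of the load average plus whatever 'sensors' reports. it assumes lm_sensors is installed and already configured so that the 'sensors' command produces output; the function names are made up for illustration, and nothing here comes from the original thread.

```python
#!/usr/bin/env python3
"""Print one timestamped snapshot of load average and `sensors` output."""
import os
import subprocess
from datetime import datetime


def read_sensors(cmd=("sensors",)):
    """Return the command's text output, or None if it is missing or fails."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        # OSError covers "command not found"; assume lm_sensors may be absent.
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()


def read_load():
    """Return the 1, 5 and 15 minute load averages, or None if unavailable."""
    try:
        return os.getloadavg()
    except (OSError, AttributeError):
        return None


def format_entry(when, load, sensors_text):
    """Build one log entry: a header line, then the raw sensor readings."""
    if load is None:
        load_part = "load=unavailable"
    else:
        load_part = "load=%.2f,%.2f,%.2f" % load
    body = sensors_text if sensors_text else "(no sensors output)"
    return "=== %s %s\n%s\n" % (when.strftime("%Y-%m-%d %H:%M:%S"), load_part, body)


if __name__ == "__main__":
    # Writes to stdout only; let cron append the output to a log file.
    print(format_entry(datetime.now(), read_load(), read_sensors()))
```

a crontab line along the lines of `*/10 * * * * /usr/bin/python3 /usr/local/bin/sensor_snapshot.py >> /var/log/sensor_snapshot.log 2>&1` (both paths are placeholders) would take a snapshot every 10 minutes. after a crash, the last few entries in the log show what temps, fans, and load were doing just before it.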

> Result of last | grep crash
> pstieber pts/2        172.16.1.16      Tue Apr  7 19:27 - crash  (06:15)
> pstieber tty1                          Tue Apr  7 13:15 - crash  (12:27)
> pstieber pts/0        192.168.120.51   Tue Apr  7 06:55 - crash  (00:13)
<snip>
> root     pts/0        mrburns.toyon.co Tue Mar 24 08:38 - crash  (00:04)
> root     pts/0        192.168.120.51   Mon Mar 23 06:33 - crash  (00:02)

knowing what is going on just before these periods (cron jobs, etc.) would
help you find a system load.

> PSU: ANTEC TRU550EPS12V ATX

"bad cap" AND "antec" [with "" and 'AND'] results in 308 hits on google

http://www.google.com/search?hl=en&as_q=%22bad+cap%22+AND+%22antec%22&as_epq=&as_oq=&as_eq=&num=10&lr=&as_filetype=&ft=i&as_sitesearch=&as_qdr=all&as_rights=&as_occt=any&cr=&as_nlo=&as_nhi=&safe=images

> MB: Thunder K8W (S2885ANRF)

"bad cap" AND "Thunder K8W" results in 6 hits.

http://www.google.com/search?hl=en&lr=&as_qdr=all&q=%22bad+cap%22+AND+%22Thunder+K8W%22&btnG=Search

not a very good combination. :(

> Thanks for the ideas.

you are welcome.

> This machine and the attached cluster is used by
> a group of 10 or so at my company. It's difficult to do a lot of
> tinkering, but I can use the argument that if it reboots, what good is it.

that should be justification for a completely new system. the
justification comes down to which is more costly: the system crashing, the
chance of data loss, and your time, or the cost of a new box.

if they go for a new box, be sure that you have a good safety margin on
the power supply rating. a max load of 70% of max output would be nice.

-- 
peace out.

tc,hago.

g
.

****
in a free world without fences, who needs gates.
**
help microsoft stamp out piracy - give linux to a friend today
**
to mess up a linux box, you need to work at it; to mess up an ms windows box, you just need to *look at* it.
**
learn linux:
'Rute User's Tutorial and Exposition' http://rute.2038bug.com/index.html
'The Linux Documentation Project' http://www.tldp.org/
'LDP HOWTO-index' http://www.tldp.org/HOWTO/HOWTO-INDEX/index.html
'HowtoForge' http://howtoforge.com/
'fedora faqs' http://www.fedorafaq.org/
****
-- 
fedora-list mailing list
fedora-list@xxxxxxxxxx
To unsubscribe: https://www.redhat.com/mailman/listinfo/fedora-list
Guidelines: http://fedoraproject.org/wiki/Communicate/MailingListGuidelines