On June 15, 2004 01:13 pm, Chris W. Parker wrote:
> hello.
>
> i've been having some very strange problems with a server of mine
> lately. i'm not sure if it's a hardware or software problem.
>
> the first time i had a problem was last week. i tried to login to the
> server via ssh but nothing happened. after some investigation i was able
> to get the machine to boot again using the 'no-hlt' option. i.e. 'linux
> no-hlt'.
>
> so it stays up for a few days* and i try to login to it again today and
> i am again presented with a dead-in-the-water server. this server has 4
> hd's with an adaptec 39160 scsi controller and is using software raid.
> this time things do not look promising. :(
>
> once i rebooted the system it stopped at the very beginning of loading
> the os. the part *right* after it decompresses the kernel. i tried
> booting with my boot disk (that i made when i installed the system) and
> it got as far as "decompressing vmlinuz..........." and then stopped.
>
> i then attempted reboot after reboot trying different things and it
> seemed to slowly get worse. at one point it said the CPU had changed and
> that i need to go into CMOS and detect it and then save on exit. well it
> wouldn't even go into CMOS.
>
> so at first i thought it might be a software thing, but then all this
> crazy stuff leads me to believe it's a hardware thing, but i have no
> idea what. oh btw, there are four (or maybe there are only three) small,
> flat, square leds right next to the pci slot where the scsi controller
> is and they are all red. whereas in the past i've seen them
> green/yellow.
>
> i know this is pretty vague but i really dont know how to troubleshoot
> something like this so i'm hoping that with some extra brains i'll be
> able to find a solution.
>
> the most important thing is that i get the data off the harddrives.
>
>
>
> thanks,
> chris.

Hi Chris,
it definitely sounds like hardware. What is the computer (486, PIII...)? How old is the board?
I would be suspicious of the board and maybe the SCSI controller.

It would be great if you had a similar known-good system (or close) to test things on. You could test the SCSI controller and disks in another box; that way you know definitively whether or not they are the problem.

What happens if you boot the CD in rescue mode and try to access the existing system?

You could strip the board down to the minimum set of devices (just keyboard, video, RAM, CPU). If it boots up (to a 'no drive' kind of error), add things back one at a time. It is actually easier to track down the problem if it is pretty reproducible; otherwise you may need to repeat or extend the tests at each level.

Basic rules for troubleshooting:
- Always try to take a suspect part and place it in a known-good environment (except power supplies!).
- Don't run a test unless it will tell you categorically that a part passes or fails. For example, suppose you cannot boot from a floppy and you test a second floppy disk. If it fails too, you still can't say whether the original disk is good, but if you boot a different box with that disk, it tells you the disk is definitely good.

Hope that helps.
--
Pete Nesbitt, rhce