Hello Theodore Ts'o,

Thank you for your points & I take your comments about being professional ... we DO have a spare machine all set up & ready to go; the fly in the ointment is the secure server certificate, under which several users are running their small shops ... otherwise I'd be thinking of moving the server & starting again.

I am forwarding this to our techies & will see.

Regards,
Nico Morrison
nico.morrison@micronicos.com
___________________________________________
Micronicos Limited - London, UK.
Tel: +44 20 8870 8849
Fax: +44 20 8870 5290
___________________________________________

From: Theodore Ts'o [mailto:tytso@mit.edu]
Sent: 06 February 2003 14:25
To: Nico Morrison
Cc: 'Juri Haberland'; ext3 users list
Subject: Re: Why does old kernel boot when new kernel installed?

On Thu, Feb 06, 2003 at 01:30:15PM -0000, Nico Morrison wrote:
> [root@ns5 boot]# df -k
> Filesystem           1k-blocks      Used Available Use% Mounted on
> /dev/md0              36463784   5642076  28969420  17% /
> none                    510400         0    510400   0% /dev/shm
>
> Where /boot is ALSO on the RAID1 partition ( this must have been a mistake
> at setup time ..... although the machine works fine apart from a LOT of
> kjournald activity (up to 60% CPU!).)
>
> Could this be causing GRUB not to see the other kernels & if so what can we
> do?

Um, that would be yes, very likely. The big question at this point is how GRUB was actually configured at installation time. Either it is using a "preset-menu" embedded into it at install time (which it uses if it cannot find the configuration file), or the configuration file is somewhere else, depending on where it was defined to be when GRUB was installed.
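For illustration only, here is a minimal Python sketch of the check implied by the df output quoted above: given `df -k` text, it reports which filesystem actually holds a path such as /boot. The function name `filesystem_for` and the `DF_OUTPUT` string are assumptions made up for this example (the string simply copies the quoted output); nothing here touches GRUB or the disks.

```python
# Sample text copied from the df -k output quoted in this thread.
DF_OUTPUT = """\
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/md0              36463784   5642076  28969420  17% /
none                    510400         0    510400   0% /dev/shm
"""


def filesystem_for(path, df_output):
    """Return (device, mount_point) for the filesystem holding `path`.

    Hypothetical helper: picks the longest mount point that is a
    path-prefix of `path`, which is how the kernel resolves it too.
    """
    best = None
    for line in df_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # ignore wrapped or malformed rows
        device, mount = fields[0], fields[5]
        if not mount.startswith("/"):
            continue
        prefix = mount.rstrip("/") + "/"
        if (path.rstrip("/") + "/").startswith(prefix):
            if best is None or len(mount) > len(best[1]):
                best = (device, mount)
    return best


if __name__ == "__main__":
    # /boot has no mount point of its own, so it lives on the root filesystem.
    print(filesystem_for("/boot", DF_OUTPUT))  # ('/dev/md0', '/')
```

On a machine where /dev/hda1 is mounted on /boot, the same call would return that device instead, which is the difference between this server and the others described in the thread.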
If you are right in assuming that the configuration files on all of your machines are otherwise identical, and your Linux/Unix "professionals" didn't perform other improvisations when they installed that particular server, then creating a /boot filesystem on /dev/hda1 like the other systems, populating it with the appropriate files, and then rebooting *may* fix the problem for you. Or, if you're really lucky, /boot already exists on /dev/hda1 but wasn't mounted, and once you mount it, you can re-install the newer kernel, update the /boot/grub/menu.lst found in /dev/hda1's filesystem, and you're good to go.

However, a good system administrator, over the years, becomes a paranoid s.o.b. Fortunately, the worst case in performing this particular test would be a reboot; creating or modifying the /boot partition on /dev/hda1 will, in the worst case, simply result in it being ignored by GRUB.

If that doesn't work, however, the next thing to recommend would be to reinstall GRUB. Or, if at this point your faith that the system was properly installed is shaken, and you are concerned that there may be other deviations between the "as designed" and "as built" states of your server, the next step would be to save the data disks, and rebuild and reconfigure your server from scratch.

> This is a busy public server with several 100 users ......... we
> have to be very careful doing anything.
>
> Our tech support are Linux/UNIX professionals & are baffled - I am hoping
> for some help here, I am emailing as they don't have the time, look after
> over 100 servers, we only run 12 so I try to dig ....

As professionals, especially if they are maintaining a large-scale site with as many machines as you mentioned, I'm sure they designed and implemented installation scripts so that server machines are easily replicable and can be rebuilt at a moment's notice. So rebuilding the system software on your server machine should be doable very easily.
Better yet, they should be able to have spare machines on which you can rebuild the system software from scratch, and where you can test to make sure the machine boots correctly, etc. Afterwards, you can schedule downtime, pull the data disks from the suspect server, install them in the replacement server, and restore service with very minimal downtime.

What, you say you aren't using separate disks and filesystems to separate the system software from the user/application data? And you don't have turnkey scripts that allow you to rebuild the system software of your servers in a repeatable and less error-prone fashion? You *did* say you had professionals in your employ, right? :-)

Seriously, there are some really basic, fundamental principles of sound, large-scale system administration that are not being followed, and the fact that you are using a single gigantic root partition and are co-mingling system and user data is just one symptom of the fact that very likely your system administrators are breaking a good number of these fundamentals.

The one good thing about the current state of the economy is that there are a lot of really good, experienced system administrators who understand how to design systems that are robust and can be easily serviced and maintained. I would seriously suggest that you consider bringing one of them on board as a member of your team.

- Ted

_______________________________________________
Ext3-users@redhat.com
https://listman.redhat.com/mailman/listinfo/ext3-users