Server fails to boot




First some history. This is an Intel MB and processor some 6 years old, initially running CentOS 6. It has 4 x 1TB SATA drives set up as two mdraid 1 mirrors. It has performed really well in a rural setting with frequent power cuts, which the UPS has dealt with: it automatically shuts down the server after a few minutes and automatically restarts it when power is restored.

The clients needed a Windoze server for a proprietary accounting package they use, thus I have recently installed two SSD drives (500GB each) also in a raid 1 mirror and installed CentOS 7 as the host and also VirtualBox running Windoze 10. The hard drives continue to hold their data files.

This appeared to work just fine until a few days ago. After a power cut the server would not reboot.

It takes a while to get in front of the machine, add a monitor, keyboard and mouse only to find:

Warning: /dev/disk/by-id/md-uuid-xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx does not exist

repeated three times - one for each of the /, /boot, and swap raid member sets along with a

Warning: /dev/disk/by-uuid/xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx does not exist

for the /dev/md125 which is the actual raid 1 / device.

The system is in a root shell of some sort as it has not made the transition from initramfs to the mdraid root drive.

There are some other lines of info and a text file with hundreds of lines of boot info, ending with the above messages (as I recall).

I tried a reboot - same result. I rebooted and tried an earlier kernel - same result. I then rebooted to the recovery kernel and all went well: the system comes up, all raid sets are up and in sync, and there are no errors.

So, no apparent H/W issues, no mdraid issues apparently, but none of the regular kernels will now boot.

Running blkid shows all the expected mdraid devices, with the uuids from the error messages all in place as expected.
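One pitfall when comparing uuids by eye: as far as I know, blkid prints a raid member's uuid in dashed form (8-4-4-4-12), while mdadm and the rd.md.uuid= arguments on the kernel command line use the colon form (8:8:8:8). A minimal sketch of a comparison that ignores the separators follows. The sample uuid and both sample strings are made up for illustration, not taken from this machine.

```python
import re

def normalize_uuid(uuid):
    """Drop ':' and '-' separators and lowercase, so both spellings compare equal."""
    return re.sub(r"[:-]", "", uuid).lower()

def cmdline_md_uuids(cmdline):
    """Collect the value of every rd.md.uuid= argument on a kernel command line."""
    return {normalize_uuid(u) for u in re.findall(r"\brd\.md\.uuid=(\S+)", cmdline)}

def blkid_raid_uuids(blkid_output):
    """Collect UUID= values from blkid lines whose TYPE is linux_raid_member."""
    found = set()
    for line in blkid_output.splitlines():
        if 'TYPE="linux_raid_member"' not in line:
            continue
        match = re.search(r'\bUUID="([^"]+)"', line)  # skips UUID_SUB and PARTUUID
        if match:
            found.add(normalize_uuid(match.group(1)))
    return found

# Hypothetical sample data; on a real system these would come from
# /proc/cmdline and from the output of `blkid`.
sample_cmdline = "ro rd.md.uuid=a1b2c3d4:e5f6a7b8:c9d0e1f2:a3b4c5d6 rhgb quiet"
sample_blkid = (
    '/dev/sda1: UUID="a1b2c3d4-e5f6-a7b8-c9d0-e1f2a3b4c5d6" '
    'UUID_SUB="00000000-0000-0000-0000-000000000001" TYPE="linux_raid_member"\n'
    '/dev/md125: UUID="11111111-2222-3333-4444-555555555555" TYPE="xfs"\n'
)

missing = cmdline_md_uuids(sample_cmdline) - blkid_raid_uuids(sample_blkid)
print("uuids on the command line with no matching raid member:", sorted(missing))
```

An empty result means every array uuid the kernel was told to assemble is present on some member device.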

I did a yum reinstall of the most recent kernel, as I thought that might repair any /boot file system problems, particularly with initramfs, but it made no difference: it will not boot and gives exactly the same error messages.

Thus I now have it running on the recovery kernel, with all the required server functions being performed, albeit on an out of date kernel.

Google has one solved problem similar to mine, but the solution was to change the BIOS from AHCI to IDE. That does not seem correct, as I have not changed the BIOS, although I have not checked it at this time.

Another solution talks about a race condition, with the md raid not being ready when required during the boot process, and suggests adding a delay to the kernel boot line in grub2, although no one indicated this actually worked.
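For reference, the delay idea would amount to something like the sketch below, assuming the parameter meant is the kernel's rootdelay= (seconds to wait before mounting the root file system). The 10 second value and the sample command line are placeholders, and the helper name is made up for illustration.

```python
def add_kernel_arg(cmdline, name, value):
    """Return cmdline with name=value set, replacing any existing name= argument.

    Splits on whitespace only, so quoted arguments containing spaces are not handled.
    """
    kept = [arg for arg in cmdline.split() if not arg.startswith(name + "=")]
    kept.append(f"{name}={value}")
    return " ".join(kept)

# Hypothetical current command line; the real one is in GRUB_CMDLINE_LINUX
# in /etc/default/grub (or in /proc/cmdline for the running kernel).
current = "ro crashkernel=auto rd.md.uuid=a1b2c3d4:e5f6a7b8:c9d0e1f2:a3b4c5d6 rhgb quiet"
print(add_kernel_arg(current, "rootdelay", "10"))
```

The resulting string would go back into GRUB_CMDLINE_LINUX, with the grub2 configuration regenerated afterwards. Whether it helps here is untested.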

Another proposed solution is to mount the failed devices from a recovery boot and rebuild initramfs. Before I do this I would like to ask those that know a little more about the boot process, what is going wrong? I can believe the most recent initramfs being a problem, but all three other kernels too?? Yet the recovery kernel works just fine.
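Since a failed reboot means another trip, the rebuild could first be laid out as a dry run. The sketch below only prints the commands: a backup copy of each existing image, then `dracut -f <image> <kernel version>`. The kernel version string is an example, not necessarily what is installed on this server, and the /boot/initramfs-<version>.img naming is assumed from the usual CentOS 7 layout.

```python
import shlex

def plan_rebuild(kernel_versions):
    """Return a list of argv lists: back up each initramfs, then rebuild it with dracut."""
    commands = []
    for kver in kernel_versions:
        image = f"/boot/initramfs-{kver}.img"
        commands.append(["cp", "-p", image, image + ".bak"])  # keep the old image
        commands.append(["dracut", "-f", image, kver])        # -f overwrites the image
    return commands

# Example version string only; `rpm -q kernel` lists the installed ones.
versions = ["3.10.0-1160.el7.x86_64"]

for argv in plan_rebuild(versions):
    # Print instead of executing, so nothing under /boot changes until reviewed.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Printing rather than running keeps the working recovery boot untouched while the plan is checked.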

As the system is remote, I would like some understanding of what's up before I do any changes - if a reboot occurs and fails, it will mean another trip.

Oh, one other thing, it seems the UPS is not working correctly, thus it may not have shut down cleanly. Working to replace batteries in the UPS.

TIA for your insight.



_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
https://lists.centos.org/mailman/listinfo/centos


