Hi Roger,

Ok, thanks, I've now managed to boot into an Ubuntu 18 server
environment from a USB that has all the raid personalities loaded.
/proc/mdstat shows the same thing the SystemRescueCD was showing:
md126 is raid1, and md127 is inactive with all disks listed as spares.
Somehow it is just not being recognized as a raid6... thoughts from
here?

thanks,
allie

On 3/31/2020 3:43 PM, Roger Heflin wrote:
> Yes, you would have to activate it. Since raid456 was not loaded
> when the udev triggers fired at device creation, assembly would have
> failed.
>
> Do this: "lsinitrd /yourrealboot/initr*-yournormalbootkernel | grep -i
> raid456". If that returns nothing, then that module is not in the
> initrd, and that would produce a failure to find the rootfs when the
> rootfs is on a raid4/5/6 device.
>
> You probably need to look at /etc/dracut.conf and/or
> /etc/dracut.conf.d, make sure the mdraid modules are being installed,
> and rebuild the initrd. After rebuilding it, rerun the above test; if
> it still does not show raid456, you will need to add explicit options
> to include that specific module.
>
> There should be instructions out there on how to rebuild an initrd
> from a livecd boot. I have a pretty messy way to do it, but my way may
> not be necessary when the livecd is very similar to the installed os.
> In most of the cases where I rebuild one, the livecd is much newer
> than the actual host os, so to get a clean build you have to mount the
> system at, say, /mnt (plus any separate filesystems under root,
> including /mnt/boot), do a few bind mounts to get /proc, /sys and /dev
> visible under /mnt, chroot into /mnt, and run the install's own
> commands to rebuild the initrd using the config from the actual
> install.
>
> On Tue, Mar 31, 2020 at 9:28 AM Alexander Shenkin <al@xxxxxxxxxxx> wrote:
>>
>> Thanks Roger,
>>
>> modprobe raid456 did the trick. md126 is still showing up as
>> inactive though. Do I need to bring it online after I activate the
>> raid456 module?
>>
>> I could copy the results of /proc/cmdline over here if still
>> necessary, but I figure it's likely not now that we've found
>> raid456... It's just a single line specifying the BOOT_IMAGE...
>>
>> thanks,
>> allie
>>
>> On 3/31/2020 2:53 PM, Roger Heflin wrote:
>>> The fedora live cds I think used to have it. It could be built into
>>> the kernel or it could be loaded as a module.
>>>
>>> See if there is a config* file in /boot, and if so do a "grep -i
>>> raid456 configfilename". If it says =y, it is built into the kernel;
>>> if =m, it is a module and you should see it in lsmod, so if you
>>> don't, the module was built but is not loaded.
>>>
>>> If =m, try "modprobe raid456"; that should load it if it is on the
>>> livecd.
>>>
>>> If that fails, do a "find /lib/modules -name 'raid456*' -ls" and see
>>> if it exists in the modules directory.
>>>
>>> If it is built into the kernel (=y), then something is probably
>>> wrong with the udev rules not triggering, building and enabling the
>>> raid6 array on the livecd. There is a reasonable chance that
>>> whatever this is is also the problem with your booting os, as it
>>> would need the right parts in the initramfs.
>>>
>>> What does cat /proc/cmdline look like? There are some options on
>>> there that can cause md's to get ignored at boot time.
>>>
>>> On Tue, Mar 31, 2020 at 5:08 AM Alexander Shenkin <al@xxxxxxxxxxx> wrote:
>>>>
>>>> Thanks Roger,
>>>>
>>>> It seems only the Raid1 module is loaded. I didn't find a
>>>> straightforward way to get that module loaded... any suggestions?
>>>> Or will I have to find another livecd that contains raid456?
>>>>
>>>> Thanks,
>>>> Allie
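For reference, the chroot-and-rebuild procedure Roger describes above
might look like the following. This is a sketch only: it assumes the
installed system uses dracut (as Roger's /etc/dracut.conf pointers
imply), and the device names, mount points, and kernel version are all
placeholders to be replaced with the real layout.

    # Sketch: assumes the installed root is /dev/md126 and /boot is a
    # separate partition (/dev/sdX1 here is a placeholder).
    mount /dev/md126 /mnt
    mount /dev/sdX1 /mnt/boot
    mount --bind /proc /mnt/proc     # expose kernel interfaces
    mount --bind /sys  /mnt/sys      # inside the chroot
    mount --bind /dev  /mnt/dev
    chroot /mnt
    # Inside the chroot, force raid456 into the image and verify.
    # Note: "uname -r" here reports the livecd kernel, so when the
    # livecd and install differ, name the installed kernel version
    # explicitly (<kernelver> is a placeholder).
    dracut --force --add-drivers raid456 \
        /boot/initramfs-<kernelver>.img <kernelver>
    lsinitrd /boot/initramfs-<kernelver>.img | grep -i raid456
    # On an initramfs-tools distro (e.g. Ubuntu) the equivalent would
    # be adding raid456 to /etc/initramfs-tools/modules and running
    # update-initramfs -u.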
>>>> On 3/30/2020 9:45 PM, Roger Heflin wrote:
>>>>> They all seem to be there, all seem to report all 7 disks active,
>>>>> so it does not appear to be degraded. All event counters are the
>>>>> same. Something has to be causing them to not be scanned and
>>>>> assembled at all.
>>>>>
>>>>> Is the rescue disk a similar OS to what you have installed? If it
>>>>> is, you might try a random fedora livecd, say, and see if it acts
>>>>> any different.
>>>>>
>>>>> What does fdisk -l /dev/sda look like?
>>>>>
>>>>> Is the raid456 module loaded (lsmod | grep raid)?
>>>>>
>>>>> What does cat /proc/cmdline look like?
>>>>>
>>>>> You might also run this:
>>>>> file -s /dev/sd*3
>>>>> But I think it is going to show us the same thing as what the
>>>>> mdadm --examine is reporting.
>>>>>
>>>>> On Mon, Mar 30, 2020 at 3:05 PM Alexander Shenkin <al@xxxxxxxxxxx> wrote:
>>>>>>
>>>>>> See attached. I should mention that the last drive I added is on
>>>>>> a new controller that is separate from the other drives, but it
>>>>>> seemed to work fine for a bit, so I kinda doubt that's the
>>>>>> issue...
>>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> allie
>>>>>>
>>>>>> On 3/30/2020 6:21 PM, Roger Heflin wrote:
>>>>>>> Do this against each partition that had it:
>>>>>>>
>>>>>>> mdadm --examine /dev/sd***
>>>>>>>
>>>>>>> It seems like it is not seeing it as an md-raid.
>>>>>>>
>>>>>>> On Mon, Mar 30, 2020 at 11:13 AM Alexander Shenkin <al@xxxxxxxxxxx> wrote:
>>>>>>>> Thanks Roger,
>>>>>>>>
>>>>>>>> The only line that isn't commented out in /etc/mdadm.conf is
>>>>>>>> "DEVICE partitions"...
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Allie
>>>>>>>>
>>>>>>>> On 3/30/2020 4:53 PM, Roger Heflin wrote:
>>>>>>>>> That seems really odd. Is the raid456 module loaded?
>>>>>>>>>
>>>>>>>>> On mine I see messages like this for each disk it scanned and
>>>>>>>>> considered as possibly being an array member:
>>>>>>>>> kernel: [   83.468700] md/raid:md13: device sdi3 operational as raid disk 5
>>>>>>>>> and messages like this:
>>>>>>>>> md/raid:md14: not clean -- starting background reconstruction
>>>>>>>>>
>>>>>>>>> You might look at /etc/mdadm.conf on the rescue cd and see if
>>>>>>>>> it has a DEVICE line that limits what is being scanned.
>>>>>>>>>
>>>>>>>>> On Mon, Mar 30, 2020 at 10:13 AM Alexander Shenkin <al@xxxxxxxxxxx> wrote:
>>>>>>>>>> Thanks Roger,
>>>>>>>>>>
>>>>>>>>>> That grep just returns the detection of the raid1 (md127).
>>>>>>>>>> See dmesg and mdadm --detail results attached.
>>>>>>>>>>
>>>>>>>>>> Many thanks,
>>>>>>>>>> allie
>>>>>>>>>>
>>>>>>>>>> On 3/28/2020 1:36 PM, Roger Heflin wrote:
>>>>>>>>>>> Try this grep:
>>>>>>>>>>> dmesg | grep "md/raid"
>>>>>>>>>>> If that returns nothing, just send the entire dmesg.
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Mar 28, 2020 at 2:47 AM Alexander Shenkin <al@xxxxxxxxxxx> wrote:
>>>>>>>>>>>> Thanks Roger. dmesg has nothing in it referring to md126 or
>>>>>>>>>>>> md127... any other thoughts on how to investigate?
>>>>>>>>>>>>
>>>>>>>>>>>> thanks,
>>>>>>>>>>>> allie
>>>>>>>>>>>>
>>>>>>>>>>>> On 3/27/2020 3:55 PM, Roger Heflin wrote:
>>>>>>>>>>>>> A non-assembled array always reports raid1.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would run "dmesg | grep md126" to start with and see
>>>>>>>>>>>>> what it reports it saw.
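For reference, the per-member mdadm --examine check Roger asks for
above can be looped to make the event counters easy to compare side by
side. A sketch only: the sd[a-g]3 pattern is an assumption for the
seven member partitions, so substitute the real ones.

    # Sketch: assumes the 7 raid6 members are /dev/sda3 .. /dev/sdg3.
    for d in /dev/sd[a-g]3; do
        echo "== $d =="
        # pull out the fields that matter for assembly decisions
        mdadm --examine "$d" | grep -E 'Raid Level|State|Events'
    done

If the levels match and the Events counts are all equal, as Roger
notes above, the members are consistent and the problem is in scanning
or assembly, not in the superblocks.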
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Mar 27, 2020 at 10:29 AM Alexander Shenkin <al@xxxxxxxxxxx> wrote:
>>>>>>>>>>>>>> Thanks Wol,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Booting into SystemRescueCD and looking at /proc/mdstat,
>>>>>>>>>>>>>> two arrays are reported. The first (md126) is reported as
>>>>>>>>>>>>>> inactive with all 7 disks listed as spares. The second
>>>>>>>>>>>>>> (md127) is reported as active auto-read-only with all 7
>>>>>>>>>>>>>> disks operational. Also, the only "personality" reported
>>>>>>>>>>>>>> is Raid1. I could go ahead with your suggestion of mdadm
>>>>>>>>>>>>>> --stop array and then mdadm --assemble, but I thought the
>>>>>>>>>>>>>> reporting of just the Raid1 personality was a bit
>>>>>>>>>>>>>> strange, so I wanted to check in before doing that...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Allie
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 3/26/2020 10:00 PM, antlists wrote:
>>>>>>>>>>>>>>> On 26/03/2020 17:07, Alexander Shenkin wrote:
>>>>>>>>>>>>>>>> I surely need to boot with a rescue disk of some sort,
>>>>>>>>>>>>>>>> but from there, I'm not sure exactly what I should do.
>>>>>>>>>>>>>>>> Any suggestions are very welcome!
>>>>>>>>>>>>>>> Okay. Find a liveCD that supports raid (hopefully
>>>>>>>>>>>>>>> something like SystemRescueCD). Make sure it has a very
>>>>>>>>>>>>>>> recent kernel and the latest mdadm.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> All being well, the resync will restart, and when it's
>>>>>>>>>>>>>>> finished your system will be fine. If it doesn't restart
>>>>>>>>>>>>>>> on its own, do an "mdadm --stop array", followed by an
>>>>>>>>>>>>>>> "mdadm --assemble".
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If that doesn't work, then
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>> Wol
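For reference, the stop-and-reassemble step Wol describes might look
like this once the raid456 module is loaded. A sketch only: md127 and
the sd[a-g]3 member list are assumptions, so substitute whichever
array /proc/mdstat actually shows as inactive and its real members.

    # Sketch: stop the half-assembled array, then let mdadm rescan.
    mdadm --stop /dev/md127
    mdadm --assemble --scan --verbose
    # If --scan finds nothing, name the members explicitly:
    mdadm --assemble /dev/md127 /dev/sd[a-g]3
    cat /proc/mdstat    # confirm the array came up as raid6

Stopping first matters because the kernel is holding the members as
spares in the inactive array, and mdadm cannot reassemble devices that
are still claimed.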