On Sun, Jul 29, 2012 at 10:02 AM, Sam Varshavchik <mrsam@xxxxxxxxxxxxxxx> wrote:
> To: For users of Fedora Core releases <users@xxxxxxxxxxxxxxxxxxxxxxx>
> Subject: Check your /etc/default/grub, if you use raid 1.
> Message-ID: <cone.1343570520.888812.3982.1000@xxxxxxxxxxxxxxxxxxxxxx>
>
> There's a long-standing combination of two bugs: the list of rd.md.uuid boot
> parameters generated by anaconda for /etc/default/grub may not include the
> raid uuid of non-stock partitions like /home; and although the ramfs
> initscript autodiscovers all raid volumes present, sometimes (not always;
> I'd estimate 5% of the time), if a uuid is not enumerated in the boot
> parameters, one of the drives in the raid 1 volume may not get assembled at
> boot.
>
> There's probably a third bug in here: mdmonitor should've mailed me when an
> array came up degraded at boot (I suspect that because mdmonitor gets
> started so early in the boot process, not all the moving pieces are there
> for mail delivery to happen). Eventually, you'll boot again with both drives
> in the array somehow, except they'll be out of sync, resulting in massive
> corruption. If you're lucky, you'll boot with just the other drive and
> discover that your filesystem's contents are weeks or months out of date;
> maybe you'll be lucky enough to figure out what happened, switch back to
> the other drive, and resync. But not everyone's so lucky.
>
> This first started happening in F16. It took me a while to figure out the
> cause of an occasional raid assembly failure at boot. I fixed it and
> forgot about it. Well, it looks like the F17 anaconda brought back the broken
> /etc/default/grub, which found its way into my grub.cfg, and I just lost a
> full day cleaning up this mess.
>
> So, if you use raid 1 and upgraded to F17, you may need to fix this before
> it's too late: put the missing uuid back into /etc/default/grub, and into
> every entry in grub.cfg.
>
> Pissed.

Thanks for the explanation and the fix/workaround, Sam. This happened to me as
well. I ran fsck on the two mirrors independently and was able to recover most
of the data from their lost+found directories, but I had been brooding over
the root cause until now.

--
users mailing list
users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe or change subscription options:
https://admin.fedoraproject.org/mailman/listinfo/users
Guidelines: http://fedoraproject.org/wiki/Mailing_list_guidelines
Have a question? Ask away: http://ask.fedoraproject.org
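As a rough aid for the check Sam describes (is every raid array's uuid enumerated as an rd.md.uuid boot parameter?), here is a minimal Python sketch. It compares the arrays reported by `mdadm --detail --scan` against the rd.md.uuid= entries on a kernel command line, such as the value of GRUB_CMDLINE_LINUX in /etc/default/grub, and lists any array uuid that is missing. It assumes mdadm's `ARRAY ... UUID=...` output format and that rd.md.uuid takes the uuid in the same colon-separated form; the function names and the sample uuids below are hypothetical. It only inspects the strings handed to it, so you would paste in the real command output and the quoted value (not the whole `GRUB_CMDLINE_LINUX=` assignment) yourself.

```python
from __future__ import annotations

import re
import shlex


def array_uuids(mdadm_scan: str) -> list[str]:
    """Collect array uuids from `mdadm --detail --scan` style output.

    Assumes each array is reported on an "ARRAY ..." line carrying a
    UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx field.
    """
    uuids = []
    for line in mdadm_scan.splitlines():
        if not line.strip().startswith("ARRAY"):
            continue
        match = re.search(r"\bUUID=([0-9A-Fa-f:]+)", line)
        if match:
            # Compare case-insensitively by normalizing to lowercase.
            uuids.append(match.group(1).lower())
    return uuids


def cmdline_uuids(cmdline: str) -> set[str]:
    """Return every rd.md.uuid=... value found on a kernel command line."""
    found = set()
    for token in shlex.split(cmdline):
        key, sep, value = token.partition("=")
        if key == "rd.md.uuid" and sep:
            found.add(value.lower())
    return found


def missing_md_uuids(mdadm_scan: str, cmdline: str) -> list[str]:
    """Arrays present on the system but not listed as rd.md.uuid params."""
    listed = cmdline_uuids(cmdline)
    return [u for u in array_uuids(mdadm_scan) if u not in listed]


if __name__ == "__main__":
    # Hypothetical sample data: a root array that is listed and a /home
    # array whose uuid was left off the boot parameters.
    scan = (
        "ARRAY /dev/md127 metadata=1.2 name=host:root "
        "UUID=11111111:22222222:33333333:44444444\n"
        "ARRAY /dev/md126 metadata=1.2 name=host:home "
        "UUID=aaaaaaaa:bbbbbbbb:cccccccc:dddddddd\n"
    )
    cmdline = "rd.md.uuid=11111111:22222222:33333333:44444444 rhgb quiet"
    print(missing_md_uuids(scan, cmdline))
```

Running the sample prints `['aaaaaaaa:bbbbbbbb:cccccccc:dddddddd']`, i.e. the /home array that would need its own rd.md.uuid= entry added to /etc/default/grub and to every entry in grub.cfg. Keeping the parsing separate from reading files makes it easy to check both /etc/default/grub and each grub.cfg entry with the same function.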