Ben: I've observed some similar behavior on RAID-5 with SuSE Enterprise Server 10.2 SP2 on our eDirectory boxes. While this is different from my current issue that I posted today, I will say that I have had, on average, better luck with RHEL-based distributions than with SuSE variants. I've noticed that Novell tends to make ever-so-slight changes to the back-ports that they place into their 'Enterprise' kernel.

That said, Novell is also the only current Enterprise-grade Linux vendor to actually place 'sane' options into the default partitioning/mounting scheme for Ext3 -- i.e., 'barrier=1,data=journal,noatime' (rough fstab sketch at the bottom of this mail). However, this doesn't apply to software RAID/Linux MD setups, since (AFAIK) only linear/single-drive Ext3 partitions can mount with 'barrier=1'.

This has been my hell for the last month: 'Enterprise Linux' distributions, filesystems, software vs. hardware RAID, and data loss. I've lost many an hour of sleep ;o)!

--
Jesse W. Wheeler
Member-Owner
Devotio Consulting, L.L.C.
--

On Mon, Nov 2, 2009 at 3:24 PM, Ben DJ <bendj095124367913213465@xxxxxxxxx> wrote:
> Hi,
>
> I've installed OpenSuse 11.2 RC2 to
>
>   /boot on 4-disk RAID-1, super=1.0
>   / (root) & etc. on 4-disk RAID-10,f2, chunk=256, super=1.1
>
> It's been running fine.
>
> After a recent system upgrade via 'zypper dup', which completed
> without any apparent error, the reboot failed. /boot @ /dev/md0 was not
> mounting, and the RAID-1 wasn't even assembling.
>
> A full day of reading and trying various repair-the-array solutions
> couldn't get me back.
>
> Although I was able to manually mount the array, and it fsck'ed OK, I
> couldn't get the array to auto-assemble.
>
> Finally, I deleted the array, repartitioned the drive, reinstalled the
> kernel, grub, and mdadm, and I'm back in business. At the moment, the
> RAID-10 array is resyncing (not sure why):
>
> cat /proc/mdstat
> Personalities : [raid10] [raid0] [raid1] [raid6] [raid5] [raid4] [linear]
> md0 : active raid1 sda1[0] sdd1[3] sdc1[2] sdb1[1]
>       160604 blocks super 1.0 [4/4] [UUUU]
>
> md1 : active raid10 sda2[0] sdd2[3] sdc2[2] sdb2[1]
>       1953198080 blocks super 1.1 256K chunks 2 far-copies [4/4] [UUUU]
>       [====>................]  resync = 23.8% (465007616/1953198080)
>       finish=214.8min speed=115452K/sec
>
> unused devices: <none>
>
> _Something_ happened at that system upgrade. My mistake for not
> paying closer attention to what was going on. My goal is to not let
> that happen again.
>
> Knowing that I'm not providing any helpful detail -- I don't have it
> atm -- can anyone speculate as to what might have happened at the system
> update to cause this? If possible, I'd like to start with some clue
> as to what I'm watching out for.
>
> Thanks,
>
> BenDJ
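
For reference, here's roughly what those Novell-style defaults look like in /etc/fstab. This is just a sketch -- the device names, mount points, and dump/pass numbers are placeholders, and (per the above) it only really applies to a plain single-disk Ext3 partition, not an MD device:

  # device      mountpoint   type   options                          dump  pass
  /dev/sda2     /            ext3   barrier=1,data=journal,noatime   1     1
  /dev/sda3     /home        ext3   barrier=1,data=journal,noatime   1     2

IIRC, for the root filesystem the data=journal mode has to be in effect at the initial mount, so you may also need 'rootflags=data=journal' on the kernel command line (or set it via 'tune2fs -o journal_data') -- double-check that before relying on it.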
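
Re: the RAID-1 under /boot not auto-assembling after the 'zypper dup' -- pure speculation on my part, but the usual suspect is the new kernel's initrd getting rebuilt without the array information, so nothing assembles md0 early enough at boot. If it happens again, something along these lines is what I'd check before blowing the array away (paths/commands are from memory for openSUSE, where mdadm.conf lives in /etc -- verify on 11.2 before running):

  # see whether the running arrays match what mdadm.conf describes
  mdadm --detail --scan
  cat /etc/mdadm.conf

  # if the ARRAY lines are missing or stale, regenerate them
  # (check for duplicate entries afterwards rather than appending blindly)
  mdadm --detail --scan >> /etc/mdadm.conf

  # then rebuild the initrd so the assembly info is baked back in
  mkinitrd

No guarantee that's what bit you, but it's cheap to rule out next time, and it matches the symptom of "mounts fine by hand, won't assemble at boot."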