Re: Soft RAID and EFI systems

On 04/02/14 10:06, Francis Moreau wrote:
> On 02/04/2014 09:57 AM, David Brown wrote:
>> On 04/02/14 09:32, Francis Moreau wrote:
>>> On 02/02/2014 11:30 PM, Chris Murphy wrote:
>>>>
>>>> On Feb 2, 2014, at 2:34 PM, Francis Moreau <francis.moro@xxxxxxxxx>
>>>> wrote:
>>>>>
>>>>> That's funny because one of the reasons I want to use UEFI
>>>>> firmware is to get rid of grub (I don't like it and the way it
>>>>> has become such a bloated beast): since /boot is vfat and has its
>>>>> own partition, I prefer to use a much simpler bootloader such as
>>>>> gummyboot.
>>>>
>>>> It might be possible to do what you want with mdadm metadata
>>>> version 1.0. Typically bootable raid1 is ext4 on md raid1 using
>>>> metadata format 1.0, and an internal bitmap. When the partitions
>>>> are not assembled, they each appear as separate ext4 partitions. If
>>>> FAT32 on md raid1 with metadata 1.0 still looks like FAT32 as a
>>>> separate partition, and the mdadm v1.0 metadata at the end of the
>>>> partition doesn't confuse the firmware, what should happen is any
>>>> ESP can boot the system. Once the kernel and initramfs are loaded,
>>>> mdadm will locate the mdadm metadata on each partition and assemble
>>>> them into a single md device, and fstab mounts the md device at
>>>> /boot. So prior to boot they are separate ESPs, and after boot it's
>>>> a single ESP (mirrored). But I haven't tested this arrangement with
>>>> ESPs and UEFI.
>>>
>>> I'll test this configuration and see if it works soon.
>>>
>>>>
>>>> The easiest scenario I've found for resilient boot on EFI systems
>>>> is, well, not easy. First, I put shim and grub package files onto
>>>> each ESP along with the previously posted grub.cfg snippet. Those
>>>> grub.cfgs are one time, non-updatable files, that point to
>>>> /boot/grub2/grub.cfg (produced with grub2-mkconfig on Fedora) on
>>>> Btrfs raid1. That's about as reliable as it gets because the only
>>>> dependencies are grub (which understands Btrfs multiple devices)
>>>> and dracut baking the btrfs module into initramfs. It gets
>>>> essentially foolproof if btrfs is compiled into the kernel. Other
>>>> combinations are easier to break. I basically want ESPs that aren't
>>>> being modified if at all avoidable because FAT32 breaks easily if
>>>> anything is being written to it and there is a crash or power
>>>> failure.
>>>>
>>>
>>> I agree that FAT32 can break during power failure, that's the reason
>>> why I'm trying to make it mirrored. But I want to get rid of grub as
>>> much as possible so I would prefer to use the first solution.
>>
>> Mirroring will not help FAT32 during power failure - you have a good
>> chance of getting two copies of the same error.  And if your power fail
>> hits during writes, you also have a good chance of the two disks having
>> /different/ errors and inconsistencies.  The problem lies in FAT32
>> having no log, and no barriers or ordering when it makes changes -
>> updates to the file data, the directory structure, and the FAT table can
>> happen in different orders, and a power failure can leave one part
>> updated and the other part with old data.  Raid cannot help with this
>> problem.
> 
> Ok, so basically RAID helps only in case of disk failure, right?

Exactly correct (where "disk failure" includes both complete failure of
the disk and unrecoverable read errors).  RAID does not protect against
corruption caused by power failures (a RAID card with battery backup,
combined with a journalling filesystem, should help there), and it does
not protect against the most common cause of data loss - human error!

> 
> It seems odd to have chosen FAT32 in the first place then.

FAT32 is the worst possible choice of filesystem, except in three
respects - it is simple enough to be implemented in a small amount of
code (such as in EFI firmware or a bootloader), it is usable on small
disks or partitions, and it is supported by brain-dead OSes that don't
understand better alternatives.  (NTFS has journalling, but is a monster
to implement in something the size of EFI.)

It's a crap filesystem, but it is the "industry standard" for small
disks and small systems.

> 
>>
>> The most important way to protect your FAT32 system is simply to avoid
>> writing to it except when absolutely necessary.  If it is mounted
>> read-only, and only updated when changing grub or updating the kernel,
>> then just make sure you don't power-cycle your machine at that time.
> 
> Well, the problem is that you never know when power failures happen at
> least for me with a small server without any power backup.

The answer here is staring you in the face... get a UPS.  A small one
is not expensive - you only need it to run the server for a couple of
minutes, long enough to shut down cleanly.  Even though journalled
filesystems can keep their /metadata/ consistent after a power failure,
they don't normally guarantee /data/ consistency, and certainly cannot
guarantee /application level/ consistency.  You get that from doing a
proper shutdown.  And remember also that after an unclean shutdown,
restarts involve long consistency checks at the raid level and at the
filesystem level - a UPS will let you avoid that.

> 
>> The smaller the critical window, the smaller the chances of problems.
>>
>> If you need to do updates more regularly, then your best bet is to have
>> independent FAT32 partitions on the two disks.  Make your updates on one
>> disk, and when it is finished copy the changes onto the other disk.
>> Then you always have a good copy - if you get a crash while the first
>> disk is being updated, then when you re-start the computer, use its boot
>> menu to choose booting from the second disk.
> 
> That seems the best thing to do then.
> 
> Thanks.
> 
> 
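
For what it's worth, the "update one disk, then copy to the other" step
could be scripted along these lines.  This is only a rough sketch in
Python, not something I have run against real ESPs: the function name
sync_esp and the ".new" temporary suffix are made up for illustration,
and it assumes both FAT32 partitions are already mounted.  It copies
only files that are missing or differ, flushes each copy to disk before
renaming it into place, and does not delete files that exist only on
the second disk.

```python
import filecmp
import os
import shutil
from pathlib import Path


def sync_esp(primary: Path, secondary: Path) -> list[str]:
    """Copy new or changed files from primary to secondary.

    Returns the relative paths that were copied. Files present only on
    the secondary are left alone (hypothetical helper, illustration only).
    """
    copied = []
    for src in sorted(p for p in primary.rglob("*") if p.is_file()):
        rel = src.relative_to(primary)
        dst = secondary / rel

        # Skip files whose contents are already byte-for-byte identical,
        # so the second partition is written as little as possible.
        if dst.is_file() and filecmp.cmp(src, dst, shallow=False):
            continue

        dst.parent.mkdir(parents=True, exist_ok=True)

        # Write to a temporary name first and force it to disk, then
        # rename over the old file. FAT32 gives no crash guarantees for
        # the rename itself; this only keeps the window small.
        tmp = dst.with_name(dst.name + ".new")
        with open(src, "rb") as fin, open(tmp, "wb") as fout:
            shutil.copyfileobj(fin, fout)
            fout.flush()
            os.fsync(fout.fileno())
        os.replace(tmp, dst)

        copied.append(rel.as_posix())
    return copied
```

You would call it only after the update of the first disk has finished
and been checked, e.g. sync_esp(Path("/boot/efi"), Path("/boot/efi2")),
where both mount points are assumptions about your layout.  Running it a
second time with nothing changed should copy nothing.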
