Re: btrfs RAID 5?

On Tue, Jan 5, 2021 at 6:24 AM Richard Shaw <hobbes1069@xxxxxxxxx> wrote:
> Ok, so not so bad. The main reason I'm considering raid5 is that I have one 4TB drive right now, if I add 2 more with raid one, I'm only going to get 2TB. I know it's kind of a duh, you're mirroring and right now I have no redundancy, but this is for home use and $$$/TB is important and I can't fit any more drives in this case :)

Adding 2 drives at a time to Btrfs raid1 is trivial, you just add each:
btrfs dev add /dev/3 /mnt
btrfs dev add /dev/4 /mnt

Each add implies both mkfs and resize. You don't need to balance it.

For btrfs raid5 that is also true, it'll just make new block groups
that have more stripes. But depending on the sizes of all the drives
it'll probably be more efficient utilization of space to rebalance.
Note if you add two drives that are bigger than the others, once the
others fill up, you'll get block groups made of two chunks on the two
drives with remaining space and that's effectively raid1 utilization,
because it's 1 data strip and 1 parity strip to do raid5 on two
devices.
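
To put rough numbers on that, here's a minimal sketch assuming equal-size drives and ignoring metadata overhead. The function names are made up for illustration, they're not part of any tool:

```shell
# Approximate usable data capacity for N equal-size drives, in TB.
# Assumption: equal sizes, no metadata overhead, integer arithmetic.
raid1_usable() { echo $(( $1 * $2 / 2 )); }    # every block is stored twice
raid5_usable() { echo $(( ($1 - 1) * $2 )); }  # one strip per stripe holds parity

echo "raid1, 3 x 4TB: $(raid1_usable 3 4) TB"
echo "raid5, 3 x 4TB: $(raid5_usable 3 4) TB"
echo "raid5, 2 x 4TB: $(raid5_usable 2 4) TB"
```

The last line is the two-device case described above: raid5 on two devices gives the same usable space as raid1 on those two devices.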

Unique to Btrfs, you can start raid1 today, add drives, and move to
raid5 later. It's just a balance with a conversion filter.
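
Roughly, that conversion could look like the sketch below. It assumes the filesystem is mounted at /mnt as in the commands above, and the `run` helper is only an illustration so nothing executes by accident:

```shell
# Prints the commands by default; set RUN=1 to actually execute them (as root).
run() { if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$*"; fi; }

# Convert existing data block groups to raid5; metadata keeps its current profile.
run btrfs balance start -dconvert=raid5 /mnt
# Show which profiles are in use afterwards.
run btrfs filesystem df /mnt
```

Only the data profile is converted here, so raid1 metadata stays raid1.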

>
>
>> Toss up on xxhash64, it's as fast or faster than crc32c to compute,
>> with better collision resistance, but csum algo is a mkfs time only
>> option - only reason why I mention it. I can write more upon request.
>
>
> That's interesting. I hadn't read into the documentation that far yet. Are those the only two options?

There's also blake2b and sha256, which are cryptographic hashes. xxhash
and crc32c are non-crypto hashes.

The btrfs-progs source has a utility for doing benchmarking.
crypto/hash-speedtest.c I think.
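
Since the checksum is fixed at mkfs time, picking it might look like this sketch. /dev/sdX and /dev/sdY are placeholders, the `run` helper is just an illustration that prints instead of executing, and it assumes a btrfs-progs and kernel new enough to know the non-default checksums:

```shell
# Prints the commands by default; set RUN=1 to actually execute them (as root).
run() { if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$*"; fi; }

# Two-device raid1 for data and metadata, with xxhash checksums.
# WARNING: mkfs destroys whatever is on the named devices.
run mkfs.btrfs --csum xxhash -d raid1 -m raid1 /dev/sdX /dev/sdY
# Dump the superblock; the checksum type should be listed in its output.
run btrfs inspect-internal dump-super /dev/sdX
```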


>> If these are consumer drives: (a) timeout mismatch (b) disable each
>> drive's write cache. This is not btrfs specific advice, applies to
>> mdadm and LVM raid as well. Maybe someone has udev rules for this
>> somewhere and if not we ought to get them into Fedora somehow. hdparm
>> -W is the command, -w is dangerous (!). It is ok to use the write
>> cache if it's determined that the drive firmware honors flush/fua,
>> which they usually do, but the penalty is so bad if they don't and you
>> get a crash that it's maybe not worth taking the risk. Btrfs raid1
>> metadata helps here, as will using different drive make/models,
>> because if one drive does the wrong thing, btrfs self heals from
>> another drive even passively - but for parity raid you really ought to
>> scrub following a crash. Or hey, just avoid crashes :)
>
>
> I guess I could test for this? The current drive is ext4 formatted so my original plan was to create a 2 drive raid1 and copy the files over, format the old drive and then add it to the array and rebalance (a +1 for raid1!). I could switch over to the 2 drive raid1 array for a while and "wait and see" or is there a more proactive method?

There isn't a great way to test for firmware bugs. It's basically do a
bunch of real world pull-the-power-plug tests and see what breaks. I
think the thing to test is the sequential write performance of each
drive with the write cache enabled and disabled. The cache really
shouldn't benefit sequential writes at all. So unless writes totally
tank, I would just disable the write cache and then not worry about it.
Making that stick requires a udev rule, because a power cycle resets
the drive to its default of write cache enabled.
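
A minimal sketch of both pieces, under some assumptions: /dev/sdX is a placeholder, hdparm is assumed to live at /usr/sbin/hdparm, and the rule matches every sdX disk, so you'd probably want to narrow it to the drives in the array. The `run` and `wcache_rule` helpers are made up for illustration:

```shell
# Prints the commands by default; set RUN=1 to actually execute them (as root).
run() { if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$*"; fi; }

# Query the current write cache setting, then disable it (capital -W).
run hdparm -W /dev/sdX
run hdparm -W 0 /dev/sdX

# Print a udev rule that reapplies the setting whenever a disk appears.
wcache_rule() {
  printf '%s\n' 'ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd[a-z]", RUN+="/usr/sbin/hdparm -W 0 /dev/%k"'
}
wcache_rule
```

To use it, the printed rule would go into a file under /etc/udev/rules.d/ (the file name is up to you), followed by "udevadm control --reload".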

For what it's worth "btrfs device add" includes the mkfs and resize,
so you skip those steps. "btrfs replace" includes mkfs, but not resize
- the target must be bigger than the source. And it's almost always
better to shrink the source so that you can use "replace" instead of
"add+remove". Replace is basically a scrub. Fast and simple. Adding a
device followed by remove is two fs shrinks and a balance, it's way
more expensive and while OK with raid1 variants, avoid it for sure
with parity raids. Same for any notion of "just removing one drive".
On single and raid1 profiles it's only as expensive as replicating the
data that's on the drive being removed to the remaining drives. But
for parity raid, it's a complete restripe of the array, and thus
expensive (in both time and computation).
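
As a sketch, a replace onto a bigger drive might go like this. /dev/sdOLD, /dev/sdNEW and the devid value are placeholders, and the `run` helper is only there so the commands print instead of executing:

```shell
# Prints the commands by default; set RUN=1 to actually execute them (as root).
run() { if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$*"; fi; }

# Replace the old device with the new one in a single pass, then check progress.
run btrfs replace start /dev/sdOLD /dev/sdNEW /mnt
run btrfs replace status /mnt

# Replace does not resize, so grow onto the larger target afterwards.
# DEVID is a placeholder; take the real one from 'btrfs filesystem show /mnt'.
DEVID=1
run btrfs filesystem resize "${DEVID}:max" /mnt
```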

>
> Obviously if I go raid5 I won't have this option unless I can temporarily house my data on a separate drive.
>
> Looking at the link it looks like I'm OK?
>
> # smartctl -l scterc /dev/sda1
> smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.9.16-200.fc33.x86_64] (local build)
> Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
>
> SCT Error Recovery Control:
>            Read:    100 (10.0 seconds)
>           Write:    100 (10.0 seconds)

Yeah, if that's the default, it's fine. The kernel's command timer is
30s, so the drive will give up on a read/write error before the kernel
thinks the drive is MIA.
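
The comparison itself is simple enough to sketch. smartctl reports SCT ERC in tenths of a second and the kernel timer is in seconds; `erc_ok` is a made-up helper name, not an existing tool:

```shell
# erc_ok ERC_DECISECONDS KERNEL_TIMEOUT_SECONDS
# Succeeds when the drive gives up before the kernel command timer expires.
# An ERC value of 0 means disabled, which is treated as a mismatch here.
erc_ok() {
  [ "$1" -gt 0 ] && [ "$1" -lt $(( $2 * 10 )) ]
}

erc_ok 100 30 && echo "ERC 100 vs 30s timer: ok"
erc_ok 0 30 || echo "ERC disabled vs 30s timer: mismatch"
# On a real system the second argument would come from:
#   cat /sys/block/sda/device/timeout
```

If a drive does come up as a mismatch, the usual options are setting ERC with something like "smartctl -l scterc,70,70 /dev/sda" where the drive supports it, or raising the kernel timer for that device.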


>
> The drive is a Seagate Terascale which is supposedly designed for cloud computing / datacenters.
>
> So raid1 or raid5...

You can change it later either way with a balance and -dconvert.

-- 
Chris Murphy