On Tue, Jan 5, 2021 at 6:24 AM Richard Shaw <hobbes1069@xxxxxxxxx> wrote:
> Ok, so not so bad. The main reason I'm considering raid5 is that I have one 4TB drive right now, if I add 2 more with raid one, I'm only going to get 2TB. I know it's kind of a duh, you're mirroring and right now I have no redundancy, but this is for home use and $$$/TB is important and I can't fit any more drives in this case :)

Adding 2 drives at a time to Btrfs raid1 is trivial, you just add each:

btrfs dev add /dev/3 /mnt
btrfs dev add /dev/4 /mnt

Adding a device implies both mkfs and resize. You don't need to balance it.

For btrfs raid5 that is also true; it'll just make new block groups that have more stripes. But depending on the sizes of all the drives, rebalancing will probably give more efficient utilization of space. Note that if you add two drives that are bigger than the others, once the others fill up, you'll get block groups made of two chunks on the two drives with remaining space, and that's effectively raid1 utilization, because raid5 on two devices is 1 data strip and 1 parity strip.

Unique to Btrfs, you can start raid1 today, add drives, and move to raid5 later. It's just a balance with a conversion filter.

>> Toss up on xxhash64, it's as fast or faster than crc32c to compute,
>> collision resistance, but csum algo is a mkfs time only option - only
>> reason why I mention it. I can write more upon request.
>
> That's interesting. I hadn't read into the documentation that far yet. Are those the only two options?

There are also the blake2b and sha256 cryptographic hashes; xxhash and crc32c are non-cryptographic hashes. The btrfs-progs source has a utility for benchmarking them: crypto/hash-speedtest.c, I think.

>> If these are consumer drives: (a) timeout mismatch (b) disable each
>> drive's write cache. This is not btrfs specific advice, applies to
>> mdadm and LVM raid as well. Maybe someone has udev rules for this
>> somewhere and if not we ought to get them into Fedora somehow.
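On the two points quoted just above, here is a minimal sketch in POSIX sh, assuming smartctl and hdparm are installed and the drives are whole-disk sdX devices. The function names, the rule file name and the /usr/sbin/hdparm path are illustrative assumptions, not an existing Fedora package. compare_timeouts checks for a timeout mismatch: smartctl reports the SCT ERC limit in tenths of a second, while the kernel's command timer in /sys/block/<dev>/device/timeout is in seconds, and the drive's limit should be the shorter of the two. write_cache_rule prints a udev rule that runs hdparm -W 0 (write cache off) whenever a matching disk appears, since the setting does not survive a power cycle.

```shell
#!/bin/sh
# Sketch only: function names and the rule file name are hypothetical,
# not part of any existing Fedora package.

# compare_timeouts ERC KERNEL_SECONDS
#   ERC: SCT ERC read limit as printed by "smartctl -l scterc", in tenths
#        of a second (100 = 10.0 s); "Disabled" or empty means no ERC.
#   KERNEL_SECONDS: contents of /sys/block/<dev>/device/timeout.
compare_timeouts() {
    local erc="$1" kernel_s="$2"
    case $erc in
        ''|*[!0-9]*) erc=0 ;;   # not a number: ERC disabled or unsupported
    esac
    if [ "$erc" -eq 0 ]; then
        echo "MISMATCH: no SCT ERC; drive may retry longer than the kernel waits"
    elif [ "$erc" -lt $((kernel_s * 10)) ]; then
        echo "OK: drive gives up after $((erc / 10))s, kernel waits ${kernel_s}s"
    else
        echo "MISMATCH: drive's ERC limit is not shorter than the kernel timer"
    fi
}

# check_drive DEV (e.g. sda): read both values from the running system.
# Needs root for smartctl.
check_drive() {
    local dev="$1" erc
    # smartctl prints a line like "Read: 100 (10.0 seconds)"; keep the number.
    erc=$(smartctl -l scterc "/dev/$dev" | awk '/Read:/ {print $2; exit}')
    compare_timeouts "$erc" "$(cat "/sys/block/$dev/device/timeout")"
}

# write_cache_rule: print a udev rule that turns off the volatile write
# cache (hdparm -W 0) on every whole sd[a-z] disk as it is added or changed.
write_cache_rule() {
    printf '%s\n' 'ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd[a-z]", ENV{DEVTYPE}=="disk", RUN+="/usr/sbin/hdparm -W 0 /dev/%k"'
}
```

For example, as root: check_drive sda, then write_cache_rule > /etc/udev/rules.d/99-disable-write-cache.rules. As written the rule matches every sd[a-z] disk; narrowing it to specific drives (for instance by serial number) is left to check against udev(7).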
>> hdparm -W is the command, -w is dangerous (!). It is ok to use the write
>> cache if it's determined that the drive firmware honors flush/fua,
>> which they usually do, but the penalty is so bad if they don't and you
>> get a crash that it's maybe not worth taking the risk. Btrfs raid1
>> metadata helps here, as will using different drive make/models,
>> because if one drive does the wrong thing, btrfs self heals from
>> another drive even passively - but for parity raid you really ought to
>> scrub following a crash. Or hey, just avoid crashes :)
>
> I guess I could test for this? The current drive is ext4 formatted so my original plan was to create a 2 drive raid1 and copy the files over, format the old drive and then add it to the array and rebalance (a +1 for raid1!). I could switch over to the 2 drive raid1 array for a while and "wait and see" or is there a more proactive method?

There isn't a great way to test for firmware bugs. It's basically doing a bunch of real-world pull-the-power-plug tests and seeing what breaks.

I think the thing to test is the sequential write performance of each drive with the write cache enabled and disabled. The cache really shouldn't benefit sequential writes at all, so unless writes totally tank, I would just disable the write cache and then not worry about it. This requires a udev rule, because a power cycle resets the cache to its default (enabled).

For what it's worth, "btrfs device add" includes the mkfs and resize, so you skip those steps. "btrfs replace" includes mkfs, but not resize, and the target must be at least as big as the source. It's almost always better to shrink the source so that you can use "replace" instead of "add+remove". Replace is basically a scrub: fast and simple. Adding a device followed by removing one is two fs shrinks and a balance; it's way more expensive, and while it's OK with raid1 variants, avoid it for sure with parity raids. Same for any notion of "just removing one drive".
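The shrink-then-replace route could be scripted roughly like this. It is a sketch, assuming a mounted filesystem and that the old drive's devid has been read from "btrfs filesystem show"; the function names, /dev/sdd and the sizes used below are hypothetical. Setting DRY_RUN=1 prints the commands instead of running them. The shrink step is only needed when the new drive is smaller than the old one, and the final resize to max is there because replace does not grow the filesystem onto a larger target.

```shell
#!/bin/sh
# Sketch: move one member of a mounted btrfs filesystem to a new drive with
# "btrfs replace". Function names are hypothetical. Set DRY_RUN=1 to print
# the commands instead of executing them (executing needs root).

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# replace_device DEVID NEW_DEV MNT [SHRINK_TO]
#   DEVID:     devid of the old drive, from "btrfs filesystem show MNT"
#   NEW_DEV:   the new, empty block device
#   SHRINK_TO: optional size such as 3500g, only needed when NEW_DEV is
#              smaller than the old drive
replace_device() {
    local devid="$1" new_dev="$2" mnt="$3" shrink_to="${4:-}"
    if [ -n "$shrink_to" ]; then
        # The replace target must be at least as large as the source's
        # btrfs size, so shrink the source first.
        run btrfs filesystem resize "$devid:$shrink_to" "$mnt" || return
    fi
    # -B keeps the replace in the foreground until it completes.
    run btrfs replace start -B "$devid" "$new_dev" "$mnt" || return
    # replace does not resize; the new drive takes over the old devid,
    # so grow that devid to fill the new drive.
    run btrfs filesystem resize "$devid:max" "$mnt"
}
```

For example, DRY_RUN=1 replace_device 2 /dev/sdd /mnt 3500g prints the shrink, replace and grow commands for a hypothetical devid 2; while a real replace runs, "btrfs replace status /mnt" from another shell shows progress.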
Removing a drive is only as expensive on single and raid1 profiles as it is to replicate the data that's on the drive being removed to the remaining drives. But for parity raid, it's a complete restripe of the array, and thus expensive (in time and computation).

> Obviously if I go raid5 I won't have this option unless I can temporarily house my data on a separate drive.
>
> Looking at the link it looks like I'm OK?
>
> # smartctl -l scterc /dev/sda1
> smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.9.16-200.fc33.x86_64] (local build)
> Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
>
> SCT Error Recovery Control:
>   Read: 100 (10.0 seconds)
>   Write: 100 (10.0 seconds)

Yeah, if that's the default, it's fine. The kernel's command timer is 30s, so the drive will give up on a read/write error before the kernel decides it's MIA.

> The drive is a Seagate Terascale which is supposedly designed for cloud computing / datacenters.
>
> So raid1 or raid5...

You can change it later either way with a balance using -dconvert.

--
Chris Murphy
_______________________________________________
users mailing list -- users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to users-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/users@xxxxxxxxxxxxxxxxxxxxxxx