On Sun, Dec 22, 2019 at 12:52 AM Javier Perez <pepebuho@xxxxxxxxx> wrote:
>
> Hi
> My home partition is on a 2T HDD using btrfs
>
> I am reading the material at
> http://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices
> but still I am not that clear on some items.
>
> If I want to add a second 2T drive to work as a mirror (RAID1), it looks
> like I do not have to invoke mdadm or anything similar; it seems like
> btrfs will handle it all internally. Am I understanding this right?

Correct.

> Also, before I add a new device, do I have to partition the drive, or
> does btrfs take over all these duties (partitioning, formatting) when it
> adds the device to the filesystem?

Partitioning is optional. Drives I dedicate to one task only, I do not
partition. If I use them for other things, or might use them for other
things, then I partition them.

The add command formats the new device and resizes the file system:

# btrfs device add /dev/sdX /mountpoint

The balance command with a convert filter changes the profile of the
specified block groups, and does the replication:

# btrfs balance start -dconvert=raid1 -mconvert=raid1 /mountpoint

> What has been the experience like with such a system?

Gotcha 1 (this applies to mdadm and LVM RAID as well as Btrfs): a mismatch
between the drive's SCT ERC and the kernel's SCSI block command timer is
really common. That is, there is a drive error timeout and a kernel block
device error timeout, and the drive's timeout must be shorter than the
kernel's. Otherwise, the information needed for self-healing is lost, bad
sectors accumulate, and eventually there is data loss.

The thing is, the defaults are often wrong: consumer hard drives often have
very long SCT ERC - typically it's disabled entirely - making for really
impressive timeouts in excess of 1 minute (some suggest 2 or 3 minutes),
whereas the kernel command timeout is 30 seconds.

Ideally, use 'smartctl -l scterc' to set the SCT ERC to something like 7
seconds. This can also be set with a udev rule pointed at the device by-id,
using its serial number or wwn (a rough sketch of the commands is below).
You want the drive firmware to give up on read errors quickly; that way it
reports the bad sector's LBA to the kernel, which in turn can find a good
copy (raid1, 5, 6 or DUP profiles on Btrfs) and overwrite the bad sector,
thereby fixing it.

If the drive doesn't support SCT ERC, then you'll need to increase the
kernel's command timer instead. This is a kernel setting, but it is per
block device. Raise the value to something fairly incredible, like 180
seconds. Worst case, a marginally bad sector then results in maybe a 3
minute hang until the drive gives up and reports a read error - and then
it gets fixed up.

It seems esoteric, but really it's pernicious, and it's common in the data
loss cases reported on linux-raid@, where they have the most experience
with RAID. It applies just the same to Btrfs. More info here:
https://raid.wiki.kernel.org/index.php/Timeout_Mismatch

Gotcha 2, 3, 4: Device failures mean multiple gotchas all at once, so you
need a plan for how to deal with them, so you aren't freaking out if it
happens. Panic often leads to user-induced data loss. If in doubt, you are
best off doing nothing and asking. Both the linux-btrfs@ list and #btrfs
on IRC freenode.net are approachable for this.

Gotcha: If a device dies, you're not likely to see any indication of
failure unless you're looking at kernel messages and see a ton of Btrfs
complaints. Like, several scary red warnings *per* lost write.
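Putting Gotcha 1 into concrete commands, it looks roughly like this - a
sketch only, where /dev/sdX, the example serial number, and the rules file
name are placeholders to adapt to your own drives.

Check whether the drive supports SCT ERC, and what it's currently set to:

# smartctl -l scterc /dev/sdX

If it's supported, set it to 7 seconds (the value is in tenths of a second):

# smartctl -l scterc,70,70 /dev/sdX

That setting usually doesn't survive a power cycle, so a udev rule keyed to
the drive's serial number can reapply it at boot, something like this in
/etc/udev/rules.d/99-scterc.rules:

ACTION=="add", SUBSYSTEM=="block", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL_SHORT}=="EXAMPLE123SERIAL", RUN+="/usr/sbin/smartctl -l scterc,70,70 /dev/%k"

If the drive doesn't support SCT ERC at all, raise the kernel's command
timer for that device instead (the value is in seconds, default 30, and it
also doesn't persist across reboots):

# echo 180 > /sys/block/sdX/device/timeout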
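Since those warnings are easy to miss unless you happen to be watching the
journal, it's worth glancing at the kernel log and the per-device error
counters once in a while:

# journalctl -k | grep -i btrfs
# btrfs device stats /home

The stats command reports write, read, flush, corruption and generation
error counters for each device; anything nonzero deserves a closer look
('btrfs device stats -c' exits nonzero in that case, which is handy in a
script).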
If a drive dies, the kernel log will quickly fill with thousands of those
warnings. Whether you do or don't notice this, the next time you reboot...

Gotcha: By default, Btrfs fails to mount if it can't find all devices.
This is because there are consequences to degraded operation, and it
requires user interaction to make sure it's all resolved. But because such
mounts fail, there's a udev rule that waits for all Btrfs member devices,
so that small delays between multiple devices appearing don't result in
failed mounts. Near as I can tell, there's no timeout for this udev rule.
This is the rule:

/usr/lib/udev/rules.d/64-btrfs.rules

So now you're stuck in this startup hang. If it's just a case of the
device accidentally being missing, it's safe to reconnect it, and then
startup will proceed normally. Otherwise, you need a way to get unstuck.

I'm improvising here, but what you want to do is remove the suspect drive,
(temporarily) disable this udev rule so that the system *will* try to
mount /home, and also change fstab to add the "degraded" mount option so
that the mount attempt won't fail. Now at least you can boot and work
while degraded until you get a chance to really fix the problem. A
degraded /home isn't any more risky than a single-device /home - the
consequences really are all in making sure it's put back together
correctly.

OK, so how to do all that? Either boot off a Live CD, inhibit the udev
rule, and change fstab; or boot your system with rd.break=cmdline, mount
the root file system at /sysroot, and make these changes there.

Before rebooting, use 'btrfs filesystem show' to identify which drive
Btrfs thinks is missing or bad, and remove it. To replace it you can use
'btrfs replace', or 'btrfs dev add' followed by 'btrfs dev rem missing'.
The first is preferred, but read the man pages on both methods so you're
aware of whether or not you need to do a file system resize. Then use
'btrfs fi us /mountpoint' to check usage for any block groups that are not
raid1: during degraded writes it's possible some single-copy data block
groups are created, and those need to be manually converted to raid1 (yes,
you can have mixed replication levels on Btrfs). And if some degraded
writes happened and you then get the missing device reconnected, you'll
use 'btrfs scrub' to replicate those degraded writes to the formerly
missing device. That's not automatic either.

A couple more gotchas to be aware of, which might be less bad with the
latest kernels, but without testing for it I wouldn't assume they're
fixed:

https://btrfs.wiki.kernel.org/index.php/Gotchas#raid1_volumes_only_mountable_once_RW_if_degraded
https://btrfs.wiki.kernel.org/index.php/Gotchas#Block-level_copies_of_devices

Otherwise, Btrfs raid1 is stable on stable hardware. It automatically
self-heals if it finds problems during normal operation, and it also heals
during scrubs. The gotchas only start if there's some kind of problem, and
then the challenge is to understand the exact nature of the problem before
taking action. It's the same with mdadm and LVM raids - just different
gotchas and commands.

--
Chris Murphy
_______________________________________________
users mailing list -- users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to users-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/users@xxxxxxxxxxxxxxxxxxxxxxx