On Mon, Dec 28, 2020 at 2:33 AM François Patte
<francois.patte@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> Bonjour,
>
> I try to build a home nas to make a dlna server for audio, video and
> pictures.
>
> I have 2 disks for the data which I want to be mounted in raid1 (software).
>
> I formated the two disks using btrfs
> (mkfs.btrfs -m raid1 -d raid1 /dev/sda /dev/sdb)
>
> It works but, up to now, I can't see the advantages of this file system
> vs ext4 managed by mdadm.
>
> One disadvantage is that it seems that monitoring the system is not
> possible in case of disk failure for instance.

Btrfs in a raid1 configuration is significantly different from either mdadm raid or single-disk Btrfs. It will self-heal, unambiguously, both metadata and data. It detects corruption, including bit rot, torn writes, and misdirected writes, even when the drive doesn't report any error. It finds the good copy and fixes up the bad copy. This happens passively during normal use. The same repair principle applies when scrubbing: a scrub reads all metadata and data, but not unused areas.

mdadm raid depends exclusively on drive-reported errors; it has no independent means of knowing which copy of a block is valid, because it has no integrity checking. Even when ext4/xfs metadata checksumming detects a checksum error in its own metadata, it doesn't know which drive contains the correct copy, and neither does mdadm.

Intentionally mismatching drive make/model actually results in a more reliable setup on Btrfs, because any firmware bugs in either drive are isolated. Any bug resulting in corruption on one drive gets fixed by Btrfs from the (meta)data on the other drive. With a regular scrub you have less of a chance of getting bitten by such defects.

Depending on the size of the drives and how much data is on them, 'btrfs replace' can be quite a lot faster when replacing a failed drive. It uses a variation on scrub to replicate (meta)data onto a new device. I definitely recommend 'btrfs replace' as the go-to for replacing drives, rather than 'btrfs device add' followed by 'btrfs device delete'. Likewise, it will do fixups as problems are encountered, as long as there's a good copy.

Btrfs also won't kick a drive out of a pool when it misbehaves. Kicking a drive out means any partial redundancy it could still provide is lost. Since Btrfs can unambiguously determine whether reads from a drive are corrupt, it's in a position to keep using the drive and handle the errors. There is also an option for 'btrfs replace' to read from the drive being replaced only if there are no other good copies of the (meta)data on other drives, which makes the replacement go faster.

Bit rot in anything that's already compressed, like audio, video, and images, tends to cause significantly more damage than a mere bit or byte flip would otherwise do. Detecting this and preventing corruption from replicating is a significant feature of Btrfs.

Also, the ability to add drives and grow the array is probably more straightforward and more tolerant of different sized drives. I don't mean users will necessarily avoid confusion, but the file system itself can handle it if you add oddly sized drives one after another. 'btrfs device add' implies the mkfs and resize steps, and it'll attempt to balance allocations based on the drives with the most free space available. It is certainly possible to get confused if you don't add two equally sized drives at a time.
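To make that concrete, here's roughly what the day-to-day commands look like. I'm assuming the file system is mounted at /srv/nas and using placeholder device names, so adjust for your setup:

  # per-device error counters btrfs keeps (read, write, flush, corruption, generation)
  btrfs device stats /srv/nas

  # verify every copy of metadata and data; bad copies get repaired from the good mirror
  btrfs scrub start /srv/nas
  btrfs scrub status /srv/nas

  # replace a failing /dev/sda with a new /dev/sdc in one step;
  # -r only reads from the outgoing drive if no other good copy exists
  btrfs replace start -r /dev/sda /dev/sdc /srv/nas
  btrfs replace status /srv/nas

  # grow the pool; the add implies the mkfs and resize steps
  btrfs device add /dev/sdd /srv/nas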
As for monitoring, nagios check_btrfs might do what you want:
https://github.com/knorrie/python-btrfs/blob/master/examples/nagios/plugins/check_btrfs

There is a rather pernicious problem to be aware of when using consumer drives on Linux, one that affects mdadm, lvm, Btrfs, and (I assume) ZFS raids. And that's this esoteric annoyance of timeout mismatches:
https://raid.wiki.kernel.org/index.php/Timeout_Mismatch

The gist of it is that the drive firmware's command timeout needs to happen before the kernel's. The typical point of confusion is that the kernel's command timer looks like a device timer, because it's a per block device setting in sysfs. The ideal scenario is to leave the kernel's timer alone and instead use 'smartctl -l scterc,70,70', via something like /dev/disk/by-id in a udev rule, to tell the drive to give up on errors quickly. 70 deciseconds is typical; all drives use deciseconds.

If the drive has no configurable SCT ERC, then you have to change the kernel's timeout. If you don't, the kernel decides the drive isn't responding and does a link reset, and now the whole command queue is lost and we have no idea why the drive wasn't responding. I figure there's a greater than 90% chance it's a bad sector and the drive is intentionally not responding because it's in "deep recovery", if it's a consumer drive. I know. I know. Sounds like bat guano.

There is also a dracut bug that could cause some confusion if the drives aren't both available at mount time. The btrfs udev rule causes a wait for all btrfs devices belonging to a particular fs UUID to appear before systemd will attempt to mount it, to prevent mount failure. This normally only affects btrfs multiple device volumes used as the system root, but if you have many different devices, possibly on different controllers, set to automount in fstab, it could be an issue:
https://github.com/dracutdevs/dracut/issues/947

--
Chris Murphy
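P.S. A rough sketch of the SCT ERC workaround, in case it helps. The rule file name is arbitrary, and I'm matching every sd device here rather than a specific /dev/disk/by-id entry just to keep it short:

  # /etc/udev/rules.d/60-scterc.rules
  # tell each drive to give up on a bad sector after 7 seconds, well before
  # the kernel's default 30 second command timer fires
  ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd[a-z]", RUN+="/usr/sbin/smartctl -l scterc,70,70 /dev/%k"

For a drive that doesn't support SCT ERC at all, go the other way and raise the kernel's timer instead, per block device (180 seconds is just a comfortably large value, adjust to taste):

  echo 180 > /sys/block/sdX/device/timeout

You can check what a drive currently reports with 'smartctl -l scterc /dev/sdX'.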