On Mon 16 Nov 2020 at 10:54, Hans van den Bogert <hansbogert@xxxxxxxxx> wrote:
> > With this profile you can only lose one OSD at a time, which is really
> > not that redundant.
>
> That's rather situation dependent. I don't have really large disks, so
> the repair time isn't that long.
> Further, my SLO isn't so high that I need 99.xxx% uptime. If 2 disks
> break in the same repair window, that would be unfortunate, but I'd just
> grab a backup from a mirroring cluster. Looking at it from another
> perspective, I came from a single-host RAID5 scenario, and I'd argue this
> is better since I can survive a host failure.
>
> Also, this is a sliding problem, right? Someone with K+3 could argue K+2
> is not enough as well.
>

There are a few situations, such as when you are moving data or when a scrub has found a bad PG, where you suddenly have no spare copies left in case something bad happens. I think RAID5 operators have found this out too: when the cold spare disk kicks in, you discover that old undetected error on one of the other disks, and conclude that rebuilds are bad or stress the RAID too much.

As with RAID, the cheapest resource is often the disks themselves, not operator time, restore wait times and so on. That is why many on this list advocate K+2 or more, or repl=3: we have seen the errors one normally doesn't expect.

Yes, a double surprise of two disks failing in the same night after running for years is uncommon, but it is not nearly as uncommon to resize pools, move PGs around or find a scrub error or two some day. So while one could always say "one more drive is better than your amount", there are people who lose data with repl=2 or K+1 because some more routine operation was in flight and _then_ a single surprise happened.

For example, a weird reboot can leave some PGs needing backfill afterwards, and if one of the up-to-date hosts has any single surprise during that recovery, the cluster will be missing some of the current data even though two disks were never down at the same time.
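To put the "single surprise while something else is in flight" point into numbers, here is a minimal Python sketch. It assumes a deliberately simplified model: it only counts whether enough shards of a PG remain to reconstruct the data (any 1 copy for replication, any K of K+M shards for erasure coding), and it ignores Ceph's min_size behaviour, CRUSH placement and failure domains. The names `Profile`, `replicated`, `erasure` and `remaining_margin` are hypothetical helpers for illustration, not part of any Ceph tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical pool profile: `data` shards needed to read, `total` shards stored."""
    name: str
    data: int   # shards required to reconstruct (1 for replication, K for erasure coding)
    total: int  # shards stored per PG (size for replication, K+M for erasure coding)


def replicated(size: int) -> Profile:
    # Any single full copy is enough to recover the data.
    return Profile(f"repl={size}", data=1, total=size)


def erasure(k: int, m: int) -> Profile:
    # Any K of the K+M shards are enough to reconstruct the data.
    return Profile(f"EC {k}+{m}", data=k, total=k + m)


def remaining_margin(p: Profile, already_unavailable: int) -> int:
    """How many more shard losses a PG survives, given shards that are already
    stale or missing (backfill pending, scrub error, host rebooting...).
    A negative result means the PG can no longer be reconstructed."""
    return p.total - p.data - already_unavailable


if __name__ == "__main__":
    for p in [replicated(2), replicated(3), erasure(4, 1), erasure(4, 2)]:
        healthy = remaining_margin(p, 0)
        degraded = remaining_margin(p, 1)
        print(f"{p.name:8} healthy margin={healthy}  margin with one stale shard={degraded}")
```

In this model repl=2 and K+1 drop to a margin of zero as soon as one shard is stale, so the next single failure loses data, while repl=3 and K+2 still have one failure to spare.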
Drive manufacturers print Mean Time Between Failures, storage admins count Mean Time Between Surprises.

-- 
May the most significant bit of your life be positive.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx