Re: (Ceph Octopus) Repairing a neglected Ceph cluster - Degraded Data Redundancy, all PGs degraded, undersized, not scrubbed in time

On Mon, 16 Nov 2020 at 10:54, Hans van den Bogert <
hansbogert@xxxxxxxxx> wrote:

> > With this profile you can only lose one OSD at a time, which is really
> > not that redundant.
> That's rather situation-dependent. I don't have really large disks, so
> the repair time isn't that long.
> Further, my SLO isn't so high that I need 99.xxx% uptime; if 2 disks
> break in the same repair window, that would be unfortunate, but I'd just
> grab a backup from a mirroring cluster. Looking at it from another
> perspective, I came from a single-host RAID5 scenario; I'd argue this is
> better since I can survive a host failure.
>
> Also, this is a sliding problem, right? Someone with K+3 could argue K+2
> is not enough as well.
>

There are a few situations, like when you are moving data or when a scrub
finds a bad PG, where you are suddenly out of spare copies if something bad
happens at the same time. I think RAID5 operators found this out as well:
when the cold spare kicks in, the rebuild trips over an old undetected error
on one of the other disks, and people conclude that repairs are bad or
stress the RAID too much.

As with RAID, the cheapest resource is often the actual disks, not operator
time, restore wait times and so on, which is why many on this list advocate
K+2-or-more, or repl=3: we have seen the errors one normally didn't expect.
Yes, a double surprise of two disks failing in the same night after running
for years is uncommon, but it is not as uncommon to resize pools, move PGs
around or find a scrub error or two some day.

So while one could always say "one more drive is better than your amount",
there are people losing data with repl=2 or K+1 because some more routine
operation was in flight and _then_ a single surprise happened. You can have
a weird reboot that leaves those PGs needing backfill later, and if one of
the up-to-date hosts has any single surprise during the recovery, the
cluster will lack some of the current data even though two disks were never
down at the same time.
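
For anyone who wants to see that counted out, here is a toy Python
illustration of exactly that sequence; the host names and the "stale copy"
bookkeeping are hypothetical, not how Ceph tracks it internally.

def up_to_date_copies(size):
    # Hypothetical hosts holding one PG: hostA, hostB, hostC, ...
    current = {f"host{chr(ord('A') + i)}" for i in range(size)}
    current.discard("hostB")   # weird reboot: hostB's copy is now stale,
                               # the PG needs backfill onto it later
    current.discard("hostA")   # single surprise on an up-to-date host
                               # before that backfill has finished
    return len(current)

for size in (2, 3):
    left = up_to_date_copies(size)
    print(f"repl={size}: {left} up-to-date copies left ->",
          "current data is gone" if left == 0 else "still one good copy")

With repl=3 (or K+2) there is still one current copy to recover from; with
repl=2 (or K+1) the newest writes only existed on the host that just had
its surprise.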

Drive manufacturers print Mean Time Between Failures; storage admins count
Mean Time Between Surprises.

-- 
May the most significant bit of your life be positive.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



