Re: (Ceph Octopus) Repairing a neglected Ceph cluster - Degraded Data Redundancy, all PGs degraded, undersized, not scrubbed in time

I think we're deviating from the original thread quite a bit, and I would never argue that in a production environment with plenty of OSDs you should go for R=2 or K+1, so my example cluster, which happens to be 2+1, is a bit unlucky.

However, I'm interested in the following:

On 11/16/20 11:31 AM, Janne Johansson wrote:
> So while one could always say "one more drive is better than your
> amount", there are people losing data with repl=2 or K+1 because some
> otherwise-normal operation was in flight and _then_ a single surprise
> happens. So you can have a weird reboot, causing those PGs to need
> backfill later, and if one of the up-to-date hosts has any single
> surprise during the recovery, the cluster will lack some of the current
> data even if two disks were never down at the same time.
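
If I'm reading that right, here's the sequence as a minimal sketch (plain Python, nothing Ceph-specific; the host names and the event order are just illustrative assumptions): one host reboots and its copy goes stale, then a single surprise hits one of the still-current hosts before backfill finishes.

def surviving_current_copies(size, events):
    """Walk through events and return how many current copies are left.

    size:   replica count (2 or 3)
    events: list of ("reboot", host) or ("fail", host) tuples
    """
    hosts = {f"host{i}" for i in range(size)}
    alive = set(hosts)      # hosts that are up at all
    current = set(hosts)    # hosts holding an up-to-date copy

    for kind, host in events:
        if kind == "reboot":
            # The host comes back, but its copy is behind and needs backfill:
            # it is alive, yet holds no current copy until recovery finishes.
            current.discard(host)
        elif kind == "fail":
            alive.discard(host)
            current.discard(host)
    return len(current & alive)

# Weird reboot on host0, then a surprise on host1 during the backfill window.
events = [("reboot", "host0"), ("fail", "host1")]
print("size=2:", surviving_current_copies(2, events), "current copies left")
print("size=3:", surviving_current_copies(3, events), "current copies left")

With size=2 that ends at zero current copies (the newest writes are gone) even though two disks were never down at the same time; with size=3 one current copy survives the same two events.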

I'm not sure I follow: from a logical perspective they *are* down at the same time, right? In your scenario one up-to-date replica was left, but even that had a surprise. Okay, that's the risk you take with R=2, but it's not intrinsically different from R=3.
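
To put rough numbers on "not intrinsically different": the mechanism is the same (lose every current copy while one replica is stale), only the number of additional surprises required changes. A back-of-envelope sketch with a made-up per-host failure probability for the backfill window (nothing measured on a real cluster):

# p: hypothetical probability that a given up-to-date host has a "surprise"
# during the backfill window opened by the first, benign event.
p = 0.01

loss_r2 = p        # R=2: one current copy left; one more surprise loses data
loss_r3 = p ** 2   # R=3: two current copies left; both must fail in the window

print(f"R=2 loss probability during the window: {loss_r2:.4%}")
print(f"R=3 loss probability during the window: {loss_r3:.4%}")

So the failure mode is logically the same either way; R=3 just buys you roughly another factor of p before it bites.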