On 3/9/16, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> On Wed, Mar 9, 2016 at 6:25 AM, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>> Hi,
>>
>> For replicated pools we default to min_size=2 when size=3
>> (size - size/2) in order to avoid the split-brain scenario described,
>> for example, here:
>> http://www.spinics.net/lists/ceph-devel/msg27008.html
>>
>> But for erasure pools we default to min_size=k, which I think is a
>> recipe for similar problems.
>>
>> Shouldn't we default to at least min_size=k+1?
>>
>> diff --git a/src/mon/OSDMonitor.cc b/src/mon/OSDMonitor.cc
>> index 77e26de..5d51686 100644
>> --- a/src/mon/OSDMonitor.cc
>> +++ b/src/mon/OSDMonitor.cc
>> @@ -4427,7 +4427,7 @@ int OSDMonitor::prepare_pool_size(const unsigned pool_type,
>>        err = get_erasure_code(erasure_code_profile, &erasure_code, ss);
>>        if (err == 0) {
>>          *size = erasure_code->get_chunk_count();
>> -        *min_size = erasure_code->get_data_chunk_count();
>> +        *min_size = erasure_code->get_data_chunk_count() + 1;
>>        }
>>      }
>>      break;
>
> Well, losing any OSDs at that point would be bad, since the data would
> become inaccessible until you get that whole set back, but there's not
> really any chance of serving up bad reads like Sam is worried about in
> the ReplicatedPG case. (...at least, assuming you have more data chunks
> than parity chunks.) Send in a PR on github?
> -Greg

Oops, that link discussed reads, but I'm more worried about writes. I.e.,
if we allow writes while only k OSDs are up, one of the m down OSDs could
come back and start backfilling or recovery, and then one of the k OSDs
that took the writes could go down before recovery completes, leaving
those writes on fewer than k available OSDs.

PR incoming.

.. Dan
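
For concreteness, here is a small standalone sketch (plain C++, not Ceph
code) of how the two defaults behave under an assumed k=4, m=2 profile;
the profile and numbers are purely illustrative:

// Illustration only: assumed k=4, m=2 profile, showing why min_size = k
// leaves no margin for a further failure during recovery.
#include <iostream>

int main() {
  const unsigned k = 4, m = 2;           // assumed example profile
  const unsigned size = k + m;           // total chunks per object = 6

  const unsigned min_size_old = k;       // current default
  const unsigned min_size_new = k + 1;   // proposed default

  // Scenario from the mail: m OSDs are down, writes land on the remaining
  // k OSDs, then one of those k OSDs fails before recovery completes.
  const unsigned up_while_degraded = size - m;                    // 4 shards up
  const unsigned up_after_extra_failure = up_while_degraded - 1;  // 3 shards up

  std::cout << "writes allowed while degraded, old default: "
            << (up_while_degraded >= min_size_old) << "\n"    // 1 (yes)
            << "writes allowed while degraded, new default: "
            << (up_while_degraded >= min_size_new) << "\n"    // 0 (no)
            << "new writes readable after the extra failure: "
            << (up_after_extra_failure >= k) << "\n";         // 0 (needs k shards)
  return 0;
}

With min_size = k+1 the PG would refuse writes in that degraded state,
rather than accept data that a single additional failure can make
unreadable until the lost shard comes back.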