Re: default min_size for erasure pools

On Wed, Mar 9, 2016 at 6:25 AM, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> Hi,
>
> For replicated pools we default to min_size=2 when size=3
> (i.e., size - size/2) in order to avoid the split-brain scenario, for
> example as described here:
> http://www.spinics.net/lists/ceph-devel/msg27008.html
>
> But for erasure pools we default to min_size=k, which I think is a
> recipe for similar problems.
>
> Shouldn't we default to at least min_size=k+1??
>
> diff --git a/src/mon/OSDMonitor.cc b/src/mon/OSDMonitor.cc
> index 77e26de..5d51686 100644
> --- a/src/mon/OSDMonitor.cc
> +++ b/src/mon/OSDMonitor.cc
> @@ -4427,7 +4427,7 @@ int OSDMonitor::prepare_pool_size(const unsigned pool_type,
>        err = get_erasure_code(erasure_code_profile, &erasure_code, ss);
>        if (err == 0) {
>         *size = erasure_code->get_chunk_count();
> -       *min_size = erasure_code->get_data_chunk_count();
> +       *min_size = erasure_code->get_data_chunk_count() + 1;
>        }
>      }
>      break;
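
For concreteness, here is a minimal sketch of the defaults in question
(the k=4/m=2 profile is an assumed example, not from the patch; the real
code reads these values from the erasure code plugin via
get_chunk_count() and get_data_chunk_count()):

// Illustrative sketch of the min_size defaults being discussed.
#include <iostream>

int main() {
  unsigned rep_size = 3;
  unsigned rep_min = rep_size - rep_size / 2;  // replicated: 3 -> 2

  unsigned k = 4, m = 2;       // assumed example: 4 data, 2 coding chunks
  unsigned ec_size = k + m;    // get_chunk_count()
  unsigned ec_min_old = k;     // current default
  unsigned ec_min_new = k + 1; // proposed default

  std::cout << "replicated: size=" << rep_size
            << " min_size=" << rep_min << "\n";
  std::cout << "erasure:    size=" << ec_size
            << " min_size=" << ec_min_old
            << " -> " << ec_min_new << "\n";
  return 0;
}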

Well, losing any OSDs at that point would be bad, since the PG would
become inaccessible until you get that whole set back, but there's not
really any chance of serving up bad reads like Sam is worried about in
the ReplicatedPG case. (...at least, assuming you have more data chunks
than parity chunks.) Send in a PR on github?
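
Until a change like this lands, existing erasure pools can be bumped by
hand with "ceph osd pool set <pool> min_size <k+1>" (so min_size 5 for
the illustrative k=4/m=2 profile above), and the current value checked
with "ceph osd pool get <pool> min_size".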
-Greg