Re: Re: why my cluster become unavailable (min_size of pool)

Hi, Haomai

Thanks for the quick reply; your explanation makes sense to me.

Thanks!

------------------				 
hzwulibin
2015-11-26

-------------------------------------------------------------
From: Haomai Wang <haomaiwang@xxxxxxxxx>
Date: 2015-11-26 16:00
To: hzwulibin
Cc: Sage Weil, ceph-devel
Subject: Re: why my cluster become unavailable (min_size of pool)

On Thu, Nov 26, 2015 at 3:54 PM, hzwulibin <hzwulibin@xxxxxxxxx> wrote:
> Hi, Sage
>
> I have a question about the min_size setting of a pool.
>
> The default value of min_size is 2, but with this setting, when two OSDs
> are down at the same time (meaning two replicas are lost), IO will be blocked.
> We want to set min_size to 1 in our production environment, as we think
> it is a normal case for two OSDs (on different hosts, of course) to be
> down at the same time.

min_size of 2 means each object must have at least two available copies
in this pool before IO is served. It mainly reduces the risk that
permanently corrupted storage media cause actual data loss. That means
if min_size is 1, then in this degraded state the permanent corruption
of one more OSD will cause data loss. If min_size is 2, it takes at
least 2 more failed OSDs.
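
For reference, both values can be checked per pool at runtime (the pool
name "rbd" below is only an example):

    ceph osd pool get rbd size        # number of replicas kept
    ceph osd pool get rbd min_size    # replicas required before IO is served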

>
> So is there any potential problem with this setting?
>
> We are using version 0.80.10.
>
> Thanks!
>
>
> ------------------
> hzwulibin
> 2015-11-26
>
> -------------------------------------------------------------
> From: "hzwulibin" <hzwulibin@xxxxxxxxx>
> Date: 2015-11-23 09:00
> To: Sage Weil, Haomai Wang
> Cc: ceph-devel
> Subject: Re: why my cluster become unavailable
>
> Hi, Sage
>
> Thanks! Will try it in the next round of testing!
>
> ------------------
> hzwulibin
> 2015-11-23
>
> -------------------------------------------------------------
> From: Sage Weil <sage@xxxxxxxxxxxx>
> Date: 2015-11-22 01:49
> To: Haomai Wang
> Cc: Libin Wu, ceph-devel
> Subject: Re: why my cluster become unavailable
>
> On Sun, 22 Nov 2015, Haomai Wang wrote:
>> On Thu, Nov 19, 2015 at 11:26 PM, Libin Wu <hzwulibin@xxxxxxxxx> wrote:
>> > Hi, cephers
>> >
>> > I have a cluster of 6 OSD servers; every server has 8 OSDs.
>> >
>> > I marked 4 OSDs out on every server, and then my client IO started blocking.
>> >
>> > I rebooted my client and then created a new rbd device, but the new
>> > device can't write IO either.
>> >
>> > Yeah, I understand that some data may be lost since all three replicas
>> > of some objects were lost, but why did the cluster become unavailable?
>> >
>> > There are 80 incomplete PGs and 4 down+incomplete PGs.
>> >
>> > Is there any way I could solve the problem?
>>
>> Yes, if you don't have a special crushmap to control the data
>> placement policy, the PGs will lack the metadata necessary to recover.
>> You need to re-add the outed OSDs or force-remove the PGs which are
>> incomplete (hope it's just a test).
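>>
>> For example (the OSD id and PG id below are made up for illustration;
>> force_create_pg abandons whatever data the PG held, so it is a last resort):
>>
>>     ceph osd in 12                 # re-add a previously outed OSD
>>     ceph pg force_create_pg 2.3f   # recreate an incomplete PG, losing its data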
>
> Is min_size 2 or 1?  Reducing it to 1 will generally clear some of the
> incomplete PGs.  Just remember to raise it back to 2 after the cluster
> recovers.
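>
> For example, for a pool named rbd (the name is only an example):
>
>     ceph osd pool set rbd min_size 1
>     # ...watch recovery with "ceph -s" or "ceph health detail"...
>     ceph osd pool set rbd min_size 2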
>
> sage
>
>



-- 
Best Regards,

Wheat
