Re: why my cluster become unavailable

"hzwulibin" <hzwulibin@xxxxxxxxx> · Mon, 23 Nov 2015 09:00:16 +0800

Hi, Sage

Thanks! I will try it in my next round of testing.

------------------				 
hzwulibin
2015-11-23

-------------------------------------------------------------
From: Sage Weil <sage@xxxxxxxxxxxx>
Date: 2015-11-22 01:49
To: Haomai Wang
Cc: Libin Wu, ceph-devel
Subject: Re: why my cluster become unavailable

On Sun, 22 Nov 2015, Haomai Wang wrote:
> On Thu, Nov 19, 2015 at 11:26 PM, Libin Wu <hzwulibin@xxxxxxxxx> wrote:
> > Hi, cephers
> >
> > I have a cluster of 6 OSD servers; each server has 8 OSDs.
> >
> > I marked 4 OSDs out on every server, and then my client I/O blocked.
> >
> > I rebooted my client and then created a new rbd device, but the new
> > device can't write I/O either.
> >
> > Yeah, I understand that some data may be lost, as all three replicas of
> > some objects were lost, but why did the cluster become unavailable?
> >
> > There are 80 incomplete PGs and 4 down+incomplete PGs.
> >
> > Is there any way I could solve the problem?
> 
> Yes, if you don't have a special crushmap to control the data
> placement policy, the PGs will lack the metadata they need to boot. You
> need to re-add the outed OSDs or force-remove the incomplete PGs (hope
> it's just a test).

Is min_size 2 or 1?  Reducing it to 1 will generally clear some of the 
incomplete pgs.  Just remember to raise it back to 2 after the cluster 
recovers.

sage
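
For reference, the min_size suggestion above could be carried out roughly as
follows. This is only a sketch and not part of the original reply: the pool
name "rbd" is an assumption (substitute your own pool), and the `run` helper
and `DRY_RUN` switch are hypothetical conveniences that print each command
instead of executing it unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Assumption: the affected pool is named "rbd"; override with POOL=<name>.
POOL="${POOL:-rbd}"
# Hypothetical safety switch: by default only print the commands.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# 1. Check the current min_size of the pool.
run ceph osd pool get "$POOL" min_size

# 2. Temporarily lower min_size to 1 so PGs with a single surviving
#    replica are allowed to go active.
run ceph osd pool set "$POOL" min_size 1

# 3. Watch the cluster state until recovery has finished.
run ceph health detail

# 4. Only after the cluster has recovered, raise min_size back to 2.
run ceph osd pool set "$POOL" min_size 2
```

Step 4 should not be run until `ceph health detail` shows that recovery is
complete; in practice steps 3 and 4 would be separated by however long the
recovery takes.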
