On Sun, 22 Nov 2015, Haomai Wang wrote:
> On Thu, Nov 19, 2015 at 11:26 PM, Libin Wu <hzwulibin@xxxxxxxxx> wrote:
> > Hi, cephers,
> >
> > I have a cluster of 6 OSD servers, and every server has 8 OSDs.
> >
> > I marked 4 OSDs out on every server, and then my client I/O started
> > blocking.
> >
> > I rebooted my client and created a new rbd device, but the new device
> > also can't write I/O.
> >
> > Yeah, I understand that some data may be lost since all three replicas
> > of some objects were lost, but why did the whole cluster become
> > unavailable?
> >
> > There are 80 incomplete pgs and 4 down+incomplete pgs.
> >
> > Is there any way I can solve this problem?
>
> Yes, if you don't have a special crushmap controlling the data placement
> policy, those pgs will lack the metadata they need to come back. You
> need to either re-add the OSDs you marked out or force-remove the pgs
> that are incomplete (hope it's just a test).

Is min_size 2 or 1?  Reducing it to 1 will generally clear some of the
incomplete pgs.  Just remember to raise it back to 2 after the cluster
recovers.

sage
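
For reference, the min_size workaround above comes down to a few
commands. This is only a sketch; it assumes the affected pool is the
default "rbd" pool, so substitute your actual pool name:

    # list the pgs that are stuck incomplete
    ceph health detail | grep incomplete

    # check the pool's current min_size
    ceph osd pool get rbd min_size

    # temporarily allow I/O with a single surviving replica
    ceph osd pool set rbd min_size 1

    # ... watch "ceph -s" until recovery finishes ...

    # restore the normal durability guarantee
    ceph osd pool set rbd min_size 2

Keep in mind that while min_size is 1, writes are acknowledged with only
one surviving copy, so it should stay at 1 only long enough for the
cluster to recover.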