Re: Bug or by design?

On Nov 18, 2014 4:48 PM, "Gregory Farnum" <greg@xxxxxxxxxxx> wrote:
>
> On Tue, Nov 18, 2014 at 3:38 PM, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
> > I was going to submit this as a bug, but thought I would put it here for
> > discussion first. I have a feeling that it could be behavior by design.
> >
> > ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
> >
> > I'm using a cache pool and was playing around with the size and min_size on
> > the pool to see the effects of replication. I set size/min_size to 1, then I
> > ran "ceph osd pool set ssd size 3; ceph osd pool set ssd min_size 2". Client
> > I/O immediately blocked because there were not yet 2 copies (as expected).
> > However, after the degraded objects were cleared up, several PGs remained in
> > the remapped+incomplete state and client I/O stayed blocked even though all
> > OSDs were up and healthy (even when left overnight). If I set min_size back
> > down to 1, the cluster recovers and client I/O resumes.
> >
> > I expected that as long as there is one copy of the data, the cluster could
> > replicate that data back up to min_size and cluster operations would resume.
> >
> > Where I think it could be by design is when min_size was already set to 2
> > and you lose enough OSDs fast enough to dip below that level. There is a
> > chance that the serving OSD has bad data (though we wouldn't know that at
> > the moment anyway). The bad data could then be replicated and the ability
> > to recover any good data would be lost.
> >
> > However, if Ceph immediately replicated from the sole remaining OSD to get
> > back to min_size, then when the other(s) came back online it could backfill
> > and just discard the extra copies.
> >
> > Immediate replication to keep the cluster operational seems like a good
> > thing overall. Am I missing something?
>
> This is sort of by design, but mostly an accident of many other
> architecture choices. Sam is actually working now to enable PG
> recovery when you have fewer than min_size copies available; I very
> much doubt it will be backported to any existing LTS releases but it
> ought to be in Hammer.
> -Greg
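
(For the archive, a rough sketch of the sequence from my test above; the pool
name "ssd" and the replica counts are from that test, and the dump_stuck check
is just one way to see the stuck PGs:)

    # single-copy pool to start with
    ceph osd pool set ssd size 1
    ceph osd pool set ssd min_size 1

    # raise both; client I/O blocks until 2 copies exist
    ceph osd pool set ssd size 3
    ceph osd pool set ssd min_size 2

    # on 0.87 some PGs can stay remapped+incomplete here
    ceph -s
    ceph pg dump_stuck inactive

    # dropping min_size back down to 1 lets the cluster recover
    ceph osd pool set ssd min_size 1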

Greg, thanks for the update. I'll refrain from submitting a bug report since it is already being worked on. For now, we will make sure that we don't increase min_size until size has been increased and the objects have been completely replicated.
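
Concretely, something like this is what I have in mind (pool name from the test above; the health-check loop is just one simple way to wait for recovery to finish):

    # raise the replica count first; recovery starts but client I/O keeps flowing
    ceph osd pool set ssd size 3

    # wait until the cluster reports healthy, i.e. all objects are replicated
    while ! ceph health | grep -q HEALTH_OK; do sleep 30; done

    # only then raise min_size, so no PG starts out below it
    ceph osd pool set ssd min_size 2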

Robert LeBlanc

Sent from a mobile device; please excuse any typos.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
