Re: Bug or by design?

On Nov 18, 2014 4:48 PM, "Gregory Farnum" <greg@xxxxxxxxxxx> wrote:
>
> On Tue, Nov 18, 2014 at 3:38 PM, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
> > I was going to submit this as a bug, but thought I would put it here for
> > discussion first. I have a feeling that it could be behavior by design.
> >
> > ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
> >
> > I'm using a cache pool and was playing around with the size and min_size on
> > the pool to see the effects of replication. I set size/min_size to 1, then I
> > ran "ceph osd pool set ssd size 3; ceph osd pool set ssd min_size 2". Client
> > I/O immediately blocked because there were not yet 2 copies (as expected).
> > However, after the degraded objects were cleared up, several PGs remained in
> > the remapped+incomplete state and client I/O stayed blocked even though all
> > OSDs were up and healthy (even when left overnight). If I set min_size back
> > down to 1, the cluster recovers and client I/O resumes.
> >
> > I expected that as long as there is one copy of the data, the cluster could
> > replicate that data back up to min_size and cluster operations would resume.
> >
> > Where I think it could be by design is when min_size was already set to 2
> > and you lose enough OSDs fast enough to dip below that level. There is a
> > chance that the serving OSD has bad data (though we wouldn't know that at
> > the moment anyway). The bad data could then be replicated and the ability
> > to recover any good data would be lost.
> >
> > However, if Ceph immediately replicated from the sole remaining OSD to get
> > back to min_size, then when the other(s) came back online it could backfill
> > and just discard the extra copies.
> >
> > Immediate replication to keep the cluster operational seems like a good
> > thing overall. Am I missing something?
>
> This is sort of by design, but mostly an accident of many other
> architecture choices. Sam is actually working now to enable PG
> recovery when you have fewer than min_size copies available; I very
> much doubt it will be backported to any existing LTS releases but it
> ought to be in Hammer.
> -Greg
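
(For the archive, a rough sketch of the sequence from my test above; the pool
name "ssd" and the replica counts are from that test, and the dump_stuck check
is just one way to see the stuck PGs:)

    # single-copy pool to start with
    ceph osd pool set ssd size 1
    ceph osd pool set ssd min_size 1

    # raise both; client I/O blocks until 2 copies exist
    ceph osd pool set ssd size 3
    ceph osd pool set ssd min_size 2

    # on 0.87 some PGs can stay remapped+incomplete here
    ceph -s
    ceph pg dump_stuck inactive

    # dropping min_size back down to 1 lets the cluster recover
    ceph osd pool set ssd min_size 1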

Greg, thanks for the update. I'll refrain from submitting a bug report since it is already being worked on. For now, we will make sure that we don't increase min_size until size has been increased and the objects have been completely replicated.
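
Concretely, something like this is what I have in mind (pool name from the test above; the health-check loop is just one simple way to wait for recovery to finish):

    # raise the replica count first; recovery starts but client I/O keeps flowing
    ceph osd pool set ssd size 3

    # wait until the cluster reports healthy, i.e. all objects are replicated
    while ! ceph health | grep -q HEALTH_OK; do sleep 30; done

    # only then raise min_size, so no PG starts out below it
    ceph osd pool set ssd min_size 2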

Robert LeBlanc

Sent from a mobile device; please excuse any typos.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
