On Tue, Nov 18, 2014 at 3:38 PM, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
> I was going to submit this as a bug, but thought I would put it here for
> discussion first. I have a feeling that it could be behavior by design.
>
> ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
>
> I'm using a cache pool and was playing around with the size and min_size on
> the pool to see the effects of replication. I set size/min_size to 1, then I
> ran "ceph osd pool set ssd size 3; ceph osd pool set ssd min_size 2". Client
> I/O immediately blocked, as there were not yet 2 copies (as expected).
> However, after the degraded objects are cleaned up, several PGs remain in
> the remapped+incomplete state and client I/O continues to be blocked even
> though all OSDs are up and healthy (even left overnight). If I set min_size
> back down to 1, the cluster recovers and client I/O continues.
>
> I expected that as long as there is one copy of the data, the cluster can
> copy that data up to min_size and cluster operations resume.
>
> Where I think it could be by design is when min_size was already set to 2
> and you lose enough OSDs fast enough to dip below that level. There is the
> chance that the sole surviving OSD could have bad data (though we wouldn't
> know that anyway at the moment). The bad data could then be replicated and
> the ability to recover any good data would be lost.
>
> However, if Ceph immediately replicated the sole OSD to get back to
> min_size, then when the other(s) came back online, it could backfill and
> just discard the extras.
>
> Immediate replication to keep the cluster operational seems like a good
> thing overall. Am I missing something?

This is sort of by design, but mostly an accident of many other architecture
choices. Sam is actually working now to enable PG recovery when you have
fewer than min_size copies available; I very much doubt it will be backported
to any existing LTS releases, but it ought to be in Hammer.
-Greg
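
A rough reproduction sketch of the sequence described above, assuming a
small replicated test pool named "ssd" (the pool name and the size/min_size
values come from Robert's mail; the pg count and the particular status
commands are only illustrative):

    # Create a test pool and start with a single copy
    # (a pg count of 128 is arbitrary for a small test cluster).
    ceph osd pool create ssd 128
    ceph osd pool set ssd size 1
    ceph osd pool set ssd min_size 1

    # Raise the replication requirements, as in the original report.
    # Client I/O to the pool blocks until at least min_size copies exist.
    ceph osd pool set ssd size 3
    ceph osd pool set ssd min_size 2

    # Watch recovery; the problematic PGs show up as remapped+incomplete.
    ceph -s
    ceph health detail
    ceph pg dump_stuck inactive

    # Workaround from the report: dropping min_size back to 1 lets the
    # stuck PGs recover and unblocks client I/O.
    ceph osd pool set ssd min_size 1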