On Wed, Jan 7, 2015 at 9:55 PM, Christian Balzer <chibi@xxxxxxx> wrote:
> On Wed, 7 Jan 2015 17:07:46 -0800 Craig Lewis wrote:
>
>> On Mon, Dec 29, 2014 at 4:49 PM, Alexandre Oliva <oliva@xxxxxxx> wrote:
>>
>> > However, I suspect that temporarily setting min size to a lower number
>> > could be enough for the PGs to recover. If "ceph osd pool <pool> set
>> > min_size 1" doesn't get the PGs going, I suppose restarting at least
>> > one of the OSDs involved in the recovery, so that the PG undergoes
>> > peering again, would get you going again.
>> >
>>
>> It depends on how incomplete your incomplete PGs are.
>>
>> min_size is defined as "Sets the minimum number of replicas required for
>> I/O." By default, size is 3 and min_size is 2 on recent versions of
>> Ceph.
>>
>> If the number of replicas you have drops below min_size, then Ceph will
>> mark the PG as incomplete. As long as you have one copy of the PG, you
>> can recover by lowering min_size to the number of copies you do have,
>> then restoring the original value after recovery is complete. I did
>> this last week when I deleted the wrong PGs as part of a toofull
>> experiment.
>>
> Which of course raises the question of why not have min_size at 1
> permanently, so that in the (hopefully rare) case of losing 2 OSDs at the
> same time your cluster still keeps working (as it should with a size of 3).

You no longer have write durability if you only have one copy of a PG.

Sam is fixing things up so that recovery will work properly as long as you
have a whole copy of the PG, which should make things behave as people
expect.
-Greg
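
For reference, the temporary min_size workaround discussed above amounts to
something like the following sketch. The pool name "rbd" and osd.12 are
placeholders, the original min_size of 2 is an assumption (check yours
first), and the restart command depends on your init system:

    # note the current value so it can be restored afterwards
    ceph osd pool get rbd min_size

    # allow I/O and recovery to proceed with a single surviving replica
    ceph osd pool set rbd min_size 1

    # if the PG still won't peer, restart one of the OSDs acting for it
    # (sysvinit shown; use "systemctl restart ceph-osd@12" on systemd hosts)
    sudo service ceph restart osd.12

    # watch recovery progress
    ceph -s
    ceph pg dump_stuck inactive

    # once the cluster is back to HEALTH_OK, restore the original value
    ceph osd pool set rbd min_size 2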