I have size=2 & min_size=1, and IO is paused until all hosts come back.
(A sketch of the relevant CLI commands is at the bottom, below the
quoted thread.)

On Thu, Jul 9, 2015 at 4:41 PM, Jan Schermer <jan@xxxxxxxxxxx> wrote:
> What is the min_size setting for the pool? If you have size=2 and min_size=2, then all your data is safe when one replica is down, but IO is paused. If you want IO to continue, you need to set min_size=1.
> But be aware that a single further failure after that causes you to lose all the data; you’d have to revert to the other replica if it comes up and works - no idea how that works in Ceph, but it will likely be a PITA to do.
>
> Jan
>
>> On 09 Jul 2015, at 12:42, Mallikarjun Biradar <mallikarjuna.biradar@xxxxxxxxx> wrote:
>>
>> Hi all,
>>
>> Setup details:
>> Two storage enclosures, each connected to 4 OSD nodes (shared storage).
>> The failure domain is at the chassis (enclosure) level. The replication count is 2.
>> Each host is allotted 4 drives.
>>
>> I have active client IO running on the cluster (random-write profile
>> with 4M block size & 64 queue depth).
>>
>> One of the enclosures had a power loss, so all OSDs on the hosts
>> connected to it went down, as expected.
>>
>> But client IO got paused. After some time the enclosure & the hosts
>> connected to it came up, and all OSDs on those hosts came up.
>>
>> Until then, the cluster was not serving IO. Once all hosts & OSDs
>> belonging to that enclosure came up, client IO resumed.
>>
>> Can anybody help me understand why the cluster was not serving IO
>> during the enclosure failure? Or is it a bug?
>>
>> -Thanks & regards,
>> Mallikarjun Biradar
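
For reference, a quick sketch of checking and changing these settings
with the ceph CLI (the pool name "mypool" is a placeholder; substitute
your own pool):

  # How many replicas the pool keeps, and how many must be up
  # before the pool accepts IO:
  ceph osd pool get mypool size
  ceph osd pool get mypool min_size

  # With size=2 and min_size=2, losing one enclosure pauses IO even
  # though the surviving replica keeps the data safe. To let IO
  # continue on a single replica:
  ceph osd pool set mypool min_size 1

As Jan points out, min_size=1 trades safety for availability: one more
failure while the other replica is down and the data is gone.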
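It is also worth confirming that the CRUSH rule really spreads the two
replicas across chassis rather than hosts. A sketch, assuming the
default root and a placeholder rule name "byencl" (exact syntax varies
by Ceph release; this matches the pre-Luminous CLI):

  # Inspect the rule the pool uses and its failure-domain type:
  ceph osd crush rule dump

  # Create a chassis-level replicated rule and point the pool at it
  # (<rule_id> comes from the dump output above):
  ceph osd crush rule create-simple byencl default chassis
  ceph osd pool set mypool crush_ruleset <rule_id>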