Re: Enclosure power failure pausing client IO till all connected hosts up

Eric Eastman <eric.eastman@xxxxxxxxxxxxxx> · Thu, 23 Jul 2015 09:07:12 -0600

You may want to check the min_size value for your pools. If it is
set to the same value as the pool size (2 in your setup), then the
cluster will not do I/O if you lose a chassis.
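As a rough illustration, here is a minimal Python sketch of that rule under simplifying assumptions. It assumes each placement group has exactly one replica per chassis (failure domain = chassis, size 2, two enclosures), and that a PG serves client I/O only while at least min_size of its replicas are up. The `Pool` class and the helper functions are hypothetical and are not part of any Ceph API. Real PG state handling (peering, recovery) is more involved. On a live cluster, the actual value can be read with `ceph osd pool get <pool> min_size`.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    # Hypothetical model of a replicated pool, not a Ceph API object.
    size: int      # replica count ("Replication count is 2" in the setup)
    min_size: int  # replicas that must be up for a PG to accept client I/O


def replicas_after_chassis_loss(pool: Pool, chassis_lost: int) -> int:
    """Assumes failure domain = chassis, so each replica of a PG sits on a
    different chassis and losing one chassis removes one replica per PG."""
    return max(pool.size - chassis_lost, 0)


def pg_serves_io(pool: Pool, replicas_up: int) -> bool:
    """Simplified rule: a PG keeps serving I/O only while at least
    min_size of its replicas are up."""
    return replicas_up >= pool.min_size


if __name__ == "__main__":
    for min_size in (2, 1):
        pool = Pool(size=2, min_size=min_size)
        up = replicas_after_chassis_loss(pool, chassis_lost=1)
        status = "serving I/O" if pg_serves_io(pool, up) else "blocked"
        print(f"size={pool.size} min_size={pool.min_size}: "
              f"{up} replica(s) up -> {status}")
```

With min_size equal to size, the one surviving replica is not enough, so I/O blocks until the failed chassis returns, which matches the behaviour in the original report. Lowering min_size to 1 lets I/O continue, at the cost of acknowledging writes while only one copy of the data exists.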

On Sun, Jul 5, 2015 at 11:04 PM, Mallikarjun Biradar
<mallikarjuna.biradar@xxxxxxxxx> wrote:
> Hi all,
>
> Setup details:
> Two storage enclosures, each connected to 4 OSD nodes (shared storage).
> Failure domain is at the chassis (enclosure) level. Replication count is 2.
> Each host has been allotted 4 drives.
>
> I have active client IO running on the cluster (random write profile with 4M
> block size and a queue depth of 64).
>
> One of the enclosures lost power, so all OSDs on the hosts connected
> to this enclosure went down, as expected.
>
> But client IO paused. After some time, the enclosure and the hosts connected
> to it came up, and all OSDs on those hosts came up.
>
> Until then, the cluster was not serving IO. Once all hosts and OSDs
> belonging to that enclosure came up, client IO resumed.
>
>
> Can anybody help me understand why the cluster is not serving IO during an
> enclosure failure? Or is it a bug?
>
> -Thanks & regards,
> Mallikarjun Biradar
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>