What is the min_size setting for the pool? If you have size=2 and min_size=2, then all your data is safe when one replica is down, but IO is paused. If you want IO to continue, you need to set min_size=1 (example commands below the quoted mail). But be aware that a single further failure after that can mean losing data; you'd have to recover from the other replica if it comes back up and works - no idea how that works in Ceph, but it will likely be a PITA to do.

Jan

> On 09 Jul 2015, at 12:42, Mallikarjun Biradar <mallikarjuna.biradar@xxxxxxxxx> wrote:
> 
> Hi all,
> 
> Setup details:
> Two storage enclosures, each connected to 4 OSD nodes (shared storage).
> Failure domain is chassis (enclosure) level. Replication count is 2.
> Each host is allotted 4 drives.
> 
> I have active client IO running on the cluster (random write profile
> with 4M block size & 64 queue depth).
> 
> One of the enclosures had a power loss, so all OSDs on the hosts
> connected to this enclosure went down, as expected.
> 
> But client IO paused. After some time the enclosure and the hosts
> connected to it came back up, and all OSDs on those hosts came up.
> 
> Until then, the cluster was not serving IO. Once all hosts and OSDs
> belonging to that enclosure were up, client IO resumed.
> 
> Can anybody help me understand why the cluster stopped serving IO
> during the enclosure failure? Or is it a bug?
> 
> -Thanks & regards,
> Mallikarjun Biradar
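
For reference, inspecting and changing this is done with the ceph CLI; a minimal sketch, where "mypool" is a placeholder for your actual pool name:

    # Show the current replication settings for the pool
    ceph osd pool get mypool size
    ceph osd pool get mypool min_size

    # Allow IO to continue with only one replica left
    # (risky: losing that last replica means losing data)
    ceph osd pool set mypool min_size 1

Once the failed enclosure is back and recovery has finished, it would be sensible to set min_size back to 2.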