Your first point of troubleshooting is pretty much always to look at
"ceph -s" and see what it says. In this case it's probably telling you
that some PGs are down, and then you can look at why (but perhaps it's
something else).
-Greg

On Thu, Jul 9, 2015 at 12:22 PM, Mallikarjun Biradar
<mallikarjuna.biradar@xxxxxxxxx> wrote:
> Yeah. All OSDs are down and the monitors are still up.
>
> On Thu, Jul 9, 2015 at 4:51 PM, Jan Schermer <jan@xxxxxxxxxxx> wrote:
>> And are the OSDs getting marked down during the outage?
>> Are all the MONs still up?
>>
>> Jan
>>
>>> On 09 Jul 2015, at 13:20, Mallikarjun Biradar
>>> <mallikarjuna.biradar@xxxxxxxxx> wrote:
>>>
>>> I have size=2 & min_size=1, and IO is paused until all hosts come back.
>>>
>>> On Thu, Jul 9, 2015 at 4:41 PM, Jan Schermer <jan@xxxxxxxxxxx> wrote:
>>>> What is the min_size setting for the pool? If you have size=2 and
>>>> min_size=2, then all your data is safe when one replica is down,
>>>> but IO is paused. If you want IO to continue, you need to set
>>>> min_size=1. But be aware that a single failure after that causes
>>>> you to lose all the data; you’d have to revert to the other
>>>> replica if it comes up and works - no idea how that works in Ceph,
>>>> but it will likely be a PITA to do.
>>>>
>>>> Jan
>>>>
>>>>> On 09 Jul 2015, at 12:42, Mallikarjun Biradar
>>>>> <mallikarjuna.biradar@xxxxxxxxx> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> Setup details:
>>>>> Two storage enclosures, each connected to 4 OSD nodes (shared
>>>>> storage). The failure domain is chassis (enclosure) level. The
>>>>> replication count is 2. Each host is allotted 4 drives.
>>>>>
>>>>> I have active client IO running on the cluster (random write
>>>>> profile with 4M block size & 64 queue depth).
>>>>>
>>>>> One of the enclosures had a power loss, so all OSDs on the hosts
>>>>> connected to this enclosure went down as expected.
>>>>>
>>>>> But client IO got paused. After some time the enclosure & the
>>>>> hosts connected to it came up, and all OSDs on those hosts came
>>>>> up.
>>>>>
>>>>> Until then, the cluster was not serving IO. Once all hosts & OSDs
>>>>> pertaining to that enclosure came up, client IO resumed.
>>>>>
>>>>> Can anybody help me understand why the cluster was not serving IO
>>>>> during the enclosure failure? Or is it a bug?
>>>>>
>>>>> -Thanks & regards,
>>>>> Mallikarjun Biradar
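
As a rough illustration of Greg's suggestion, a typical first pass at a
cluster that has stopped serving IO looks something like the following
(all standard Ceph CLI calls, run from any node with an admin keyring):

    # Overall cluster state: look for PGs reported as down, peering,
    # incomplete or otherwise inactive.
    ceph -s

    # Exactly which PGs and OSDs are unhealthy, and why.
    ceph health detail

    # Which OSDs are down, laid out along the CRUSH hierarchy
    # (root / chassis / host / osd).
    ceph osd tree

    # PGs stuck inactive -- these are the ones blocking client IO.
    ceph pg dump_stuck inactive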
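
And a minimal sketch of Jan's min_size suggestion, assuming a pool
named "rbd" (substitute your own pool name):

    # Check the current replication settings for the pool
    # (pool name "rbd" is an assumption for this example).
    ceph osd pool get rbd size
    ceph osd pool get rbd min_size

    # Allow IO to continue while only one replica is available.
    # WARNING: with size=2 and min_size=1, losing the one remaining
    # replica means losing data, exactly as Jan warns above.
    ceph osd pool set rbd min_size 1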
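
For reference, a chassis-level failure domain like the one described
in the original post would usually come from a CRUSH rule along these
lines. This is a hand-written sketch in the pre-Luminous crushmap
syntax, not the poster's actual map:

    # Illustrative rule: place each replica in a different chassis
    # (enclosure), so losing one enclosure leaves one replica alive.
    rule replicated_chassis {
            ruleset 1
            type replicated
            min_size 1
            max_size 10
            step take default
            step chooseleaf firstn 0 type chassis
            step emit
    }

With size=2 over two chassis, an enclosure outage leaves every PG with
exactly one surviving replica. Whether IO continues on that single
replica depends both on the pool's min_size and on the failed OSDs
actually being marked down, which is why Jan asks about both above.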