Another cluster completely hung

Mario Giammarco <mgiammarco@xxxxxxxxx> · Tue, 28 Jun 2016 16:59:51 +0000 (UTC)

Hello,
this is the second time this has happened to me, and I hope someone can
explain what I can do.
Proxmox Ceph cluster with 8 servers and 11 HDDs. min_size=1, size=2.

One HDD went down due to bad sectors.
Ceph recovered, but it ended up in this state:

cluster f2a8dd7d-949a-4a29-acab-11d4900249f4
     health HEALTH_WARN
            3 pgs down
            19 pgs incomplete
            19 pgs stuck inactive
            19 pgs stuck unclean
            7 requests are blocked > 32 sec
     monmap e11: 7 mons at {0=192.168.0.204:6789/0,1=192.168.0.201:6789/0,2=192.168.0.203:6789/0,3=192.168.0.205:6789/0,4=192.168.0.202:6789/0,5=192.168.0.206:6789/0,6=192.168.0.207:6789/0}
            election epoch 722, quorum 0,1,2,3,4,5,6 1,4,2,0,3,5,6
     osdmap e10182: 10 osds: 10 up, 10 in
      pgmap v3295880: 1024 pgs, 2 pools, 4563 GB data, 1143 kobjects
            9136 GB used, 5710 GB / 14846 GB avail
                1005 active+clean
                  16 incomplete
                   3 down+incomplete
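
To put the pgmap numbers in proportion, here is a small Python sketch. The counts are copied by hand from the status output above; the variable names are illustrative only, and the rule that a PG must have "active" in its state to serve I/O is stated here as an assumption about how Ceph reports states, not something taken from this output.

```python
# PG state counts copied by hand from the pgmap section of the status output.
pg_states = {
    "active+clean": 1005,
    "incomplete": 16,
    "down+incomplete": 3,
}

total_pgs = sum(pg_states.values())

# Assumption: a PG whose state string lacks "active" is not serving I/O.
inactive_pgs = sum(
    count
    for state, count in pg_states.items()
    if "active" not in state.split("+")
)

inactive_pct = 100.0 * inactive_pgs / total_pgs

print(f"{inactive_pgs} of {total_pgs} PGs inactive ({inactive_pct:.2f}%)")
```

The tally matches the "19 pgs stuck inactive" line: 16 incomplete plus 3 down+incomplete, out of 1024 placement groups in total.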

Unfortunately, "7 requests are blocked" means that no virtual machine can
boot, because Ceph has stopped I/O.

I can accept losing some data, but not ALL of it!
Can you help me please?
Thanks,
Mario
