We set up a small Ceph cluster of three nodes on top of an OpenStack deployment of three nodes (that is, each compute node was also an OSD/MON node). It worked great until the OSDs started to fill up and we began expanding the cluster. I added 4 OSDs two days ago and the recovery went smoothly. I added another four last night, but this time the recovery is stuck:

root@kvm-sn-14i:~# ceph -s
   health HEALTH_WARN 22 pgs backfill_toofull; 19 pgs degraded; 1 pgs recovering; 23 pgs stuck unclean; recovery 157614/1775814 degraded (8.876%); recovering 2 o/s, 8864KB/s; 1 near full osd(s)
   monmap e1: 3 mons at {kvm-cs-sn-10i=192.168.241.110:6789/0,kvm-cs-sn-14i=192.168.241.114:6789/0,kvm-cs-sn-15i=192.168.241.115:6789/0}, election epoch 42, quorum 0,1,2 kvm-cs-sn-10i,kvm-cs-sn-14i,kvm-cs-sn-15i
   osdmap e512: 30 osds: 27 up, 27 in
   pgmap v1474651: 448 pgs: 425 active+clean, 1 active+recovering+remapped, 3 active+remapped+backfill_toofull, 11 active+degraded+backfill_toofull, 8 active+degraded+remapped+backfill_toofull; 3414 GB data, 6640 GB used, 7007 GB / 13647 GB avail; 0B/s rd, 2363B/s wr, 0op/s; 157614/1775814 degraded (8.876%); recovering 2 o/s, 8864KB/s
   mdsmap e1: 0/0/1 up

Even after restarting the OSDs, recovery hangs at 8.876%. Consequently, many of our virts have crashed. I'm hoping someone on this list can offer some suggestions; otherwise, I may have to blow this up.

Thanks!

--
\*..+.-
--Greg Chavez
+//..;};
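
P.S. If more detail would help with diagnosis, I can post the output of something like the following (a rough sketch of what I plan to gather next; exact subcommand forms may vary a bit by Ceph version):

   # full health breakdown, including which OSD is near full
   ceph health detail

   # list the PGs stuck in unclean / backfill_toofull states
   ceph pg dump_stuck unclean

   # per-OSD up/in status and CRUSH placement
   ceph osd tree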