We set up a small Ceph cluster of three nodes on top of an OpenStack deployment of three nodes (that is, each compute node was also an OSD/MON node). It worked great until the OSDs started to fill up and we began expanding the cluster. I added 4 OSDs two days ago and the recovery went smoothly. I added another four last night, but this time the recovery is stuck:

root@kvm-sn-14i:~# ceph -s
   health HEALTH_WARN 22 pgs backfill_toofull; 19 pgs degraded; 1 pgs recovering; 23 pgs stuck unclean; recovery 157614/1775814 degraded (8.876%); recovering 2 o/s, 8864KB/s; 1 near full osd(s)
   monmap e1: 3 mons at {kvm-cs-sn-10i=192.168.241.110:6789/0,kvm-cs-sn-14i=192.168.241.114:6789/0,kvm-cs-sn-15i=192.168.241.115:6789/0}, election epoch 42, quorum 0,1,2 kvm-cs-sn-10i,kvm-cs-sn-14i,kvm-cs-sn-15i
   osdmap e512: 30 osds: 27 up, 27 in
   pgmap v1474651: 448 pgs: 425 active+clean, 1 active+recovering+remapped, 3 active+remapped+backfill_toofull, 11 active+degraded+backfill_toofull, 8 active+degraded+remapped+backfill_toofull; 3414 GB data, 6640 GB used, 7007 GB / 13647 GB avail; 0B/s rd, 2363B/s wr, 0op/s; 157614/1775814 degraded (8.876%); recovering 2 o/s, 8864KB/s
   mdsmap e1: 0/0/1 up

Even after restarting the OSDs, recovery hangs at 8.876%. Consequently, many of our virts have crashed. I'm hoping someone on this list can offer some suggestions; otherwise, I may have to blow this up.

Thanks!

--
\*..+.-
--Greg Chavez
+//..;};
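
P.S. If more detail would help with diagnosis, I can post the output of something like the following (a rough sketch of what I plan to gather next; exact subcommand forms may vary a bit by Ceph version):

   # full health breakdown, including which OSD is near full
   ceph health detail

   # list the PGs stuck in unclean / backfill_toofull states
   ceph pg dump_stuck unclean

   # per-OSD up/in status and CRUSH placement
   ceph osd tree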