On Wed, 28 Nov 2012 13:08:08 -0800 Samuel Just <sam.just@xxxxxxxxxxx> wrote:
> Can you post the output of ceph -s?

'ceph -s' right now gives:

   health HEALTH_WARN 923 pgs degraded; 8666 pgs down; 9606 pgs peering; 7 pgs recovering; 406 pgs recovery_wait; 3769 pgs stale; 9606 pgs stuck inactive; 3769 pgs stuck stale; 11052 pgs stuck unclean; recovery 121068/902868 degraded (13.409%); 4824/300956 unfound (1.603%); 2/18 in osds are down
   monmap e1: 1 mons at {0=193.136.128.202:6789/0}, election epoch 1, quorum 0 0
   osdmap e7669: 62 osds: 16 up, 18 in
   pgmap v47643: 12480 pgs: 35 active, 1223 active+clean, 129 stale+active, 321 active+recovery_wait, 198 stale+active+clean, 236 peering, 2 active+remapped, 2 stale+active+recovery_wait, 6126 down+peering, 249 active+degraded, 2 stale+active+recovering+degraded, 598 stale+peering, 7 active+clean+scrubbing, 29 active+recovery_wait+remapped, 2067 stale+down+peering, 618 stale+active+degraded, 52 active+recovery_wait+degraded, 61 remapped+peering, 365 down+remapped+peering, 2 stale+active+recovery_wait+degraded, 45 stale+remapped+peering, 108 stale+down+remapped+peering, 5 active+recovering; 1175 GB data, 1794 GB used, 25969 GB / 27764 GB avail; 121068/902868 degraded (13.409%); 4824/300956 unfound (1.603%)
   mdsmap e1: 0/0/1 up

The cluster has been in this state since the last attempt to get it going. I added about 100 GB of swap on each machine to avoid the OOM killer. Running like this resulted in the machines thrashing wildly and climbing to a load average of ~2000, and after a while the OSDs started dying/committing suicide, but *not* from OOM. Some of the few that remain have bloated to around 1.9 GB of memory usage.

If you want, I can try to restart the whole thing tomorrow and collect fresh log output from the dying OSDs, or take any other action and gather any other debug info that you might find useful.

Thanks!
Cláudio
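P.S. For the fresh log output, the rough plan would be to turn up the OSD debug levels in ceph.conf before restarting the daemons. Something like the following; the exact subsystems and levels here are only a guess at what would be useful, so let me know if you want different ones:

   [osd]
       # verbose OSD, messenger and filestore logging for the next run
       debug osd = 20
       debug ms = 1
       debug filestore = 20

Then restart the OSDs and pull the per-OSD logs under /var/log/ceph/ from the machines where they keep dying. If a full restart is not wanted, I could instead try to push the same settings into the surviving daemons at runtime (e.g. via injectargs or the admin socket).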