On Wed, 28 Nov 2012 13:08:08 -0800 Samuel Just <sam.just@xxxxxxxxxxx> wrote:
> Can you post the output of ceph -s?

'ceph -s' right now gives:

   health HEALTH_WARN 923 pgs degraded; 8666 pgs down; 9606 pgs peering; 7 pgs recovering; 406 pgs recovery_wait; 3769 pgs stale; 9606 pgs stuck inactive; 3769 pgs stuck stale; 11052 pgs stuck unclean; recovery 121068/902868 degraded (13.409%); 4824/300956 unfound (1.603%); 2/18 in osds are down
   monmap e1: 1 mons at {0=193.136.128.202:6789/0}, election epoch 1, quorum 0 0
   osdmap e7669: 62 osds: 16 up, 18 in
   pgmap v47643: 12480 pgs: 35 active, 1223 active+clean, 129 stale+active, 321 active+recovery_wait, 198 stale+active+clean, 236 peering, 2 active+remapped, 2 stale+active+recovery_wait, 6126 down+peering, 249 active+degraded, 2 stale+active+recovering+degraded, 598 stale+peering, 7 active+clean+scrubbing, 29 active+recovery_wait+remapped, 2067 stale+down+peering, 618 stale+active+degraded, 52 active+recovery_wait+degraded, 61 remapped+peering, 365 down+remapped+peering, 2 stale+active+recovery_wait+degraded, 45 stale+remapped+peering, 108 stale+down+remapped+peering, 5 active+recovering; 1175 GB data, 1794 GB used, 25969 GB / 27764 GB avail; 121068/902868 degraded (13.409%); 4824/300956 unfound (1.603%)
   mdsmap e1: 0/0/1 up

The cluster has been in this state since the last attempt to get it going. I added about 100 GB of swap on each machine to avoid the OOM killer. Running like this resulted in the machines thrashing wildly and climbing to a load average of ~2000, and after a while the OSDs started dying/committing suicide, but *not* from OOM. Some of the few that remain have bloated to around 1.9 GB of memory usage.

If you want, I can try to restart the whole thing tomorrow and collect fresh log output from the dying OSDs, or take any other action and gather any other debug info that you might find useful.

Thanks!
Cláudio
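P.S. For the fresh log output, the rough plan would be to turn up the OSD debug levels in ceph.conf before restarting the daemons. Something like the following; the exact subsystems and levels here are only a guess at what would be useful, so let me know if you want different ones:

   [osd]
       # verbose OSD, messenger and filestore logging for the next run
       debug osd = 20
       debug ms = 1
       debug filestore = 20

Then restart the OSDs and pull the per-OSD logs under /var/log/ceph/ from the machines where they keep dying. If a full restart is not wanted, I could instead try to push the same settings into the surviving daemons at runtime (e.g. via injectargs or the admin socket).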