Apart from a bug causing this, it could be caused by the failure of other OSDs (even a temporary one) that starts backfills:

1) something fails
2) some PGs move to this OSD
3) this OSD has to allocate memory for all the new PGs
4) whatever failed comes back up
5) the memory is never released

A similar scenario is possible if, for example, someone confuses "ceph osd crush reweight" with "ceph osd reweight" (yes, this happened to me :-)). Did you try just restarting the OSD before you upgraded it? (A rough sketch of the relevant commands is at the bottom of this mail, below the quote.)

Jan

> On 07 Sep 2015, at 12:58, Mariusz Gronczewski <mariusz.gronczewski@xxxxxxxxxxxx> wrote:
>
> Hi,
>
> over the weekend (I was on vacation so I didn't catch exactly what happened)
> our OSDs started eating in excess of 6GB of RAM (well, RSS), which was a
> problem considering that we had only 8GB of RAM for 4 OSDs (about 700
> PGs per OSD and about 70GB of space used). The flood of coredumps and OOMs
> made the OSDs unusable.
>
> I then upgraded one of the OSDs to Hammer, which made it a bit better (~2GB
> per OSD), but the usage is still much higher than before.
>
> Any ideas what could be the reason for that? The logs are mostly full of
> OSDs trying to recover and timed-out heartbeats.
>
> --
> Mariusz Gronczewski, Administrator
>
> Efigence S. A.
> ul. Wołoska 9a, 02-583 Warszawa
> T: [+48] 22 380 13 13
> F: [+48] 22 380 13 14
> E: mariusz.gronczewski@xxxxxxxxxxxx
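
For reference, a rough sketch of the commands involved. The id osd.0 / 0 is just a placeholder, adjust for your cluster, and the restart syntax depends on your distro and init system:

  # ask tcmalloc inside the OSD to return freed memory to the OS
  ceph tell osd.0 heap release

  # restart a single OSD (sysvinit-style; under systemd it would be
  # "systemctl restart ceph-osd@0")
  service ceph restart osd.0

  # CRUSH weight: how much data this OSD should hold relative to the
  # others (usually set to the disk size in TB), changing it moves data
  ceph osd crush reweight osd.0 1.0

  # reweight: a temporary 0.0-1.0 override on top of the CRUSH weight,
  # easy to mix up with the command above
  ceph osd reweight 0 1.0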