On Wed, 09 Sep 2015 08:59:53 -0500, Chad William Seys
<cwseys@xxxxxxxxxxxxxxxx> wrote:

> > Going from 2GB to 8GB is not normal, although some slight bloating
> > is expected.
>
> If I recall correctly, Mariusz's cluster had a period of flapping
> OSDs?

The NIC was dropping packets under load, which caused heartbeats to
fail periodically, which caused more traffic, which caused more
failures. So basically the worst possible scenario, where most of the
PGs needed to be recovered. Let's just say that after getting all the
nodes back up, not a single PG was in the active+clean state.

> I experienced a similar situation using Hammer. My OSDs went from
> 10GB of RAM in a healthy state to 24GB of RAM + 10GB of swap in a
> recovering state. I also could not re-add a node, because every time
> I tried, the OOM killer would kill an OSD daemon somewhere before
> the cluster could become healthy again.
>
> Therefore I propose we begin expecting bloating under these
> circumstances. :)
>
> > In your case it just got much worse than usual, for reasons yet
> > unknown.
>
> Not really unknown: because 'ceph tell osd.* heap release' freed RAM
> for Mariusz, I think we know the reason for so much RAM use is that
> tcmalloc is not freeing unused memory. Right?

Note that I only ran it after most of the PGs had been recovered.

> Here is a related "urgent" and "won't fix" bug that applies here:
> http://tracker.ceph.com/issues/12681 . Sage suggests making the heap
> release command a cron job. :)
>
> Have fun!
> Chad.

--
Mariusz Gronczewski, Administrator

Efigence S. A.
ul. Wołoska 9a, 02-583 Warszawa
T: [+48] 22 380 13 13
F: [+48] 22 380 13 14
E: mariusz.gronczewski@xxxxxxxxxxxx
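For reference, a minimal sketch of the cron job Sage suggests in that
ticket might look like the one below. The 15-minute interval, the file
name, and the log path are illustrative assumptions, not anything
taken from the tracker issue; only the 'ceph tell osd.* heap release'
command itself comes from this thread.

    # /etc/cron.d/ceph-heap-release -- hypothetical sketch, not from
    # the tracker issue. Every 15 minutes (interval is an assumption),
    # ask all OSDs to return tcmalloc's freed-but-retained pages to
    # the kernel. 'osd.*' is quoted so the shell does not glob it.
    */15 * * * * root /usr/bin/ceph tell 'osd.*' heap release >> /var/log/ceph-heap-release.log 2>&1

To see whether it is actually helping, 'ceph tell osd.N heap stats'
(for a specific OSD id N) should show how much memory tcmalloc is
holding in its freelists before and after the release.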