Re: Hammer OSD memory usage very high

Hi David,

I am Praveen. We also had a similar problem with Hammer 0.94.2, after we created a new cluster with an erasure-coded pool (10+5 configuration).

Root cause:

The high memory usage in our case was caused by PG logs. Erasure-coded pools keep more PG log entries than replicated pools, so we started running out of memory once we created the new cluster with erasure-coded pools.

Solution:

Ceph provides configuration options to control the number of PG log entries. You can try setting these values in your cluster and then check your OSD memory usage; this also improves OSD boot time. These are the parameters and values we use:

  osd max pg log entries = 600
  osd min pg log entries = 200
  osd pg log trim min = 200
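
For what it's worth, this is how such settings would typically go in ceph.conf (the values are the ones from our cluster; they may need tuning for your workload):

```
[osd]
# Cap PG log length to limit per-PG memory (our values; tune as needed)
osd max pg log entries = 600
osd min pg log entries = 200
osd pg log trim min = 200
```

A restart of the OSDs picks the values up; I believe they can also be injected at runtime with something like "ceph tell 'osd.*' injectargs '--osd_max_pg_log_entries 600'", but please verify the effective values afterwards with "ceph daemon osd.N config show" rather than taking my word for it.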

Other Information:

We dug into this problem for some time before finding the root cause, so we are fairly confident there are no memory leaks in Ceph Hammer 0.94.2.

Regards,
Praveen

Date: Fri, 7 Oct 2016 16:04:03 +1100
From: David Burns <dburns@xxxxxxxxxxxxxx>
To: ceph-users@xxxxxxxx
Subject: [ceph-users] Hammer OSD memory usage very high
Message-ID: <C5D65C91-1ABF-4A7A-BAB5-B88785A0AD8C@xxxxxxxxxxxxxx>
Content-Type: text/plain; charset=UTF-8

Hello all,

We have a small 160TB Ceph cluster used only as a test s3 storage repository for media content.

Problem
Since upgrading from Firefly to Hammer we are experiencing very high OSD memory use of 2-3 GB per TB of OSD storage - typical OSD memory 6-10GB.
We have had to increase swap space to bring the cluster to a basic functional state. Clearly this will significantly impact system performance and precludes starting all OSDs simultaneously.

Hardware
4 x storage nodes with 16 OSDs/node. OSD nodes are reasonable spec SMC storage servers with dual Xeon CPUs. Storage is 16 x 3TB SAS disks in each node.
Installed RAM is 72GB (2 nodes) & 80GB (2 nodes). (We note that the installed RAM is at least 50% higher than the Ceph recommended 1 GB RAM per TB of storage.)

Software
OSD node OS is CentOS 6.8 (with updates). One node has been updated to CentOS 7.2 - no change in memory usage was observed.

"ceph -v" -> ceph version 0.94.9 (fe6d859066244b97b24f09d46552afc2071e6f90)
(all Ceph packages downloaded from download.ceph.com)

The cluster has achieved status HEALTH_OK, so we don't believe this relates to increased memory due to recovery.

History
Emperor 0.72.2 -> Firefly 0.80.10 -> Hammer 0.94.6 -> Hammer 0.94.7 -> Hammer 0.94.9

Per-process OSD memory is observed to increase substantially during the load_pgs phase.

Use of "ceph tell 'osd.*' heap release" has minimal effect - there is no substantial memory in the heap or cache freelists.
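
For anyone following along, these are the commands we used to inspect the heap (they need access to a running cluster / the OSD admin socket; substitute your own OSD IDs for osd.0):

```
# Show tcmalloc heap statistics for a single OSD
ceph tell osd.0 heap stats

# Ask all OSDs to return free heap pages to the OS
ceph tell 'osd.*' heap release

# Dump the running config on one OSD to confirm settings took effect
ceph daemon osd.0 config show
```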

More information can be found in bug #17228 (link http://tracker.ceph.com/issues/17228)

Any feedback or guidance to further understanding the high memory usage would be welcomed.

Thanks

David


--
FetchTV Pty Ltd, Level 5, 61 Lavender Street, Milsons Point, NSW 2061
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
