On Tue, Jul 28, 2015 at 11:00 AM, Kenneth Waegeman <kenneth.waegeman@xxxxxxxx> wrote:
>
> On 07/17/2015 02:50 PM, Gregory Farnum wrote:
>>
>> On Fri, Jul 17, 2015 at 1:13 PM, Kenneth Waegeman <kenneth.waegeman@xxxxxxxx> wrote:
>>>
>>> Hi all,
>>>
>>> I've read in the documentation that OSDs use around 512MB on a healthy
>>> cluster (http://ceph.com/docs/master/start/hardware-recommendations/#ram).
>>> Now our OSDs are all using around 2GB of RAM while the cluster is
>>> healthy.
>>>
>>>   PID USER  PR NI    VIRT    RES   SHR S  %CPU %MEM     TIME+ COMMAND
>>> 29784 root  20  0 6081276 2.535g  4740 S   0.7  8.1   1346:55 ceph-osd
>>> 32818 root  20  0 5417212 2.164g 24780 S  16.2  6.9   1238:55 ceph-osd
>>> 25053 root  20  0 5386604 2.159g 27864 S   0.7  6.9   1192:08 ceph-osd
>>> 33875 root  20  0 5345288 2.092g  3544 S   0.7  6.7   1188:53 ceph-osd
>>> 30779 root  20  0 5474832 2.090g 28892 S   1.0  6.7   1142:29 ceph-osd
>>> 22068 root  20  0 5191516 2.000g 28664 S   0.7  6.4  31:56.72 ceph-osd
>>> 34932 root  20  0 5242656 1.994g  4536 S   0.3  6.4   1144:48 ceph-osd
>>> 26883 root  20  0 5178164 1.938g  6164 S   0.3  6.2   1173:01 ceph-osd
>>> 31796 root  20  0 5193308 1.916g 27000 S  16.2  6.1 923:14.87 ceph-osd
>>> 25958 root  20  0 5193436 1.901g  2900 S   0.7  6.1   1039:53 ceph-osd
>>> 27826 root  20  0 5225764 1.845g  5576 S   1.0  5.9   1031:15 ceph-osd
>>> 36011 root  20  0 5111660 1.823g 20512 S  15.9  5.8   1093:01 ceph-osd
>>> 19736 root  20  0 2134680 0.994g     0 S   0.3  3.2  46:13.47 ceph-osd
>>>
>>> [root@osd003 ~]# ceph status
>>> 2015-07-17 14:03:13.865063 7f1fde5f0700 -1 WARNING: the following dangerous
>>> and experimental features are enabled: keyvaluestore
>>> 2015-07-17 14:03:13.887087 7f1fde5f0700 -1 WARNING: the following dangerous
>>> and experimental features are enabled: keyvaluestore
>>>     cluster 92bfcf0a-1d39-43b3-b60f-44f01b630e47
>>>      health HEALTH_OK
>>>      monmap e1: 3 mons at
>>> {mds01=10.141.16.1:6789/0,mds02=10.141.16.2:6789/0,mds03=10.141.16.3:6789/0}
>>>             election epoch 58, quorum 0,1,2 mds01,mds02,mds03
>>>      mdsmap e17218: 1/1/1 up {0=mds03=up:active}, 1 up:standby
>>>      osdmap e25542: 258 osds: 258 up, 258 in
>>>       pgmap v2460163: 4160 pgs, 4 pools, 228 TB data, 154 Mobjects
>>>             270 TB used, 549 TB / 819 TB avail
>>>                 4152 active+clean
>>>                    8 active+clean+scrubbing+deep
>>>
>>> We are using erasure code on most of our OSDs, so maybe that is a reason.
>>> But the cache-pool filestore OSDs on 200GB SSDs are also using 2GB of RAM.
>>> Our erasure-code pool (16*14 OSDs) has a pg_num of 2048; our cache pool
>>> (2*14 OSDs) has a pg_num of 1024.
>>>
>>> Are these normal values for this configuration, and is the documentation
>>> a bit outdated, or should we look into something else?
>>
>> 2GB of RSS is larger than I would have expected, but not unreasonable.
>> In particular, I don't think we've gathered numbers on either EC pools
>> or the effects of the caching processes.
>
> Which data is actually in the memory of the OSDs?
> Is it mostly cached data?
> We are short on memory on these servers; can we influence this?

Mmm, we've discussed this a few times on the mailing list. The CERN guys published a document on experimenting with a very large cluster and not enough RAM, but there's nothing I would really recommend changing for a production system, especially an EC one, unless you are intimately familiar with what's going on.
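
If you want to see where the memory is actually going on a given OSD, the tcmalloc heap introspection is the easiest place to start (this assumes your ceph-osd binaries are built with tcmalloc, which is the default on most distros), e.g. for osd.0:

    ceph tell osd.0 heap stats             # print tcmalloc heap usage for that daemon
    ceph tell osd.0 heap release           # ask tcmalloc to hand freed pages back to the OS
    ceph tell osd.0 heap start_profiler    # optional: start the heap profiler for a deeper look

"heap stats" separates memory the daemon is really using from memory the allocator is just holding on to, and "heap release" will sometimes shrink the RSS noticeably without restarting anything. None of this changes what the OSD decides to keep in memory, though; it just tells you where the space goes.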
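
As a very rough sanity check on whether ~2GB is plausible, count PG shards per OSD rather than looking at pools. With pg_num 2048 on the EC pool spread over 16*14 = 224 OSDs, and assuming something like a 10+4 profile (the profile isn't stated in this thread, so that's purely an assumption), each PG has 14 shards and every OSD carries roughly 2048 * 14 / 224 ≈ 128 of them; the cache pool at pg_num 1024 over 2*14 = 28 OSDs, assuming 3x replication, works out to about 1024 * 3 / 28 ≈ 110 per OSD. OSD memory grows with the number of PGs the daemon hosts, and EC shards hold more per-PG state than replicated ones, which is a big part of why these daemons sit well above the ~512MB figure in the docs.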