On Tue, Jul 28, 2015 at 12:07 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
> On Tue, Jul 28, 2015 at 11:00 AM, Kenneth Waegeman
> <kenneth.waegeman@xxxxxxxx> wrote:
>>
>> On 07/17/2015 02:50 PM, Gregory Farnum wrote:
>>>
>>> On Fri, Jul 17, 2015 at 1:13 PM, Kenneth Waegeman
>>> <kenneth.waegeman@xxxxxxxx> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I've read in the documentation that OSDs use around 512MB on a healthy
>>>> cluster (http://ceph.com/docs/master/start/hardware-recommendations/#ram).
>>>> Now our OSDs are all using around 2GB of RAM while the cluster is
>>>> healthy:
>>>>
>>>>   PID   USER  PR  NI     VIRT     RES    SHR  S  %CPU  %MEM      TIME+  COMMAND
>>>> 29784   root  20   0  6081276  2.535g   4740  S   0.7   8.1    1346:55  ceph-osd
>>>> 32818   root  20   0  5417212  2.164g  24780  S  16.2   6.9    1238:55  ceph-osd
>>>> 25053   root  20   0  5386604  2.159g  27864  S   0.7   6.9    1192:08  ceph-osd
>>>> 33875   root  20   0  5345288  2.092g   3544  S   0.7   6.7    1188:53  ceph-osd
>>>> 30779   root  20   0  5474832  2.090g  28892  S   1.0   6.7    1142:29  ceph-osd
>>>> 22068   root  20   0  5191516  2.000g  28664  S   0.7   6.4   31:56.72  ceph-osd
>>>> 34932   root  20   0  5242656  1.994g   4536  S   0.3   6.4    1144:48  ceph-osd
>>>> 26883   root  20   0  5178164  1.938g   6164  S   0.3   6.2    1173:01  ceph-osd
>>>> 31796   root  20   0  5193308  1.916g  27000  S  16.2   6.1  923:14.87  ceph-osd
>>>> 25958   root  20   0  5193436  1.901g   2900  S   0.7   6.1    1039:53  ceph-osd
>>>> 27826   root  20   0  5225764  1.845g   5576  S   1.0   5.9    1031:15  ceph-osd
>>>> 36011   root  20   0  5111660  1.823g  20512  S  15.9   5.8    1093:01  ceph-osd
>>>> 19736   root  20   0  2134680  0.994g      0  S   0.3   3.2   46:13.47  ceph-osd
>>>>
>>>> [root@osd003 ~]# ceph status
>>>> 2015-07-17 14:03:13.865063 7f1fde5f0700 -1 WARNING: the following dangerous
>>>> and experimental features are enabled: keyvaluestore
>>>> 2015-07-17 14:03:13.887087 7f1fde5f0700 -1 WARNING: the following dangerous
>>>> and experimental features are enabled: keyvaluestore
>>>>     cluster 92bfcf0a-1d39-43b3-b60f-44f01b630e47
>>>>      health HEALTH_OK
>>>>      monmap e1: 3 mons at
>>>> {mds01=10.141.16.1:6789/0,mds02=10.141.16.2:6789/0,mds03=10.141.16.3:6789/0}
>>>>             election epoch 58, quorum 0,1,2 mds01,mds02,mds03
>>>>      mdsmap e17218: 1/1/1 up {0=mds03=up:active}, 1 up:standby
>>>>      osdmap e25542: 258 osds: 258 up, 258 in
>>>>       pgmap v2460163: 4160 pgs, 4 pools, 228 TB data, 154 Mobjects
>>>>             270 TB used, 549 TB / 819 TB avail
>>>>                 4152 active+clean
>>>>                    8 active+clean+scrubbing+deep
>>>>
>>>> We are using erasure coding on most of our OSDs, so maybe that is a reason,
>>>> but the cache-pool filestore OSDs on 200GB SSDs are also using 2GB of RAM.
>>>> Our erasure-coded pool (16*14 OSDs) has a pg_num of 2048; our cache pool
>>>> (2*14 OSDs) has a pg_num of 1024.
>>>>
>>>> Are these normal values for this configuration and is the documentation a
>>>> bit outdated, or should we look into something else?
>>>
>>> 2GB of RSS is larger than I would have expected, but not unreasonable.
>>> In particular I don't think we've gathered numbers on either EC pools
>>> or the effects of the caching processes.
>>
>> Which data is actually in the memory of the OSDs? Is it mostly cached
>> data? We are short on memory on these servers; can we influence this?
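To get a rough first answer to that question on a live daemon, the tcmalloc
heap statistics and the admin socket counters are the quickest things to
check. A minimal sketch, assuming the OSDs are linked against tcmalloc (as
the stock packages usually are) and using osd.12 and the default admin
socket path purely as examples:

# Resident memory of every OSD daemon on this host, largest first
ps -C ceph-osd -o pid,rss,etime,args --sort=-rss | head

# tcmalloc's view: how much of the RSS is memory the daemon is really
# using versus memory the allocator is merely holding on to
ceph tell osd.12 heap stats

# Ask tcmalloc to return freed pages to the kernel; this only helps if
# much of the RSS is allocator slack rather than live data
ceph tell osd.12 heap release

# Internal counters (throttles, queue sizes, ...) via the admin socket
ceph --admin-daemon /var/run/ceph/ceph-osd.12.asok perf dump | less

If heap stats reports a large freelist, a heap release alone can shrink the
RSS noticeably; if the bytes actually in use are already close to 2GB, the
memory is live data rather than allocator slack.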
> Mmm, we've discussed this a few times on the mailing list. The CERN
> guys published a document on experimenting with a very large cluster
> and not enough RAM, but there's nothing I would really recommend
> changing for a production system, especially an EC one, if you aren't
> intimately familiar with what's going on.

In that CERN test the obvious large memory consumer was the osdmap
cache, which was so large because (a) the maps were getting quite big
(a cluster of 7200 OSDs creates a 4MB map, IIRC) and (b) so much osdmap
churn was leading each OSD to cache 500 of the maps. Once the cluster
was fully deployed and healthy, we could restart an OSD and it would
then use only ~300MB, because by then the osdmap cache was nearly empty.

Kenneth: does the memory usage shrink if you restart an OSD? If so, it
could be a similar issue.

Cheers, Dan
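A minimal sketch of the check Dan suggests, plus a quick estimate of how much
the osdmap cache could account for. The osd id, file path, and restart
commands below are examples only; adjust them to your cluster and init
system. Roughly, the map size times osd_map_cache_size (the setting behind
the 500 cached maps Dan mentions) bounds what that cache can hold.

# Size of the current osdmap, and how many epochs each OSD may cache
ceph osd getmap -o /tmp/osdmap && ls -lh /tmp/osdmap
ceph --admin-daemon /var/run/ceph/ceph-osd.12.asok config show | grep osd_map_cache_size

# Resident set of one OSD before the restart (adjust the grep pattern
# to match how your init system launches the daemon)
ps -C ceph-osd -o pid,rss,args | grep -- '-i 12 '

# Restart just that OSD; the exact command depends on the init system
service ceph restart osd.12          # sysvinit packages
# systemctl restart ceph-osd@12      # systemd packages

# Wait for it to rejoin and for the cluster to go HEALTH_OK again, then
# compare the resident set with the value noted above
ceph status
ps -C ceph-osd -o pid,rss,args | grep -- '-i 12 '

If the resident set drops sharply after the restart and only creeps back up
as the map churns, the osdmap cache is the likely consumer, as in the CERN
test; if it barely moves, something else is holding the memory and the heap
stats above are the next place to look.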