On Tue, Jul 28, 2015 at 11:00 AM, Kenneth Waegeman <kenneth.waegeman@xxxxxxxx> wrote:
>
> On 07/17/2015 02:50 PM, Gregory Farnum wrote:
>>
>> On Fri, Jul 17, 2015 at 1:13 PM, Kenneth Waegeman <kenneth.waegeman@xxxxxxxx> wrote:
>>>
>>> Hi all,
>>>
>>> I've read in the documentation that OSDs use around 512MB on a healthy
>>> cluster (http://ceph.com/docs/master/start/hardware-recommendations/#ram).
>>> Now our OSDs are all using around 2GB of RAM while the cluster is
>>> healthy.
>>>
>>>   PID USER  PR NI    VIRT    RES   SHR S  %CPU %MEM     TIME+ COMMAND
>>> 29784 root  20  0 6081276 2.535g  4740 S   0.7  8.1   1346:55 ceph-osd
>>> 32818 root  20  0 5417212 2.164g 24780 S  16.2  6.9   1238:55 ceph-osd
>>> 25053 root  20  0 5386604 2.159g 27864 S   0.7  6.9   1192:08 ceph-osd
>>> 33875 root  20  0 5345288 2.092g  3544 S   0.7  6.7   1188:53 ceph-osd
>>> 30779 root  20  0 5474832 2.090g 28892 S   1.0  6.7   1142:29 ceph-osd
>>> 22068 root  20  0 5191516 2.000g 28664 S   0.7  6.4  31:56.72 ceph-osd
>>> 34932 root  20  0 5242656 1.994g  4536 S   0.3  6.4   1144:48 ceph-osd
>>> 26883 root  20  0 5178164 1.938g  6164 S   0.3  6.2   1173:01 ceph-osd
>>> 31796 root  20  0 5193308 1.916g 27000 S  16.2  6.1 923:14.87 ceph-osd
>>> 25958 root  20  0 5193436 1.901g  2900 S   0.7  6.1   1039:53 ceph-osd
>>> 27826 root  20  0 5225764 1.845g  5576 S   1.0  5.9   1031:15 ceph-osd
>>> 36011 root  20  0 5111660 1.823g 20512 S  15.9  5.8   1093:01 ceph-osd
>>> 19736 root  20  0 2134680 0.994g     0 S   0.3  3.2  46:13.47 ceph-osd
>>>
>>> [root@osd003 ~]# ceph status
>>> 2015-07-17 14:03:13.865063 7f1fde5f0700 -1 WARNING: the following dangerous
>>> and experimental features are enabled: keyvaluestore
>>> 2015-07-17 14:03:13.887087 7f1fde5f0700 -1 WARNING: the following dangerous
>>> and experimental features are enabled: keyvaluestore
>>>     cluster 92bfcf0a-1d39-43b3-b60f-44f01b630e47
>>>      health HEALTH_OK
>>>      monmap e1: 3 mons at
>>> {mds01=10.141.16.1:6789/0,mds02=10.141.16.2:6789/0,mds03=10.141.16.3:6789/0}
>>>             election epoch 58, quorum 0,1,2 mds01,mds02,mds03
>>>      mdsmap e17218: 1/1/1 up {0=mds03=up:active}, 1 up:standby
>>>      osdmap e25542: 258 osds: 258 up, 258 in
>>>       pgmap v2460163: 4160 pgs, 4 pools, 228 TB data, 154 Mobjects
>>>             270 TB used, 549 TB / 819 TB avail
>>>                 4152 active+clean
>>>                    8 active+clean+scrubbing+deep
>>>
>>> We are using erasure code on most of our OSDs, so maybe that is a reason.
>>> But the cache-pool filestore OSDs on 200GB SSDs are also using 2GB of RAM.
>>> Our erasure-code pool (16*14 OSDs) has a pg_num of 2048; our cache pool
>>> (2*14 OSDs) has a pg_num of 1024.
>>>
>>> Are these normal values for this configuration, and is the documentation
>>> a bit outdated, or should we look into something else?
>>
>> 2GB of RSS is larger than I would have expected, but not unreasonable.
>> In particular, I don't think we've gathered numbers on either EC pools
>> or the effects of the caching processes.
>
> Which data is actually in the memory of the OSDs?
> Is it mostly cached data?
> We are short on memory on these servers; can we influence this?

Mmm, we've discussed this a few times on the mailing list. The CERN guys published a document on experimenting with a very large cluster and not enough RAM, but there's nothing I would really recommend changing for a production system, especially an EC one, unless you are intimately familiar with what's going on.
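
If you want to see where the memory is actually going on a given OSD, the tcmalloc heap introspection is the easiest place to start (this assumes your ceph-osd binaries are built with tcmalloc, which is the default on most distros), e.g. for osd.0:

    ceph tell osd.0 heap stats             # print tcmalloc heap usage for that daemon
    ceph tell osd.0 heap release           # ask tcmalloc to hand freed pages back to the OS
    ceph tell osd.0 heap start_profiler    # optional: start the heap profiler for a deeper look

"heap stats" separates memory the daemon is really using from memory the allocator is just holding on to, and "heap release" will sometimes shrink the RSS noticeably without restarting anything. None of this changes what the OSD decides to keep in memory, though; it just tells you where the space goes.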
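
As a very rough sanity check on whether ~2GB is plausible, count PG shards per OSD rather than looking at pools. With pg_num 2048 on the EC pool spread over 16*14 = 224 OSDs, and assuming something like a 10+4 profile (the profile isn't stated in this thread, so that's purely an assumption), each PG has 14 shards and every OSD carries roughly 2048 * 14 / 224 ≈ 128 of them; the cache pool at pg_num 1024 over 2*14 = 28 OSDs, assuming 3x replication, works out to about 1024 * 3 / 28 ≈ 110 per OSD. OSD memory grows with the number of PGs the daemon hosts, and EC shards hold more per-PG state than replicated ones, which is a big part of why these daemons sit well above the ~512MB figure in the docs.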