Hi everyone,

So if the kernel is able to reclaim those pages, is there still a point in running the heap release on a regular basis?
Regards,
Frédéric.
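For reference, the hourly workaround Dan describes further down the thread can be kept around as a simple cron job while that question is settled. A minimal sketch, assuming a cron.d file is acceptable on your distribution (the file path and the staggered minutes are assumptions; the ceph commands themselves are Dan's):

    # /etc/cron.d/ceph-heap-release  (hypothetical path)
    # Ask each daemon type to return unused tcmalloc heap to the OS once an hour.
    # Quoting the target keeps the shell from glob-expanding "osd.*" etc.
    0  * * * *  root  ceph tell 'mon.*' heap release
    5  * * * *  root  ceph tell 'osd.*' heap release
    10 * * * *  root  ceph tell 'mds.*' heap release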
On 09/04/2019 at 19:33, Olivier Bonvalet wrote:

Good point, thanks! By creating memory pressure (playing with vm.min_free_kbytes), memory is freed by the kernel. So I think I mainly need to update my monitoring rules to avoid false positives.

Thanks, I'll continue reading your resources.

On Tuesday, 09 April 2019 at 09:30 -0500, Mark Nelson wrote:

My understanding is that the kernel is basically either unable or uninterested (maybe due to lack of memory pressure?) in reclaiming the memory. It's possible you might get better behavior if you set /sys/kernel/mm/khugepaged/max_ptes_none to a low value (maybe 0), or maybe disable transparent huge pages entirely.

Some background:
https://github.com/gperftools/gperftools/issues/1073
https://blog.nelhage.com/post/transparent-hugepages/
https://www.kernel.org/doc/Documentation/vm/transhuge.txt

Mark
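Concretely, the knobs discussed above can be poked like this. This is only a rough sketch: the vm.min_free_kbytes value is an arbitrary example rather than a recommendation, and both settings need to go somewhere persistent (sysctl.conf, an rc script, ...) to survive a reboot:

    # Create some memory pressure so the kernel starts reclaiming (example value only)
    sysctl -w vm.min_free_kbytes=1048576

    # Stop khugepaged from collapsing ranges that are mostly unmapped
    echo 0 > /sys/kernel/mm/khugepaged/max_ptes_none

    # ...or disable transparent huge pages entirely
    echo never > /sys/kernel/mm/transparent_hugepage/enabled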
On 4/9/19 7:31 AM, Olivier Bonvalet wrote:

Well, Dan seems to be right:

_tune_cache_size target: 4294967296 heap: 6514409472 unmapped: 2267537408 mapped: 4246872064 old cache_size: 2845396873 new cache size: 2845397085

So we have 6 GB in the heap, but "only" 4 GB mapped. But shouldn't "ceph tell osd.* heap release" have released that?

Thanks,
Olivier

On Monday, 08 April 2019 at 16:09 -0500, Mark Nelson wrote:

One of the difficulties with the osd_memory_target work is that we can't tune based on the RSS memory usage of the process. Ultimately it's up to the kernel to decide to reclaim memory, and especially with transparent huge pages it's tough to judge what the kernel is going to do even if memory has been unmapped by the process. Instead, the autotuner looks at how much memory has been mapped and tries to balance the caches based on that.

In addition to Dan's advice, you might also want to enable debug bluestore at level 5 and look for lines containing "target:" and "cache_size:". These will tell you the current target, the mapped memory, unmapped memory, heap size, previous aggregate cache size, and new aggregate cache size. The other line will give you a breakdown of how much memory was assigned to each of the bluestore caches and how much each cache is using. If there is a memory leak, the autotuner can only do so much: at some point it will reduce the caches to fit within cache_min and leave it there.

Mark
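A quick way to get at those lines, as a sketch only: it assumes the default log location, a cluster named "ceph", and uses injectargs so no restart is needed (osd.147 is simply the OSD from the example further down):

    # Raise bluestore debugging to level 5 on one OSD at runtime
    ceph tell osd.147 injectargs '--debug_bluestore 5/5'

    # Watch the autotuner output in that OSD's log
    grep -E 'target:|cache_size:' /var/log/ceph/ceph-osd.147.log

    # Drop the level again afterwards (1/5 is the usual default)
    ceph tell osd.147 injectargs '--debug_bluestore 1/5'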
On 4/8/19 5:18 AM, Dan van der Ster wrote:

Which OS are you using? With CentOS we find that the heap is not always automatically released. (You can check the heap freelist with `ceph tell osd.0 heap stats`.) As a workaround we run this hourly:

    ceph tell mon.* heap release
    ceph tell osd.* heap release
    ceph tell mds.* heap release

-- Dan

On Sat, Apr 6, 2019 at 1:30 PM Olivier Bonvalet <ceph.list@xxxxxxxxx> wrote:

Hi,

on a Luminous 12.2.11 deployment, my bluestore OSDs exceed the osd_memory_target:

daevel-ob@ssdr712h:~$ ps auxw | grep ceph-osd
ceph 3646 17.1 12.0 6828916 5893136 ? Ssl mars29 1903:42 /usr/bin/ceph-osd -f --cluster ceph --id 143 --setuser ceph --setgroup ceph
ceph 3991 12.9 11.2 6342812 5485356 ? Ssl mars29 1443:41 /usr/bin/ceph-osd -f --cluster ceph --id 144 --setuser ceph --setgroup ceph
ceph 4361 16.9 11.8 6718432 5783584 ? Ssl mars29 1889:41 /usr/bin/ceph-osd -f --cluster ceph --id 145 --setuser ceph --setgroup ceph
ceph 4731 19.7 12.2 6949584 5982040 ? Ssl mars29 2198:47 /usr/bin/ceph-osd -f --cluster ceph --id 146 --setuser ceph --setgroup ceph
ceph 5073 16.7 11.6 6639568 5701368 ? Ssl mars29 1866:05 /usr/bin/ceph-osd -f --cluster ceph --id 147 --setuser ceph --setgroup ceph
ceph 5417 14.6 11.2 6386764 5519944 ? Ssl mars29 1634:30 /usr/bin/ceph-osd -f --cluster ceph --id 148 --setuser ceph --setgroup ceph
ceph 5760 16.9 12.0 6806448 5879624 ? Ssl mars29 1882:42 /usr/bin/ceph-osd -f --cluster ceph --id 149 --setuser ceph --setgroup ceph
ceph 6105 16.0 11.6 6576336 5694556 ? Ssl mars29 1782:52 /usr/bin/ceph-osd -f --cluster ceph --id 150 --setuser ceph --setgroup ceph

daevel-ob@ssdr712h:~$ free -m
              total        used        free      shared  buff/cache   available
Mem:          47771       45210        1643          17         917       43556
Swap:             0           0           0

# ceph daemon osd.147 config show | grep memory_target
    "osd_memory_target": "4294967296",

And there is no recovery / backfilling, the cluster is fine:

$ ceph status
  cluster:
    id:     de035250-323d-4cf6-8c4b-cf0faf6296b1
    health: HEALTH_OK

  services:
    mon: 5 daemons, quorum tolriq,tsyne,olkas,lorunde,amphel
    mgr: tsyne(active), standbys: olkas, tolriq, lorunde, amphel
    osd: 120 osds: 116 up, 116 in

  data:
    pools:   20 pools, 12736 pgs
    objects: 15.29M objects, 31.1TiB
    usage:   101TiB used, 75.3TiB / 177TiB avail
    pgs:     12732 active+clean
             4     active+clean+scrubbing+deep

  io:
    client: 72.3MiB/s rd, 26.8MiB/s wr, 2.30kop/s rd, 1.29kop/s wr

On another host, in the same pool, I also see high memory usage:

daevel-ob@ssdr712g:~$ ps auxw | grep ceph-osd
ceph 6287 6.6 10.6 6027388 5190032 ? Ssl mars21 1511:07 /usr/bin/ceph-osd -f --cluster ceph --id 131 --setuser ceph --setgroup ceph
ceph 6759 7.3 11.2 6299140 5484412 ? Ssl mars21 1665:22 /usr/bin/ceph-osd -f --cluster ceph --id 132 --setuser ceph --setgroup ceph
ceph 7114 7.0 11.7 6576168 5756236 ? Ssl mars21 1612:09 /usr/bin/ceph-osd -f --cluster ceph --id 133 --setuser ceph --setgroup ceph
ceph 7467 7.4 11.1 6244668 5430512 ? Ssl mars21 1704:06 /usr/bin/ceph-osd -f --cluster ceph --id 134 --setuser ceph --setgroup ceph
ceph 7821 7.7 11.1 6309456 5469376 ? Ssl mars21 1754:35 /usr/bin/ceph-osd -f --cluster ceph --id 135 --setuser ceph --setgroup ceph
ceph 8174 6.9 11.6 6545224 5705412 ? Ssl mars21 1590:31 /usr/bin/ceph-osd -f --cluster ceph --id 136 --setuser ceph --setgroup ceph
ceph 8746 6.6 11.1 6290004 5477204 ? Ssl mars21 1511:11 /usr/bin/ceph-osd -f --cluster ceph --id 137 --setuser ceph --setgroup ceph
ceph 9100 7.7 11.6 6552080 5713560 ? Ssl mars21 1757:22 /usr/bin/ceph-osd -f --cluster ceph --id 138 --setuser ceph --setgroup ceph

But! On a similar host, in a different pool, the problem is less visible:

daevel-ob@ssdr712i:~$ ps auxw | grep ceph-osd
ceph 3617 2.8 9.9 5660308 4847444 ? Ssl mars29 313:05 /usr/bin/ceph-osd -f --cluster ceph --id 151 --setuser ceph --setgroup ceph
ceph 3958 2.3 9.8 5661936 4834320 ? Ssl mars29 256:55 /usr/bin/ceph-osd -f --cluster ceph --id 152 --setuser ceph --setgroup ceph
ceph 4299 2.3 9.8 5620616 4807248 ? Ssl mars29 266:26 /usr/bin/ceph-osd -f --cluster ceph --id 153 --setuser ceph --setgroup ceph
ceph 4643 2.3 9.6 5527724 4713572 ? Ssl mars29 262:50 /usr/bin/ceph-osd -f --cluster ceph --id 154 --setuser ceph --setgroup ceph
ceph 5016 2.2 9.7 5597504 4783412 ? Ssl mars29 248:37 /usr/bin/ceph-osd -f --cluster ceph --id 155 --setuser ceph --setgroup ceph
ceph 5380 2.8 9.9 5700204 4886432 ? Ssl mars29 321:05 /usr/bin/ceph-osd -f --cluster ceph --id 156 --setuser ceph --setgroup ceph
ceph 5724 3.1 10.1 5767456 4953484 ? Ssl mars29 352:55 /usr/bin/ceph-osd -f --cluster ceph --id 157 --setuser ceph --setgroup ceph
ceph 6070 2.7 9.9 5683092 4868632 ? Ssl mars29 309:10 /usr/bin/ceph-osd -f --cluster ceph --id 158 --setuser ceph --setgroup ceph

Is there some memory leak? Or should I expect that osd_memory_target (the default 4GB here) is not really followed, and so reduce it?

Thanks,
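If the answer ends up being to lower osd_memory_target rather than rely on the heap release workaround, it is a plain ceph.conf option. A minimal sketch only; the 3 GiB value below is an arbitrary illustration, not a recommendation:

    # ceph.conf -- example only, 3 GiB chosen purely for illustration
    [osd]
    osd_memory_target = 3221225472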
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com