Hi everyone,

So if the kernel is able to reclaim those pages, is there still a point in running the heap release on a regular basis?
Regards,
Frédéric.
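For reference, the hourly workaround Dan describes further down the thread can be kept around as a simple cron job while that question is settled. A minimal sketch, assuming a cron.d file is acceptable on your distribution (the file path and the staggered minutes are assumptions; the ceph commands themselves are Dan's):

    # /etc/cron.d/ceph-heap-release  (hypothetical path)
    # Ask each daemon type to return unused tcmalloc heap to the OS once an hour.
    # Quoting the target keeps the shell from glob-expanding "osd.*" etc.
    0  * * * *  root  ceph tell 'mon.*' heap release
    5  * * * *  root  ceph tell 'osd.*' heap release
    10 * * * *  root  ceph tell 'mds.*' heap release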
On 09/04/2019 at 19:33, Olivier Bonvalet wrote:

Good point, thanks! By creating memory pressure (playing with vm.min_free_kbytes), memory is freed by the kernel. So I think I mainly need to update my monitoring rules to avoid false positives.

Thanks, I'll continue reading your resources.

On Tuesday, 09 April 2019 at 09:30 -0500, Mark Nelson wrote:

My understanding is that the kernel is basically either unable or uninterested (maybe due to lack of memory pressure?) in reclaiming the memory. It's possible you might get better behavior if you set /sys/kernel/mm/khugepaged/max_ptes_none to a low value (maybe 0), or maybe disable transparent huge pages entirely.

Some background:
https://github.com/gperftools/gperftools/issues/1073
https://blog.nelhage.com/post/transparent-hugepages/
https://www.kernel.org/doc/Documentation/vm/transhuge.txt

Mark
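Concretely, the knobs discussed above can be poked like this. This is only a rough sketch: the vm.min_free_kbytes value is an arbitrary example rather than a recommendation, and both settings need to go somewhere persistent (sysctl.conf, an rc script, ...) to survive a reboot:

    # Create some memory pressure so the kernel starts reclaiming (example value only)
    sysctl -w vm.min_free_kbytes=1048576

    # Stop khugepaged from collapsing ranges that are mostly unmapped
    echo 0 > /sys/kernel/mm/khugepaged/max_ptes_none

    # ...or disable transparent huge pages entirely
    echo never > /sys/kernel/mm/transparent_hugepage/enabled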
On 4/9/19 7:31 AM, Olivier Bonvalet wrote:

Well, Dan seems to be right:

_tune_cache_size target: 4294967296 heap: 6514409472 unmapped: 2267537408 mapped: 4246872064 old cache_size: 2845396873 new cache size: 2845397085

So we have 6 GB in the heap, but "only" 4 GB mapped. But shouldn't "ceph tell osd.* heap release" have released that?

Thanks,
Olivier

On Monday, 08 April 2019 at 16:09 -0500, Mark Nelson wrote:

One of the difficulties with the osd_memory_target work is that we can't tune based on the RSS memory usage of the process. Ultimately it's up to the kernel to decide to reclaim memory, and especially with transparent huge pages it's tough to judge what the kernel is going to do even if memory has been unmapped by the process. Instead, the autotuner looks at how much memory has been mapped and tries to balance the caches based on that.

In addition to Dan's advice, you might also want to enable debug bluestore at level 5 and look for lines containing "target:" and "cache_size:". These will tell you the current target, the mapped memory, unmapped memory, heap size, previous aggregate cache size, and new aggregate cache size. The other line will give you a breakdown of how much memory was assigned to each of the bluestore caches and how much each cache is using. If there is a memory leak, the autotuner can only do so much: at some point it will reduce the caches to fit within cache_min and leave it there.

Mark
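A quick way to get at those lines, as a sketch only: it assumes the default log location, a cluster named "ceph", and uses injectargs so no restart is needed (osd.147 is simply the OSD from the example further down):

    # Raise bluestore debugging to level 5 on one OSD at runtime
    ceph tell osd.147 injectargs '--debug_bluestore 5/5'

    # Watch the autotuner output in that OSD's log
    grep -E 'target:|cache_size:' /var/log/ceph/ceph-osd.147.log

    # Drop the level again afterwards (1/5 is the usual default)
    ceph tell osd.147 injectargs '--debug_bluestore 1/5'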
On 4/8/19 5:18 AM, Dan van der Ster wrote:

Which OS are you using? With CentOS we find that the heap is not always automatically released. (You can check the heap freelist with `ceph tell osd.0 heap stats`.) As a workaround we run this hourly:

    ceph tell mon.* heap release
    ceph tell osd.* heap release
    ceph tell mds.* heap release

-- Dan

On Sat, Apr 6, 2019 at 1:30 PM Olivier Bonvalet <ceph.list@xxxxxxxxx> wrote:

Hi,

on a Luminous 12.2.11 deployment, my bluestore OSDs exceed the osd_memory_target:

daevel-ob@ssdr712h:~$ ps auxw | grep ceph-osd
ceph 3646 17.1 12.0 6828916 5893136 ? Ssl mars29 1903:42 /usr/bin/ceph-osd -f --cluster ceph --id 143 --setuser ceph --setgroup ceph
ceph 3991 12.9 11.2 6342812 5485356 ? Ssl mars29 1443:41 /usr/bin/ceph-osd -f --cluster ceph --id 144 --setuser ceph --setgroup ceph
ceph 4361 16.9 11.8 6718432 5783584 ? Ssl mars29 1889:41 /usr/bin/ceph-osd -f --cluster ceph --id 145 --setuser ceph --setgroup ceph
ceph 4731 19.7 12.2 6949584 5982040 ? Ssl mars29 2198:47 /usr/bin/ceph-osd -f --cluster ceph --id 146 --setuser ceph --setgroup ceph
ceph 5073 16.7 11.6 6639568 5701368 ? Ssl mars29 1866:05 /usr/bin/ceph-osd -f --cluster ceph --id 147 --setuser ceph --setgroup ceph
ceph 5417 14.6 11.2 6386764 5519944 ? Ssl mars29 1634:30 /usr/bin/ceph-osd -f --cluster ceph --id 148 --setuser ceph --setgroup ceph
ceph 5760 16.9 12.0 6806448 5879624 ? Ssl mars29 1882:42 /usr/bin/ceph-osd -f --cluster ceph --id 149 --setuser ceph --setgroup ceph
ceph 6105 16.0 11.6 6576336 5694556 ? Ssl mars29 1782:52 /usr/bin/ceph-osd -f --cluster ceph --id 150 --setuser ceph --setgroup ceph

daevel-ob@ssdr712h:~$ free -m
              total        used        free      shared  buff/cache   available
Mem:          47771       45210        1643          17         917       43556
Swap:             0           0           0

# ceph daemon osd.147 config show | grep memory_target
    "osd_memory_target": "4294967296",

And there is no recovery / backfilling, the cluster is fine:

$ ceph status
  cluster:
    id:     de035250-323d-4cf6-8c4b-cf0faf6296b1
    health: HEALTH_OK

  services:
    mon: 5 daemons, quorum tolriq,tsyne,olkas,lorunde,amphel
    mgr: tsyne(active), standbys: olkas, tolriq, lorunde, amphel
    osd: 120 osds: 116 up, 116 in

  data:
    pools:   20 pools, 12736 pgs
    objects: 15.29M objects, 31.1TiB
    usage:   101TiB used, 75.3TiB / 177TiB avail
    pgs:     12732 active+clean
             4     active+clean+scrubbing+deep

  io:
    client: 72.3MiB/s rd, 26.8MiB/s wr, 2.30kop/s rd, 1.29kop/s wr

On another host, in the same pool, I also see high memory usage:

daevel-ob@ssdr712g:~$ ps auxw | grep ceph-osd
ceph 6287 6.6 10.6 6027388 5190032 ? Ssl mars21 1511:07 /usr/bin/ceph-osd -f --cluster ceph --id 131 --setuser ceph --setgroup ceph
ceph 6759 7.3 11.2 6299140 5484412 ? Ssl mars21 1665:22 /usr/bin/ceph-osd -f --cluster ceph --id 132 --setuser ceph --setgroup ceph
ceph 7114 7.0 11.7 6576168 5756236 ? Ssl mars21 1612:09 /usr/bin/ceph-osd -f --cluster ceph --id 133 --setuser ceph --setgroup ceph
ceph 7467 7.4 11.1 6244668 5430512 ? Ssl mars21 1704:06 /usr/bin/ceph-osd -f --cluster ceph --id 134 --setuser ceph --setgroup ceph
ceph 7821 7.7 11.1 6309456 5469376 ? Ssl mars21 1754:35 /usr/bin/ceph-osd -f --cluster ceph --id 135 --setuser ceph --setgroup ceph
ceph 8174 6.9 11.6 6545224 5705412 ? Ssl mars21 1590:31 /usr/bin/ceph-osd -f --cluster ceph --id 136 --setuser ceph --setgroup ceph
ceph 8746 6.6 11.1 6290004 5477204 ? Ssl mars21 1511:11 /usr/bin/ceph-osd -f --cluster ceph --id 137 --setuser ceph --setgroup ceph
ceph 9100 7.7 11.6 6552080 5713560 ? Ssl mars21 1757:22 /usr/bin/ceph-osd -f --cluster ceph --id 138 --setuser ceph --setgroup ceph

But! On a similar host, in a different pool, the problem is less visible:

daevel-ob@ssdr712i:~$ ps auxw | grep ceph-osd
ceph 3617 2.8 9.9 5660308 4847444 ? Ssl mars29 313:05 /usr/bin/ceph-osd -f --cluster ceph --id 151 --setuser ceph --setgroup ceph
ceph 3958 2.3 9.8 5661936 4834320 ? Ssl mars29 256:55 /usr/bin/ceph-osd -f --cluster ceph --id 152 --setuser ceph --setgroup ceph
ceph 4299 2.3 9.8 5620616 4807248 ? Ssl mars29 266:26 /usr/bin/ceph-osd -f --cluster ceph --id 153 --setuser ceph --setgroup ceph
ceph 4643 2.3 9.6 5527724 4713572 ? Ssl mars29 262:50 /usr/bin/ceph-osd -f --cluster ceph --id 154 --setuser ceph --setgroup ceph
ceph 5016 2.2 9.7 5597504 4783412 ? Ssl mars29 248:37 /usr/bin/ceph-osd -f --cluster ceph --id 155 --setuser ceph --setgroup ceph
ceph 5380 2.8 9.9 5700204 4886432 ? Ssl mars29 321:05 /usr/bin/ceph-osd -f --cluster ceph --id 156 --setuser ceph --setgroup ceph
ceph 5724 3.1 10.1 5767456 4953484 ? Ssl mars29 352:55 /usr/bin/ceph-osd -f --cluster ceph --id 157 --setuser ceph --setgroup ceph
ceph 6070 2.7 9.9 5683092 4868632 ? Ssl mars29 309:10 /usr/bin/ceph-osd -f --cluster ceph --id 158 --setuser ceph --setgroup ceph

Is there some memory leak? Or should I expect that osd_memory_target (the default 4GB here) is not really followed, and so reduce it?

Thanks,
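If the answer ends up being to lower osd_memory_target rather than rely on the heap release workaround, it is a plain ceph.conf option. A minimal sketch only; the 3 GiB value below is an arbitrary illustration, not a recommendation:

    # ceph.conf -- example only, 3 GiB chosen purely for illustration
    [osd]
    osd_memory_target = 3221225472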
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com