Re: osd_memory_target exceeding on Luminous OSD BlueStore


 



Good point, thanks!

By creating memory pressure (playing with vm.min_free_kbytes), I can get
the kernel to free that memory.

So I think I mainly need to update my monitoring rules to avoid false
positives.

Thanks, I'll keep reading the resources you linked.
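For the monitoring side, here is a minimal sketch of such a rule, assuming a plain RSS-versus-target comparison with a headroom factor (the 1.5x factor is an arbitrary illustrative choice; the sample values come from the ps output quoted below):

```shell
# Sketch of a monitoring rule: only alert when an OSD's RSS exceeds
# osd_memory_target by a headroom factor, since unmapped-but-unreclaimed
# heap pages (THP) inflate RSS without indicating a real leak.
rss_kb=5893136               # RSS of osd.143 from the ps output below, in KiB
target=4294967296            # osd_memory_target, in bytes
rss=$((rss_kb * 1024))
limit=$((target * 3 / 2))    # 1.5x headroom, an example value
if [ "$rss" -gt "$limit" ]; then
  echo "ALERT: RSS $rss above $limit"
else
  echo "OK: RSS $rss within headroom"
fi
```

With these numbers, ~5.6 GiB RSS against a 4 GiB target stays under the 1.5x line, so it would not alert.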


On Tuesday, April 9, 2019 at 09:30 -0500, Mark Nelson wrote:
> My understanding is that basically the kernel is either unable or
> uninterested (maybe due to lack of memory pressure?) in reclaiming the
> memory.  It's possible you might have better behavior if you set
> /sys/kernel/mm/khugepaged/max_ptes_none to a low value (maybe 0), or
> maybe disable transparent huge pages entirely.
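The knobs Mark mentions can be set like this (run as root; these are the standard sysfs paths, and the settings do not persist across reboots):

```shell
# Make khugepaged collapse only fully populated ranges (0 = no empty
# PTEs allowed when collapsing into a huge page):
echo 0 > /sys/kernel/mm/khugepaged/max_ptes_none

# Or disable transparent huge pages entirely:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
```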
> 
> 
> Some background:
> 
> https://github.com/gperftools/gperftools/issues/1073
> 
> https://blog.nelhage.com/post/transparent-hugepages/
> 
> https://www.kernel.org/doc/Documentation/vm/transhuge.txt
> 
> 
> Mark
> 
> 
> On 4/9/19 7:31 AM, Olivier Bonvalet wrote:
> > Well, Dan seems to be right:
> > 
> > _tune_cache_size
> >          target: 4294967296
> >            heap: 6514409472
> >        unmapped: 2267537408
> >          mapped: 4246872064
> > old cache_size: 2845396873
> > new cache size: 2845397085
> > 
> > 
> > So we have 6GB in heap, but "only" 4GB mapped.
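Those figures are internally consistent: the tcmalloc heap equals mapped plus unmapped, leaving roughly 2.1 GiB sitting unmapped in the heap. A quick check with the values copied from the _tune_cache_size output above:

```shell
# Values copied from the _tune_cache_size output above; the heap size
# should equal mapped + unmapped memory.
heap=6514409472
mapped=4246872064
unmapped=2267537408
echo "$((heap - mapped)) bytes unmapped"       # matches the unmapped figure
echo "$((unmapped / 1024 / 1024)) MiB held unmapped in the heap"
```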
> > 
> > But shouldn't "ceph tell osd.* heap release" have released that?
> > 
> > 
> > Thanks,
> > 
> > Olivier
> > 
> > 
> > > On Monday, April 8, 2019 at 16:09 -0500, Mark Nelson wrote:
> > > One of the difficulties with the osd_memory_target work is that we
> > > can't tune based on the RSS memory usage of the process.  Ultimately
> > > it's up to the kernel to decide to reclaim memory, and especially
> > > with transparent huge pages it's tough to judge what the kernel is
> > > going to do even if memory has been unmapped by the process.  Instead
> > > the autotuner looks at how much memory has been mapped and tries to
> > > balance the caches based on that.
> > > 
> > > 
> > > In addition to Dan's advice, you might also want to enable debug
> > > bluestore at level 5 and look for lines containing "target:" and
> > > "cache_size:".  These will tell you the current target, the mapped
> > > memory, unmapped memory, heap size, previous aggregate cache size,
> > > and new aggregate cache size.  The other line will give you a
> > > breakdown of how much memory was assigned to each of the bluestore
> > > caches and how much each cache is using.  If there is a memory leak,
> > > the autotuner can only do so much.  At some point it will reduce the
> > > caches to fit within cache_min and leave it there.
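For instance, something along these lines — the exact log line layout here is an assumption modeled on the _tune_cache_size output quoted earlier in this thread, and debug_bluestore can be raised at runtime with injectargs:

```shell
# Raise bluestore debug logging on one OSD (runtime, non-persistent):
#   ceph tell osd.147 injectargs '--debug_bluestore 5'
# Then pull the autotuner lines out of the OSD log.  The sample below
# mimics the _tune_cache_size output quoted in this thread.
sample='_tune_cache_size target: 4294967296 heap: 6514409472 unmapped: 2267537408 mapped: 4246872064'
echo "$sample" | grep -o 'target: [0-9]*'
```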
> > > 
> > > 
> > > Mark
> > > 
> > > 
> > > On 4/8/19 5:18 AM, Dan van der Ster wrote:
> > > > Which OS are you using?
> > > > With CentOS we find that the heap is not always automatically
> > > > released.  (You can check the heap freelist with `ceph tell osd.0
> > > > heap stats`.)
> > > > As a workaround we run this hourly:
> > > > 
> > > > ceph tell mon.* heap release
> > > > ceph tell osd.* heap release
> > > > ceph tell mds.* heap release
> > > > 
> > > > -- Dan
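As a sketch, Dan's hourly workaround could live in a cron entry like this (the file path and schedule are illustrative):

```shell
# Example /etc/cron.d/ceph-heap-release entries implementing the hourly
# workaround above; quoting keeps the shell from globbing the wildcards.
0 * * * * root ceph tell 'mon.*' heap release
0 * * * * root ceph tell 'osd.*' heap release
0 * * * * root ceph tell 'mds.*' heap release
```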
> > > > 
> > > > On Sat, Apr 6, 2019 at 1:30 PM Olivier Bonvalet <ceph.list@xxxxxxxxx> wrote:
> > > > > Hi,
> > > > > 
> > > > > on a Luminous 12.2.11 deployment, my BlueStore OSDs exceed the
> > > > > osd_memory_target:
> > > > > 
> > > > > daevel-ob@ssdr712h:~$ ps auxw | grep ceph-osd
> > > > > ceph        3646 17.1 12.0 6828916 5893136 ?     Ssl  mars29 1903:42 /usr/bin/ceph-osd -f --cluster ceph --id 143 --setuser ceph --setgroup ceph
> > > > > ceph        3991 12.9 11.2 6342812 5485356 ?     Ssl  mars29 1443:41 /usr/bin/ceph-osd -f --cluster ceph --id 144 --setuser ceph --setgroup ceph
> > > > > ceph        4361 16.9 11.8 6718432 5783584 ?     Ssl  mars29 1889:41 /usr/bin/ceph-osd -f --cluster ceph --id 145 --setuser ceph --setgroup ceph
> > > > > ceph        4731 19.7 12.2 6949584 5982040 ?     Ssl  mars29 2198:47 /usr/bin/ceph-osd -f --cluster ceph --id 146 --setuser ceph --setgroup ceph
> > > > > ceph        5073 16.7 11.6 6639568 5701368 ?     Ssl  mars29 1866:05 /usr/bin/ceph-osd -f --cluster ceph --id 147 --setuser ceph --setgroup ceph
> > > > > ceph        5417 14.6 11.2 6386764 5519944 ?     Ssl  mars29 1634:30 /usr/bin/ceph-osd -f --cluster ceph --id 148 --setuser ceph --setgroup ceph
> > > > > ceph        5760 16.9 12.0 6806448 5879624 ?     Ssl  mars29 1882:42 /usr/bin/ceph-osd -f --cluster ceph --id 149 --setuser ceph --setgroup ceph
> > > > > ceph        6105 16.0 11.6 6576336 5694556 ?     Ssl  mars29 1782:52 /usr/bin/ceph-osd -f --cluster ceph --id 150 --setuser ceph --setgroup ceph
> > > > > 
> > > > > daevel-ob@ssdr712h:~$ free -m
> > > > >               total        used        free      shared  buff/cache   available
> > > > > Mem:          47771       45210        1643          17         917       43556
> > > > > Swap:             0           0           0
> > > > > 
> > > > > # ceph daemon osd.147 config show | grep memory_target
> > > > >       "osd_memory_target": "4294967296",
> > > > > 
> > > > > 
> > > > > And there is no recovery or backfilling; the cluster is fine:
> > > > > 
> > > > >      $ ceph status
> > > > >        cluster:
> > > > >          id:     de035250-323d-4cf6-8c4b-cf0faf6296b1
> > > > >          health: HEALTH_OK
> > > > > 
> > > > >        services:
> > > > >          mon: 5 daemons, quorum
> > > > > tolriq,tsyne,olkas,lorunde,amphel
> > > > >          mgr: tsyne(active), standbys: olkas, tolriq,
> > > > > lorunde,
> > > > > amphel
> > > > >          osd: 120 osds: 116 up, 116 in
> > > > > 
> > > > >        data:
> > > > >          pools:   20 pools, 12736 pgs
> > > > >          objects: 15.29M objects, 31.1TiB
> > > > >          usage:   101TiB used, 75.3TiB / 177TiB avail
> > > > >          pgs:     12732 active+clean
> > > > >                   4     active+clean+scrubbing+deep
> > > > > 
> > > > >        io:
> > > > >          client:   72.3MiB/s rd, 26.8MiB/s wr, 2.30kop/s rd,
> > > > > 1.29kop/s wr
> > > > > 
> > > > > 
> > > > >      On another host, in the same pool, I also see high memory
> > > > >      usage:
> > > > > 
> > > > >      daevel-ob@ssdr712g:~$ ps auxw | grep ceph-osd
> > > > >      ceph        6287  6.6 10.6 6027388 5190032 ?     Ssl  mars21 1511:07 /usr/bin/ceph-osd -f --cluster ceph --id 131 --setuser ceph --setgroup ceph
> > > > >      ceph        6759  7.3 11.2 6299140 5484412 ?     Ssl  mars21 1665:22 /usr/bin/ceph-osd -f --cluster ceph --id 132 --setuser ceph --setgroup ceph
> > > > >      ceph        7114  7.0 11.7 6576168 5756236 ?     Ssl  mars21 1612:09 /usr/bin/ceph-osd -f --cluster ceph --id 133 --setuser ceph --setgroup ceph
> > > > >      ceph        7467  7.4 11.1 6244668 5430512 ?     Ssl  mars21 1704:06 /usr/bin/ceph-osd -f --cluster ceph --id 134 --setuser ceph --setgroup ceph
> > > > >      ceph        7821  7.7 11.1 6309456 5469376 ?     Ssl  mars21 1754:35 /usr/bin/ceph-osd -f --cluster ceph --id 135 --setuser ceph --setgroup ceph
> > > > >      ceph        8174  6.9 11.6 6545224 5705412 ?     Ssl  mars21 1590:31 /usr/bin/ceph-osd -f --cluster ceph --id 136 --setuser ceph --setgroup ceph
> > > > >      ceph        8746  6.6 11.1 6290004 5477204 ?     Ssl  mars21 1511:11 /usr/bin/ceph-osd -f --cluster ceph --id 137 --setuser ceph --setgroup ceph
> > > > >      ceph        9100  7.7 11.6 6552080 5713560 ?     Ssl  mars21 1757:22 /usr/bin/ceph-osd -f --cluster ceph --id 138 --setuser ceph --setgroup ceph
> > > > > 
> > > > >      But!  On a similar host, in a different pool, the problem
> > > > >      is less visible:
> > > > > 
> > > > >      daevel-ob@ssdr712i:~$ ps auxw | grep ceph-osd
> > > > >      ceph        3617  2.8  9.9 5660308 4847444 ?     Ssl  mars29  313:05 /usr/bin/ceph-osd -f --cluster ceph --id 151 --setuser ceph --setgroup ceph
> > > > >      ceph        3958  2.3  9.8 5661936 4834320 ?     Ssl  mars29  256:55 /usr/bin/ceph-osd -f --cluster ceph --id 152 --setuser ceph --setgroup ceph
> > > > >      ceph        4299  2.3  9.8 5620616 4807248 ?     Ssl  mars29  266:26 /usr/bin/ceph-osd -f --cluster ceph --id 153 --setuser ceph --setgroup ceph
> > > > >      ceph        4643  2.3  9.6 5527724 4713572 ?     Ssl  mars29  262:50 /usr/bin/ceph-osd -f --cluster ceph --id 154 --setuser ceph --setgroup ceph
> > > > >      ceph        5016  2.2  9.7 5597504 4783412 ?     Ssl  mars29  248:37 /usr/bin/ceph-osd -f --cluster ceph --id 155 --setuser ceph --setgroup ceph
> > > > >      ceph        5380  2.8  9.9 5700204 4886432 ?     Ssl  mars29  321:05 /usr/bin/ceph-osd -f --cluster ceph --id 156 --setuser ceph --setgroup ceph
> > > > >      ceph        5724  3.1 10.1 5767456 4953484 ?     Ssl  mars29  352:55 /usr/bin/ceph-osd -f --cluster ceph --id 157 --setuser ceph --setgroup ceph
> > > > >      ceph        6070  2.7  9.9 5683092 4868632 ?     Ssl  mars29  309:10 /usr/bin/ceph-osd -f --cluster ceph --id 158 --setuser ceph --setgroup ceph
> > > > > 
> > > > > 
> > > > >      Is there some memory leak?  Or should I expect that
> > > > >      osd_memory_target (the default 4GB here) is not really
> > > > >      honored, and therefore reduce it?
> > > > > 
> > > > >      Thanks,
> > > > > 
> > > > > 
> > > > > _______________________________________________
> > > > > ceph-users mailing list
> > > > > ceph-users@xxxxxxxxxxxxxx
> > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




