Re: Re: Re: [luminous]OSD memory usage increase when writing a lot of data to cluster

"shadow_lin"<shadow_lin@xxxxxxx> · Wed, 1 Nov 2017 23:23:50 +0800

Hi Sage,

This is the mempool dump of my osd.0:

ceph daemon osd.0 dump_mempools
{
    "bloom_filter": {
        "items": 0,
        "bytes": 0
    },
    "bluestore_alloc": {
        "items": 10301352,
        "bytes": 10301352
    },
    "bluestore_cache_data": {
        "items": 0,
        "bytes": 0
    },
    "bluestore_cache_onode": {
        "items": 386,
        "bytes": 145136
    },
    "bluestore_cache_other": {
        "items": 91914,
        "bytes": 779970
    },
    "bluestore_fsck": {
        "items": 0,
        "bytes": 0
    },
    "bluestore_txc": {
        "items": 16,
        "bytes": 7040
    },
    "bluestore_writing_deferred": {
        "items": 11,
        "bytes": 7600020
    },
    "bluestore_writing": {
        "items": 0,
        "bytes": 0
    },
    "bluefs": {
        "items": 170,
        "bytes": 5688
    },
    "buffer_anon": {
        "items": 96726,
        "bytes": 5685575
    },
    "buffer_meta": {
        "items": 30,
        "bytes": 1560
    },
    "osd": {
        "items": 72,
        "bytes": 554688
    },
    "osd_mapbl": {
        "items": 0,
        "bytes": 0
    },
    "osd_pglog": {
        "items": 197946,
        "bytes": 35743344
    },
    "osdmap": {
        "items": 8007,
        "bytes": 144024
    },
    "osdmap_mapping": {
        "items": 0,
        "bytes": 0
    },
    "pgmap": {
        "items": 0,
        "bytes": 0
    },
    "mds_co": {
        "items": 0,
        "bytes": 0
    },
    "unittest_1": {
        "items": 0,
        "bytes": 0
    },
    "unittest_2": {
        "items": 0,
        "bytes": 0
    },
    "total": {
        "items": 10696630,
        "bytes": 60968397
    }
}
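A minimal Python sketch of the check Sage asks about further down: it sums the bytes of the BlueStore cache pools from a dump_mempools result. It assumes that the "three bluestore pools" are bluestore_cache_data, bluestore_cache_onode and bluestore_cache_other, and that the output has the flat layout shown above; SAMPLE is just an excerpt of the dump above.

```python
# Assumption: these are the "three bluestore pools" BlueStore counts against
# bluestore_cache_size. Other bluestore_* pools (alloc, txc, writing...) are left out.
CACHE_POOLS = ("bluestore_cache_data", "bluestore_cache_onode", "bluestore_cache_other")


def bluestore_cache_bytes(dump: dict) -> int:
    """Sum the 'bytes' field of the BlueStore cache pools in a flat dump_mempools dict."""
    return sum(dump[name]["bytes"] for name in CACHE_POOLS)


# Excerpt of the osd.0 dump above: the three cache pools plus the total.
SAMPLE = {
    "bluestore_cache_data": {"items": 0, "bytes": 0},
    "bluestore_cache_onode": {"items": 386, "bytes": 145136},
    "bluestore_cache_other": {"items": 91914, "bytes": 779970},
    "total": {"items": 10696630, "bytes": 60968397},
}

cache = bluestore_cache_bytes(SAMPLE)
total = SAMPLE["total"]["bytes"]
print(f"bluestore cache pools: {cache} bytes ({cache / 2**20:.2f} MiB)")  # 925106 bytes (0.88 MiB)
print(f"all mempools:          {total} bytes ({total / 2**20:.2f} MiB)")  # 60968397 bytes (58.14 MiB)
```

To run it against a live OSD, save the output of ceph daemon osd.0 dump_mempools to a file and pass the result of json.load() on that file to bluestore_cache_bytes instead of SAMPLE.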

And the memory usage reported by ps:
ceph      8173 27.3 41.0 1509892 848768 ?      Ssl  Oct31 419:30 /usr/bin/ceph-osd --cluster=ceph -i 0 -f --setuser ceph --setgroup ceph

And the output of ceph tell osd.0 heap stats:
osd.0 tcmalloc heap stats:------------------------------------------------
MALLOC:      398397808 (  379.9 MiB) Bytes in use by application
MALLOC: +    340647936 (  324.9 MiB) Bytes in page heap freelist
MALLOC: +     32574936 (   31.1 MiB) Bytes in central cache freelist
MALLOC: +     22581232 (   21.5 MiB) Bytes in transfer cache freelist
MALLOC: +     51663048 (   49.3 MiB) Bytes in thread cache freelists
MALLOC: +      3152096 (    3.0 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =    849017056 (  809.7 MiB) Actual memory used (physical + swap)
MALLOC: +    128180224 (  122.2 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =    977197280 (  931.9 MiB) Virtual address space used
MALLOC:
MALLOC:          16765              Spans in use
MALLOC:             32              Thread heaps in use
MALLOC:           8192              Tcmalloc page size
------------------------------------------------
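The tcmalloc figures can be cross-checked with a few lines of Python. This is only a sketch with the numbers copied by hand from the output above: it reproduces tcmalloc's two "=" totals and splits "Actual memory used" into what the application holds and what sits in tcmalloc's freelists.

```python
MiB = 2**20

# Byte counts copied from the "heap stats" output above.
in_use_by_application = 398397808
freelists = {
    "page heap": 340647936,
    "central cache": 32574936,
    "transfer cache": 22581232,
    "thread cache": 51663048,
}
malloc_metadata = 3152096
released_to_os = 128180224

free_total = sum(freelists.values())
actual = in_use_by_application + free_total + malloc_metadata
virtual = actual + released_to_os

# These reproduce tcmalloc's own "=" lines.
assert actual == 849017056   # Actual memory used (physical + swap)
assert virtual == 977197280  # Virtual address space used

print(f"in use by application: {in_use_by_application / MiB:.1f} MiB")  # 379.9 MiB
print(f"held in freelists:     {free_total / MiB:.1f} MiB "
      f"({free_total / actual:.0%} of actual)")                        # 426.7 MiB (53% of actual)
```

By this arithmetic, slightly more than half of the roughly 810 MiB tcmalloc reports is free memory kept in its freelists rather than live allocations.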

I have run the write test for about 10 hours and so far no OOM has happened. The OSD's memory usage peaked at 9xx MB and stays stable at around 800-900 MB.
I set the BlueStore cache to 100MB with this config:
       bluestore_cache_size = 104857600
       bluestore_cache_size_hdd = 104857600
       bluestore_cache_size_ssd = 104857600
       bluestore_cache_kv_max = 103809024

I am not sure how to check whether it adds up, because bluestore_cache_size - 512m would be a negative value.
Did you mean that RocksDB would use about 512MB of memory?
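Worked through with the numbers from this thread, the comparison in question looks like this. A short Python sketch: reading "512m" as 512 MiB is an assumption, and nothing here models how BlueStore actually divides its cache between RocksDB and its own pools.

```python
MiB = 2**20

# Values from the [osd] config quoted in this thread.
bluestore_cache_size = 104857600    # 100 MiB
bluestore_cache_kv_max = 103809024  # 99 MiB

# Assumption: "512m" means 512 MiB set aside for RocksDB.
rocksdb_reserve = 512 * MiB

budget = bluestore_cache_size - rocksdb_reserve
print(f"cache_size - 512m = {budget} bytes ({budget / MiB:.0f} MiB)")  # -432013312 bytes (-412 MiB)

# The three cache pools from the dump above (data + onode + other), for comparison.
cache_pools = 0 + 145136 + 779970
print(f"cache pools use {cache_pools / bluestore_cache_size:.2%} of bluestore_cache_size")  # 0.88%

# Plain difference between the two configured values.
gap = bluestore_cache_size - bluestore_cache_kv_max
print(f"cache_size - kv_max = {gap // MiB} MiB")  # 1 MiB
```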

2017-11-01 

lin.yunfan

  From: Sage Weil <sage@xxxxxxxxxxxx>
  Sent: 2017-11-01 20:11
  Subject: Re: Re: Re: [ceph-users] [luminous]OSD memory usage increase when writing a lot of data to cluster
  To: "shadow_lin"<shadow_lin@xxxxxxx>
  Cc: "ceph-users"<ceph-users@xxxxxxxxxxxxxx>

  On Wed, 1 Nov 2017, shadow_lin wrote: 
  > Hi Sage, 
  > We have tried compiling the latest ceph source code from GitHub.
  > The build is ceph version 12.2.1-249-g42172a4
  > (42172a443183ffe6b36e85770e53fe678db293bf) luminous (stable).
  > The memory problem seems better, but the OSD memory usage still keeps
  > increasing as more data is written into the RBD image, and the memory usage
  > does not drop after the writes stop.
  > Could you specify which commit fixed the memory bug?

  f60a942023088cbba53a816e6ef846994921cab3 and the prior 2 commits. 

  If you look at 'ceph daemon osd.nnn dump_mempools' you can see three
  bluestore pools.  This is what bluestore is using to account for its usage
  so it can know when to trim its cache.  Do those add up to the
  bluestore_cache_size - 512m (for rocksdb) that you have configured?

  sage 

  > Thanks 
  > 2017-11-01 
  >  
  > ____________________________________________________________________________ 
  > lin.yunfan
  >  
  > ____________________________________________________________________________ 
  > From: Sage Weil <sage@xxxxxxxxxxxx>
  > Sent: 2017-10-24 20:03
  > Subject: Re: [ceph-users] [luminous]OSD memory usage increase when writing a lot of data to cluster
  > To: "shadow_lin"<shadow_lin@xxxxxxx>
  > Cc: "ceph-users"<ceph-users@xxxxxxxxxxxxxx>
  >   
  > On Tue, 24 Oct 2017, shadow_lin wrote:  
  > > Hi All,
  > > The cluster has 24 OSDs on 24 8TB HDDs.
  > > Each OSD server has 2GB of RAM and runs 2 OSDs on 2 8TB HDDs. I know the
  > > memory is below the recommended value, but this OSD server is an ARM server,
  > > so I can't do anything to add more RAM.
  > > I created a replicated pool (2 replicas) and a 20TB image, and mounted it on
  > > the test server with an XFS filesystem.
  > >
  > > I have set ceph.conf to this (as suggested in other related posts):
  > > [osd]
  > >         bluestore_cache_size = 104857600
  > >         bluestore_cache_size_hdd = 104857600
  > >         bluestore_cache_size_ssd = 104857600
  > >         bluestore_cache_kv_max = 103809024
  > >
  > >         osd map cache size = 20
  > >         osd map max advance = 10
  > >         osd map share max epochs = 10
  > >         osd pg epoch persisted max stale = 10
  > > The bluestore cache setting did improve the situation, but if I try to write
  > > 1TB of data to the RBD image with dd (dd if=/dev/zero of=test bs=1G count=1000),
  > > the OSD will eventually be killed by the OOM killer.
  > > If I only write around 100G of data at once, everything is fine.
  > >
  > > Why does the OSD memory usage keep increasing while writing?
  > > Is there anything I can do to reduce the memory usage?
  >   
  > There is a bluestore memory bug that was fixed just after 12.2.1 was
  > released; it will be fixed in 12.2.2.  In the meantime, you can
  > consider running the latest luminous branch (not fully tested) from
  > https://shaman.ceph.com/builds/ceph/luminous.
  >   
  > sage  
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com