Re: OSD memory leaks?

Yes, I'm using Argonaut. 

I've got 38 heap files from yesterday. Currently, the OSD in question is using 91.2% of memory according to top, and staying there. I initially thought it would keep growing until the OOM killer started killing processes, but I don't see anything in the system logs that indicates that happened. 

On the other hand, the ceph-osd process on osd.1 is using far less memory. 

osd.0
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                 
 9151 root      20   0 20.4g  14g 2548 S    1 91.2 517:58.71 ceph-osd 

osd.1

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                 
10785 root      20   0  673m 310m 5164 S    3  1.9 107:04.39 ceph-osd  

Here's what tcmalloc says when I run ceph osd tell 0 heap stats:
2013-01-09 11:09:36.778675 7f62aae23700  0 log [INF] : osd.0tcmalloc heap stats:------------------------------------------------
2013-01-09 11:09:36.779113 7f62aae23700  0 log [INF] : MALLOC:      210884768 (  201.1 MB) Bytes in use by application
2013-01-09 11:09:36.779348 7f62aae23700  0 log [INF] : MALLOC: +     89026560 (   84.9 MB) Bytes in page heap freelist
2013-01-09 11:09:36.779928 7f62aae23700  0 log [INF] : MALLOC: +      7926512 (    7.6 MB) Bytes in central cache freelist
2013-01-09 11:09:36.779951 7f62aae23700  0 log [INF] : MALLOC: +       144896 (    0.1 MB) Bytes in transfer cache freelist
2013-01-09 11:09:36.779972 7f62aae23700  0 log [INF] : MALLOC: +     11046512 (   10.5 MB) Bytes in thread cache freelists
2013-01-09 11:09:36.780013 7f62aae23700  0 log [INF] : MALLOC: +      5177344 (    4.9 MB) Bytes in malloc metadata
2013-01-09 11:09:36.780030 7f62aae23700  0 log [INF] : MALLOC:   ------------
2013-01-09 11:09:36.780056 7f62aae23700  0 log [INF] : MALLOC: =    324206592 (  309.2 MB) Actual memory used (physical + swap)
2013-01-09 11:09:36.780081 7f62aae23700  0 log [INF] : MALLOC: +    126177280 (  120.3 MB) Bytes released to OS (aka unmapped)
2013-01-09 11:09:36.780112 7f62aae23700  0 log [INF] : MALLOC:   ------------
2013-01-09 11:09:36.780127 7f62aae23700  0 log [INF] : MALLOC: =    450383872 (  429.5 MB) Virtual address space used
2013-01-09 11:09:36.780152 7f62aae23700  0 log [INF] : MALLOC:
2013-01-09 11:09:36.780168 7f62aae23700  0 log [INF] : MALLOC:          37492              Spans in use
2013-01-09 11:09:36.780330 7f62aae23700  0 log [INF] : MALLOC:             51              Thread heaps in use
2013-01-09 11:09:36.780359 7f62aae23700  0 log [INF] : MALLOC:           4096              Tcmalloc page size
2013-01-09 11:09:36.780384 7f62aae23700  0 log [INF] : ------------------------------------------------
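
For reference, the heap commands I've been running are the ones Sam listed further down the thread:

ceph osd tell 0 heap start_profiler
ceph osd tell 0 heap dump
ceph osd tell 0 heap stats

The dumps show up in the OSD log directory.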


Dave Spano 
Optogenics 
Systems Administrator 



----- Original Message ----- 

From: "Sébastien Han" <han.sebastien@xxxxxxxxx> 
To: "Samuel Just" <sam.just@xxxxxxxxxxx> 
Cc: "Dave Spano" <dspano@xxxxxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx> 
Sent: Wednesday, January 9, 2013 10:20:43 AM 
Subject: Re: OSD memory leaks? 

I guess he runs Argonaut as well. 

Any more suggestions about this problem? 

Thanks! 

-- 
Regards, 
Sébastien Han. 


On Mon, Jan 7, 2013 at 8:09 PM, Samuel Just <sam.just@xxxxxxxxxxx> wrote: 
> 
> Awesome! What version are you running (ceph-osd -v, include the hash)? 
> -Sam 
> 
> On Mon, Jan 7, 2013 at 11:03 AM, Dave Spano <dspano@xxxxxxxxxxxxxx> wrote: 
> > This failed the first time I sent it, so I'm resending in plain text. 
> > 
> > Dave Spano 
> > Optogenics 
> > Systems Administrator 
> > 
> > 
> > 
> > ----- Original Message ----- 
> > 
> > From: "Dave Spano" <dspano@xxxxxxxxxxxxxx> 
> > To: "Sébastien Han" <han.sebastien@xxxxxxxxx> 
> > Cc: "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>, "Samuel Just" <sam.just@xxxxxxxxxxx> 
> > Sent: Monday, January 7, 2013 12:40:06 PM 
> > Subject: Re: OSD memory leaks? 
> > 
> > 
> > Sam, 
> > 
> > Attached are some heaps that I collected today. 001 and 003 are from just after I started the profiler; 011 is the most recent. If you need more, or anything different, let me know. The OSD in question is already at 38% memory usage. As mentioned by Sébastien, restarting ceph-osd keeps things going. 
> > 
> > Not sure if this is helpful information, but of the two OSDs that I have running, the first one (osd.0) is the one that develops this problem the quickest. osd.1 does have the same issue; it just takes much longer. Do the monitors hit the first osd in the list first when there's activity? 
> > 
> > 
> > Dave Spano 
> > Optogenics 
> > Systems Administrator 
> > 
> > 
> > ----- Original Message ----- 
> > 
> > From: "Sébastien Han" <han.sebastien@xxxxxxxxx> 
> > To: "Samuel Just" <sam.just@xxxxxxxxxxx> 
> > Cc: "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx> 
> > Sent: Friday, January 4, 2013 10:20:58 AM 
> > Subject: Re: OSD memory leaks? 
> > 
> > Hi Sam, 
> > 
> > Thanks for your answer, and sorry for the late reply. 
> > 
> > Unfortunately I can't get anything useful out of the profiler; actually I 
> > do get output, but I don't think it shows what it is supposed to show... I 
> > will keep trying. Anyway, yesterday I thought that the problem might be 
> > due to over-usage of some OSDs. I suspected that the distribution of 
> > primary OSDs might be uneven, which could have explained why the memory 
> > leaks are worse on some servers. In the end the distribution seems even, 
> > but while looking at the pg dump I found something interesting in the 
> > scrub column: the timestamps of the last scrubbing operations matched the 
> > times shown on the graph. 
> > 
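> > For what it's worth, a quick one-liner along these lines pulls the pgid 
> > and the trailing scrub-stamp fields out of the dump (just a sketch; the 
> > column positions differ between versions, so adjust as needed): 
> > 
> > ceph pg dump | awk '/active/ { print $1, $(NF-1), $NF }' | sort -k2 
> > 
> > That is roughly how I counted the scrubs that fell inside each leak window. 
> > 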
> > After this, I did some calculations: I compared the total number of 
> > scrubbing operations with the time range where the memory leaks occurred. 
> > First of all, here is my setup: 
> > 
> > root@c2-ceph-01 ~ # ceph osd tree 
> > dumped osdmap tree epoch 859 
> > # id  weight  type name           up/down  reweight 
> > -1    12      pool default 
> > -3    12        rack lc2_rack33 
> > -2    3           host c2-ceph-01 
> > 0     1             osd.0         up       1 
> > 1     1             osd.1         up       1 
> > 2     1             osd.2         up       1 
> > -4    3           host c2-ceph-04 
> > 10    1             osd.10        up       1 
> > 11    1             osd.11        up       1 
> > 9     1             osd.9         up       1 
> > -5    3           host c2-ceph-02 
> > 3     1             osd.3         up       1 
> > 4     1             osd.4         up       1 
> > 5     1             osd.5         up       1 
> > -6    3           host c2-ceph-03 
> > 6     1             osd.6         up       1 
> > 7     1             osd.7         up       1 
> > 8     1             osd.8         up       1 
> > 
> > 
> > And here are the results: 
> > 
> > * Ceph node 1, which has the largest memory leak, performed 1608 scrubs 
> > in total and 1059 during the time range where the memory leaks occurred 
> > * Ceph node 2: 1168 in total and 776 during the time range where the 
> > memory leaks occurred 
> > * Ceph node 3: 940 in total and 94 during the time range where the 
> > memory leaks occurred 
> > * Ceph node 4: 899 in total and 191 during the time range where the 
> > memory leaks occurred 
> > 
> > I'm still not entirely sure that the scrub operation causes the leak, 
> > but it's the only relevant correlation I have found... 
> > 
> > Could it be that the scrubbing process doesn't release memory? Btw, I 
> > was wondering how Ceph decides at what time it should run the scrubbing 
> > operation. I know that it runs once a day and is controlled by the 
> > following options: 
> > 
> > OPTION(osd_scrub_min_interval, OPT_FLOAT, 300) 
> > OPTION(osd_scrub_max_interval, OPT_FLOAT, 60*60*24) 
> > 
> > But how does Ceph determine the time at which the operation starts? 
> > During cluster creation, probably? 
> > 
> > I just checked the options that control OSD scrubbing and found that by default: 
> > 
> > OPTION(osd_max_scrubs, OPT_INT, 1) 
> > 
> > So that might explain why only one OSD uses a lot of memory. 
> > 
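> > If anyone wants to experiment, these defaults can be overridden in 
> > ceph.conf. A sketch (values in seconds, mirroring the defaults above): 
> > 
> > [osd] 
> >     osd scrub min interval = 300 
> >     osd scrub max interval = 86400 
> >     osd max scrubs = 1 
> > 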
> > My dirty workaround at the moment is to perform a check of the memory 
> > used by every OSD and restart the OSD if it uses more than 25% of the 
> > total memory. Also note that on ceph 1, 3 and 4 it's always a single 
> > OSD that uses a lot of memory; on ceph 2 the memory usage is also high, 
> > but it's roughly the same across all the OSD processes. 
> > 
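> > Roughly what that check looks like (only a sketch; the OSD id extraction 
> > and the restart command depend on how the daemons are started on your 
> > setup): 
> > 
> > #!/bin/sh 
> > # restart any ceph-osd that uses more than 25% of total memory 
> > THRESHOLD=25 
> > ps -C ceph-osd -o pid=,pmem=,args= | while read pid pmem args; do 
> >     if [ "$(echo "$pmem > $THRESHOLD" | bc)" -eq 1 ]; then 
> >         # pull the OSD id out of the '-i N' argument on the command line 
> >         id=$(echo "$args" | sed -n 's/.*-i *\([0-9][0-9]*\).*/\1/p') 
> >         echo "osd.$id (pid $pid) is at ${pmem}% memory, restarting" 
> >         service ceph restart osd.$id 
> >     fi 
> > done 
> > 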
> > Thank you in advance. 
> > 
> > -- 
> > Regards, 
> > Sébastien Han. 
> > 
> > 
> > On Wed, Dec 19, 2012 at 10:43 PM, Samuel Just <sam.just@xxxxxxxxxxx> wrote: 
> >> 
> >> Sorry, it's been very busy. The next step would be to try to get a heap 
> >> dump. You can start a heap profile on osd N by: 
> >> 
> >> ceph osd tell N heap start_profiler 
> >> 
> >> and you can get it to dump the collected profile using 
> >> 
> >> ceph osd tell N heap dump. 
> >> 
> >> The dumps should show up in the osd log directory. 
> >> 
> >> Assuming the heap profiler is working correctly, you can look at the 
> >> dump using pprof in google-perftools. 
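> >> 
> >> For example, the invocation would look something like this (the binary 
> >> path and the dump file name are just placeholders here; on Debian/Ubuntu 
> >> the tool may be installed as google-pprof): 
> >> 
> >> pprof --text /usr/bin/ceph-osd /var/log/ceph/osd.0.profile.0001.heap 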
> >> 
> >> On Wed, Dec 19, 2012 at 8:37 AM, Sébastien Han <han.sebastien@xxxxxxxxx> wrote: 
> >> > No more suggestions? :( 
> >> > -- 
> >> > Regards, 
> >> > Sébastien Han. 
> >> > 
> >> > 
> >> > On Tue, Dec 18, 2012 at 6:21 PM, Sébastien Han <han.sebastien@xxxxxxxxx> wrote: 
> >> >> Nothing special... 
> >> >> 
> >> >> Kernel logs from my clients are full of "libceph: osd4 
> >> >> 172.20.11.32:6801 socket closed" 
> >> >> 
> >> >> I saw this somewhere on the tracker. 
> >> >> 
> >> >> Does this do any harm? 
> >> >> 
> >> >> Thanks. 
> >> >> 
> >> >> -- 
> >> >> Regards, 
> >> >> Sébastien Han. 
> >> >> 
> >> >> 
> >> >> 
> >> >> On Mon, Dec 17, 2012 at 11:55 PM, Samuel Just <sam.just@xxxxxxxxxxx> wrote: 
> >> >>> 
> >> >>> What is the workload like? 
> >> >>> -Sam 
> >> >>> 
> >> >>> On Mon, Dec 17, 2012 at 2:41 PM, Sébastien Han <han.sebastien@xxxxxxxxx> wrote: 
> >> >>> > Hi, 
> >> >>> > 
> >> >>> > No, I don't see anything abnormal in the network stats, and I don't see 
> >> >>> > anything in the logs... :( 
> >> >>> > The weird thing is that one node out of 4 seems to take way more memory 
> >> >>> > than the others... 
> >> >>> > 
> >> >>> > -- 
> >> >>> > Regards, 
> >> >>> > Sébastien Han. 
> >> >>> > 
> >> >>> > 
> >> >>> > On Mon, Dec 17, 2012 at 11:31 PM, Sébastien Han <han.sebastien@xxxxxxxxx> wrote: 
> >> >>> >> 
> >> >>> >> Hi, 
> >> >>> >> 
> >> >>> >> No, I don't see anything abnormal in the network stats, and I don't see anything in the logs... :( 
> >> >>> >> The weird thing is that one node out of 4 seems to take way more memory than the others... 
> >> >>> >> 
> >> >>> >> -- 
> >> >>> >> Regards, 
> >> >>> >> Sébastien Han. 
> >> >>> >> 
> >> >>> >> 
> >> >>> >> 
> >> >>> >> On Mon, Dec 17, 2012 at 7:12 PM, Samuel Just <sam.just@xxxxxxxxxxx> wrote: 
> >> >>> >>> 
> >> >>> >>> Are you having network hiccups? There was a bug noticed recently that 
> >> >>> >>> could cause a memory leak if nodes are being marked up and down. 
> >> >>> >>> -Sam 
> >> >>> >>> 
> >> >>> >>> On Mon, Dec 17, 2012 at 12:28 AM, Sébastien Han <han.sebastien@xxxxxxxxx> wrote: 
> >> >>> >>> > Hi guys, 
> >> >>> >>> > 
> >> >>> >>> > Today, looking at my graphs, I noticed that one of my 4 Ceph nodes uses a 
> >> >>> >>> > lot of memory. It keeps growing and growing. 
> >> >>> >>> > See the graph attached to this mail. 
> >> >>> >>> > I run 0.48.2 on Ubuntu 12.04. 
> >> >>> >>> > 
> >> >>> >>> > The other nodes also grow, but more slowly than the first one. 
> >> >>> >>> > 
> >> >>> >>> > I'm not quite sure what information I have to provide, so let me 
> >> >>> >>> > know. The only thing I can say is that the load hasn't increased 
> >> >>> >>> > that much this week. The node seems to keep consuming memory and 
> >> >>> >>> > not giving it back. 
> >> >>> >>> > 
> >> >>> >>> > Thank you in advance. 
> >> >>> >>> > 
> >> >>> >>> > -- 
> >> >>> >>> > Regards, 
> >> >>> >>> > Sébastien Han. 
> >> >>> >> 
> >> >>> >> 

