Re: ceph-mon memory issue jewel 10.2.5 kernel 4.4


 



Graham,

I don’t think this is the issue I’m seeing. I’m running CentOS on kernel 4.4.24-1. My processes aren’t dying.



I have two clusters with 3 mons in each cluster. Over the last 3 months that the clusters have been running, this has only happened on two nodes, and only once per node.



If I check the other nodes (or any nodes at this point), I see zero swap used, as in the example below.



[jkilborn@darkjedi-ceph02 ~]$ free -h

              total        used        free      shared  buff/cache   available

Mem:           125G         10G         85G        129M         28G        108G

Swap:          2.0G          0B        2.0G





These mon nodes are also running 8 OSDs each, with SSD journals.

We have very little load at this point. Even when the ceph-mon process eats all the swap, the node still shows free memory, and the monitor never goes offline.



              total        used        free      shared  buff/cache   available

Mem:      131783876    67618000    13383516       53868    50782360    61599096

Swap:       2097148     2097092          56
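
To double-check that it really is ceph-mon holding that swap rather than some other process, the VmSwap field in /proc/<pid>/status can be listed per process. A rough, untested one-liner along these lines should work on any recent kernel:

for p in /proc/[0-9]*/status; do
    awk '/^Name:/{n=$2} /^VmSwap:/{print n, $2, $3}' "$p"
done 2>/dev/null | sort -k2 -rn | head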



Seems like a ceph-mon bug/leak to me.
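
If it is a leak in the monitor, the tcmalloc heap stats might confirm it. Assuming the default tcmalloc build of jewel, something like the following (substituting the actual mon id) should show how large the mon's heap really is and whether any of it can be given back:

ceph tell mon.darkjedi-ceph02 heap stats
ceph tell mon.darkjedi-ceph02 heap release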





Sent from Mail<https://go.microsoft.com/fwlink/?LinkId=550986> for Windows 10



From: Graham Allan<mailto:gta@xxxxxxx>
Sent: Thursday, February 9, 2017 11:24 AM
To: ceph-users@xxxxxxxxxxxxxx<mailto:ceph-users@xxxxxxxxxxxxxx>
Subject: Re:  ceph-mon memory issue jewel 10.2.5 kernel 4.4



I've been trying to figure out the same thing recently - I had the same
issues as others with jewel 10.2.3 (?) but for my current problem I
don't think it's a ceph issue.

Specifically, ever since our last maintenance day, some of our OSD nodes
have been suffering OSDs killed by the OOM killer despite having enough
memory.
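
The kills themselves are easy to spot in the kernel log if anyone wants to
check their own nodes, with something like:

dmesg -T | grep -i 'out of memory'
journalctl -k | grep -i 'killed process'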

I looked for ages at the discussions about reducing the map cache size
but it just didn't seem a likely cause.
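
For reference, that's the osd map cache size setting in ceph.conf; the
suggestions were roughly along these lines (value purely illustrative, not
a recommendation):

[osd]
osd map cache size = 200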

It looks like a kernel bug. Here's the report for Ubuntu:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1655842

I was seeing this OOM issue on kernels 4.4.0-59 and 4.4.0-62. It sounds
like downgrading to 4.4.0-57 should resolve the issue, and 4.4.0-63,
due out shortly, should also fix it.
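
On our Ubuntu nodes that would mean installing and booting the older image
and holding back kernel updates until -63 arrives; roughly (untested on my
side so far):

apt-get install linux-image-4.4.0-57-generic linux-image-extra-4.4.0-57-generic
apt-mark hold linux-image-generic linux-headers-generic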

Our unaffected machines in the cluster are running a different release
and kernel (though same version of ceph).

Haven't actually tested this yet, just found the reference in the last
hour... could this also be the problem you are seeing?

Graham

On 2/8/2017 6:58 PM, Andrei Mikhailovsky wrote:
> +1
>
> Ever since upgrading to 10.2.x I have been seeing a lot of issues with our ceph cluster: OSDs going down, and OSD servers running out of memory and killing all ceph-osd processes. Again, 10.2.5 on a 4.4.x kernel.
>
> It seems that with every release there are more and more problems with ceph (((, which is a shame.
>
> Andrei
>
> ----- Original Message -----
>> From: "Jim Kilborn" <jim@xxxxxxxxxxxx>
>> To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
>> Sent: Wednesday, 8 February, 2017 19:45:58
>> Subject:  ceph-mon memory issue jewel 10.2.5 kernel  4.4
>
>> I have had two ceph monitor nodes generate swap space alerts this week.
>> Looking at the memory, I see ceph-mon using a lot of memory and most of the swap
>> space. My ceph nodes have 128GB mem, with 2GB swap  (I know the memory/swap
>> ratio is odd)
>>
>> When I get the alert, I see the following
>>
>>
>> [root@empire-ceph02 ~]# free
>>
>>              total        used        free      shared  buff/cache   available
>>
>> Mem:      131783876    67618000    13383516       53868    50782360    61599096
>>
>> Swap:       2097148     2097092          56
>>
>>
>>
>> [root@empire-ceph02 ~]# ps -aux | egrep 'ceph-mon|MEM'
>>
>> USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
>>
>> ceph     174239  0.3 45.8 62812848 60405112 ?   Ssl   2016 269:08
>> /usr/bin/ceph-mon -f --cluster ceph --id empire-ceph02 --setuser ceph
>> --setgroup ceph
>>
>>
>> In the ceph-mon log, I see the following:
>>
>> Feb  8 09:31:21 empire-ceph02 ceph-mon: 2017-02-08 09:31:21.211268 7f414d974700
>> -1 lsb_release_parse - failed to call lsb_release binary with error: (12)
>> Cannot allocate memory
>> Feb  8 09:31:24 empire-ceph02 ceph-osd: 2017-02-08 09:31:24.012856 7f3dcfe94700
>> -1 osd.8 344 heartbeat_check: no reply from 0x563e4214f090 osd.1 since back
>> 2017-02-08 09:31:03.778901 front 2017-02-08 09:31:03.778901
>> (cutoff 2017-02-08 09:31:04.012854)
>> Feb  8 09:31:24 empire-ceph02 ceph-osd: 2017-02-08 09:31:24.012900 7f3dcfe94700
>> -1 osd.8 344 heartbeat_check: no reply from 0x563e4214da10 osd.3 since back
>> 2017-02-08 09:31:03.778901 front 2017-02-08 09:31:03.778901
>> (cutoff 2017-02-08 09:31:04.012854)
>> Feb  8 09:31:24 empire-ceph02 ceph-osd: 2017-02-08 09:31:24.012915 7f3dcfe94700
>> -1 osd.8 344 heartbeat_check: no reply from 0x563e4214d410 osd.5 since back
>> 2017-02-08 09:31:03.778901 front 2017-02-08 09:31:03.778901
>> (cutoff 2017-02-08 09:31:04.012854)
>> Feb  8 09:31:24 empire-ceph02 ceph-osd: 2017-02-08 09:31:24.012927 7f3dcfe94700
>> -1 osd.8 344 heartbeat_check: no reply from 0x563e4214e490 osd.6 since back
>> 2017-02-08 09:31:03.778901 front 2017-02-08 09:31:03.778901
>> (cutoff 2017-02-08 09:31:04.012854)
>> Feb  8 09:31:24 empire-ceph02 ceph-osd: 2017-02-08 09:31:24.012934 7f3dcfe94700
>> -1 osd.8 344 heartbeat_check: no reply from 0x563e42149a10 osd.7 since back
>> 2017-02-08 09:31:03.778901 front 2017-02-08 09:31:03.778901
>> (cutoff 2017-02-08 09:31:04.012854)
>> Feb  8 09:31:25 empire-ceph02 ceph-osd: 2017-02-08 09:31:25.013038 7f3dcfe94700
>> -1 osd.8 345 heartbeat_check: no reply from 0x563e4214f090 osd.1 since back
>> 2017-02-08 09:31:03.778901 front 2017-02-08 09:31:03.778901
>> (cutoff 2017-02-08 09:31:05.013020)
>>
>>
>> Is this a settings issue? Or maybe a bug?
>> When I look at the other ceph-mon processes on other nodes, they aren’t using
>> any swap, and only about 500MB of memory.
>>
>> When I restart ceph-mon on the server that shows the issue, the swap frees up,
>> and the memory for the new ceph-mon is 500MB again.
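>>
>> (The restart itself is just the systemd unit, e.g. something like
>> systemctl restart ceph-mon@empire-ceph02
>> where the id matches the --id shown in the ps output above.)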
>>
>> Any ideas would be appreciated.
>>
>>
>> Sent from Mail<https://go.microsoft.com/fwlink/?LinkId=550986> for Windows 10
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

--
Graham Allan
Minnesota Supercomputing Institute - gta@xxxxxxx
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



