ceph-mon memory issue jewel 10.2.5 kernel 4.4

Jim Kilborn <jim@xxxxxxxxxxxx> · Wed, 8 Feb 2017 19:45:58 +0000

I have had two ceph monitor nodes generate swap space alerts this week.
Looking at the memory, I see ceph-mon using a lot of memory and most of the swap space. My ceph nodes have 128GB of RAM and 2GB of swap (I know the memory/swap ratio is odd).

When I get the alert, I see the following:

[root@empire-ceph02 ~]# free
              total        used        free      shared  buff/cache   available
Mem:      131783876    67618000    13383516       53868    50782360    61599096
Swap:       2097148     2097092          56
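To put the `free` numbers in proportion, here is a small Python sketch (not part of the original report) that parses the output pasted above. `free` reports KiB by default, so the Mem total is roughly 125.7 GiB, and swap is effectively full: 56 KiB free out of about 2 GiB. The helper name `parse_free` is hypothetical.

```python
# Sketch: parse the `free` output from the post (values are KiB, the
# default unit of procps-ng `free`) and report RAM size and swap usage.
FREE_OUTPUT = """\
              total        used        free      shared  buff/cache   available
Mem:      131783876    67618000    13383516       53868    50782360    61599096
Swap:       2097148     2097092          56
"""

KIB_PER_GIB = 1024 ** 2


def parse_free(text):
    """Return {row_name: {column: value_in_kib}} for the Mem and Swap rows."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    rows = {}
    for line in lines[1:]:
        name, *values = line.split()
        # zip() stops at the shorter sequence, so the Swap row only gets
        # the total/used/free columns it actually has.
        rows[name.rstrip(":")] = dict(zip(header, map(int, values)))
    return rows


stats = parse_free(FREE_OUTPUT)
swap = stats["Swap"]
swap_pct = 100.0 * swap["used"] / swap["total"]
mem_gib = stats["Mem"]["total"] / KIB_PER_GIB

print(f"RAM total: {mem_gib:.1f} GiB")
print(f"Swap used: {swap['used']} of {swap['total']} KiB ({swap_pct:.3f}%)")
```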

[root@empire-ceph02 ~]# ps -aux | egrep 'ceph-mon|MEM'
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
ceph     174239  0.3 45.8 62812848 60405112 ?   Ssl   2016 269:08 /usr/bin/ceph-mon -f --cluster ceph --id empire-ceph02 --setuser ceph --setgroup ceph
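A similar sketch for the `ps` line: it splits out the PID, %MEM, VSZ and RSS columns (both sizes are in KiB) and recomputes %MEM as RSS divided by the Mem total from `free`. Names such as `PS_LINE` and `MEM_TOTAL_KIB` are illustrative, not from any tool.

```python
# Sketch: decode the ceph-mon line from the `ps` output in the post.
PS_LINE = ("ceph     174239  0.3 45.8 62812848 60405112 ?   Ssl   2016 269:08 "
           "/usr/bin/ceph-mon -f --cluster ceph --id empire-ceph02 "
           "--setuser ceph --setgroup ceph")
MEM_TOTAL_KIB = 131783876  # "total" column of the Mem row reported by `free`
KIB_PER_GIB = 1024 ** 2

# ps aux columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND.
# Splitting at most 10 times keeps the whole command line in one field.
fields = PS_LINE.split(None, 10)
pid = int(fields[1])
mem_pct = float(fields[3])
vsz_kib, rss_kib = int(fields[4]), int(fields[5])

rss_gib = rss_kib / KIB_PER_GIB
vsz_gib = vsz_kib / KIB_PER_GIB
# ps defines %MEM as resident set size over total physical memory.
recomputed_pct = 100.0 * rss_kib / MEM_TOTAL_KIB

print(f"pid {pid}: RSS {rss_gib:.1f} GiB, VSZ {vsz_gib:.1f} GiB")
print(f"%MEM reported {mem_pct}, recomputed {recomputed_pct:.1f}")
print("command:", fields[10])
```

By this arithmetic the monitor's resident set is about 57.6 GiB, which agrees with the 45.8% that `ps` reports: nearly half of the machine's RAM sits in this one ceph-mon process.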

In the system log on the same node, I see the following messages from ceph-mon and ceph-osd:

Feb  8 09:31:21 empire-ceph02 ceph-mon: 2017-02-08 09:31:21.211268 7f414d974700 -1 lsb_release_parse - failed to call lsb_release binary with error: (12) Cannot allocate memory
Feb  8 09:31:24 empire-ceph02 ceph-osd: 2017-02-08 09:31:24.012856 7f3dcfe94700 -1 osd.8 344 heartbeat_check: no reply from 0x563e4214f090 osd.1 since back 2017-02-08 09:31:03.778901 front 2017-02-08 09:31:03.778901 (cutoff 2017-02-08 09:31:04.012854)
Feb  8 09:31:24 empire-ceph02 ceph-osd: 2017-02-08 09:31:24.012900 7f3dcfe94700 -1 osd.8 344 heartbeat_check: no reply from 0x563e4214da10 osd.3 since back 2017-02-08 09:31:03.778901 front 2017-02-08 09:31:03.778901 (cutoff 2017-02-08 09:31:04.012854)
Feb  8 09:31:24 empire-ceph02 ceph-osd: 2017-02-08 09:31:24.012915 7f3dcfe94700 -1 osd.8 344 heartbeat_check: no reply from 0x563e4214d410 osd.5 since back 2017-02-08 09:31:03.778901 front 2017-02-08 09:31:03.778901 (cutoff 2017-02-08 09:31:04.012854)
Feb  8 09:31:24 empire-ceph02 ceph-osd: 2017-02-08 09:31:24.012927 7f3dcfe94700 -1 osd.8 344 heartbeat_check: no reply from 0x563e4214e490 osd.6 since back 2017-02-08 09:31:03.778901 front 2017-02-08 09:31:03.778901 (cutoff 2017-02-08 09:31:04.012854)
Feb  8 09:31:24 empire-ceph02 ceph-osd: 2017-02-08 09:31:24.012934 7f3dcfe94700 -1 osd.8 344 heartbeat_check: no reply from 0x563e42149a10 osd.7 since back 2017-02-08 09:31:03.778901 front 2017-02-08 09:31:03.778901 (cutoff 2017-02-08 09:31:04.012854)
Feb  8 09:31:25 empire-ceph02 ceph-osd: 2017-02-08 09:31:25.013038 7f3dcfe94700 -1 osd.8 345 heartbeat_check: no reply from 0x563e4214f090 osd.1 since back 2017-02-08 09:31:03.778901 front 2017-02-08 09:31:03.778901 (cutoff 2017-02-08 09:31:05.013020)
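The OSD heartbeat messages can be summarised the same way. The Python sketch below uses a hypothetical regex written against the exact lines in this post (not an official Ceph log grammar) to list which peers osd.8 stopped hearing from and to measure the gap between each check and its cutoff. That gap comes out at 20 seconds, which appears to match Ceph's default `osd_heartbeat_grace`. Every peer was last heard from at 09:31:03, about 18 seconds before ceph-mon logged the ENOMEM (errno 12) failure.

```python
import re
from datetime import datetime

# The heartbeat_check lines from the post, one event per line.
LOG = """\
Feb  8 09:31:24 empire-ceph02 ceph-osd: 2017-02-08 09:31:24.012856 7f3dcfe94700 -1 osd.8 344 heartbeat_check: no reply from 0x563e4214f090 osd.1 since back 2017-02-08 09:31:03.778901 front 2017-02-08 09:31:03.778901 (cutoff 2017-02-08 09:31:04.012854)
Feb  8 09:31:24 empire-ceph02 ceph-osd: 2017-02-08 09:31:24.012900 7f3dcfe94700 -1 osd.8 344 heartbeat_check: no reply from 0x563e4214da10 osd.3 since back 2017-02-08 09:31:03.778901 front 2017-02-08 09:31:03.778901 (cutoff 2017-02-08 09:31:04.012854)
Feb  8 09:31:24 empire-ceph02 ceph-osd: 2017-02-08 09:31:24.012915 7f3dcfe94700 -1 osd.8 344 heartbeat_check: no reply from 0x563e4214d410 osd.5 since back 2017-02-08 09:31:03.778901 front 2017-02-08 09:31:03.778901 (cutoff 2017-02-08 09:31:04.012854)
Feb  8 09:31:24 empire-ceph02 ceph-osd: 2017-02-08 09:31:24.012927 7f3dcfe94700 -1 osd.8 344 heartbeat_check: no reply from 0x563e4214e490 osd.6 since back 2017-02-08 09:31:03.778901 front 2017-02-08 09:31:03.778901 (cutoff 2017-02-08 09:31:04.012854)
Feb  8 09:31:24 empire-ceph02 ceph-osd: 2017-02-08 09:31:24.012934 7f3dcfe94700 -1 osd.8 344 heartbeat_check: no reply from 0x563e42149a10 osd.7 since back 2017-02-08 09:31:03.778901 front 2017-02-08 09:31:03.778901 (cutoff 2017-02-08 09:31:04.012854)
Feb  8 09:31:25 empire-ceph02 ceph-osd: 2017-02-08 09:31:25.013038 7f3dcfe94700 -1 osd.8 345 heartbeat_check: no reply from 0x563e4214f090 osd.1 since back 2017-02-08 09:31:03.778901 front 2017-02-08 09:31:03.778901 (cutoff 2017-02-08 09:31:05.013020)
"""

# Timestamp as printed inside the Ceph message (microsecond precision).
TS = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+"
# Hypothetical pattern matching only the message shape seen above.
PATTERN = re.compile(
    rf"(?P<check>{TS}) \S+ -1 (?P<osd>osd\.\d+) \d+ heartbeat_check: "
    rf"no reply from \S+ (?P<peer>osd\.\d+) since back (?P<back>{TS}) "
    rf"front {TS} \(cutoff (?P<cutoff>{TS})\)"
)
FMT = "%Y-%m-%d %H:%M:%S.%f"


def parse(ts):
    return datetime.strptime(ts, FMT)


events = [m.groupdict() for m in PATTERN.finditer(LOG)]
peers = sorted({e["peer"] for e in events}, key=lambda p: int(p.split(".")[1]))
# Seconds between each check and its cutoff, i.e. the grace period in use.
graces = [(parse(e["check"]) - parse(e["cutoff"])).total_seconds() for e in events]
# Seconds each peer had been silent (back interface) when the check fired.
silence = [(parse(e["check"]) - parse(e["back"])).total_seconds() for e in events]

print("reporting OSD:", sorted({e["osd"] for e in events}))
print("unreachable peers:", peers)
print("grace (s):", sorted({round(g, 3) for g in graces}))
print("silent for (s):", [round(s, 1) for s in silence])
```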

Is this a configuration issue, or maybe a bug?
When I look at the ceph-mon processes on the other nodes, they aren't using any swap and are using only about 500MB of memory.

When I restart ceph-mon on the server that shows the issue, the swap frees up and the new ceph-mon process is back to about 500MB.
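Since a freshly started ceph-mon sits at about 500MB, it may help to record the monitor's resident size between alerts and catch the growth before swap fills. A minimal Python sketch, assuming the Linux `/proc/<pid>/status` format with `VmRSS`/`VmSwap` lines in kB; the 4 GiB `RSS_LIMIT_KIB` threshold and the helper names are hypothetical choices, not Ceph settings.

```python
from pathlib import Path

# Hypothetical alert threshold: well above the ~500MB the healthy monitors
# use, far below the ~57 GiB seen on empire-ceph02. Adjust as needed.
RSS_LIMIT_KIB = 4 * 1024 * 1024  # 4 GiB expressed in KiB


def parse_status(text):
    """Map the Vm* memory fields of /proc/<pid>/status (VmRSS, VmSwap, ...) to KiB."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        parts = value.split()
        if key.startswith("Vm") and len(parts) == 2 and parts[1] == "kB":
            fields[key] = int(parts[0])
    return fields


def mon_over_limit(pid, limit_kib=RSS_LIMIT_KIB):
    """Read a live process's status file and return (rss_kib, over_limit)."""
    fields = parse_status(Path(f"/proc/{pid}/status").read_text())
    rss = fields.get("VmRSS", 0)
    return rss, rss > limit_kib


# Illustrative status excerpt; the VmRSS value is the RSS from the ps output.
SAMPLE = "Name:\tceph-mon\nVmRSS:\t60405112 kB\n"
rss = parse_status(SAMPLE)["VmRSS"]
print(rss, rss > RSS_LIMIT_KIB)
```

Running `mon_over_limit()` periodically (for example from cron) against the ceph-mon PID would show whether the growth is gradual or sudden, which may help tell a slow leak from a one-off spike.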

Any ideas would be appreciated.


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com