I have had two ceph monitor nodes generate swap space alerts this week. Looking at the memory, I see ceph-mon using a lot of memory and most of the swap space. My ceph nodes have 128GB of memory, with 2GB of swap (I know the memory/swap ratio is odd).

When I get the alert, I see the following:

[root@empire-ceph02 ~]# free
              total        used        free      shared  buff/cache   available
Mem:      131783876    67618000    13383516       53868    50782360    61599096
Swap:       2097148     2097092          56

[root@empire-ceph02 ~]# ps -aux | egrep 'ceph-mon|MEM'
USER        PID %CPU %MEM      VSZ      RSS TTY  STAT START    TIME COMMAND
ceph     174239  0.3 45.8 62812848 60405112 ?    Ssl   2016  269:08 /usr/bin/ceph-mon -f --cluster ceph --id empire-ceph02 --setuser ceph --setgroup ceph

In the ceph-mon log, I see the following:

Feb 8 09:31:21 empire-ceph02 ceph-mon: 2017-02-08 09:31:21.211268 7f414d974700 -1 lsb_release_parse - failed to call lsb_release binary with error: (12) Cannot allocate memory
Feb 8 09:31:24 empire-ceph02 ceph-osd: 2017-02-08 09:31:24.012856 7f3dcfe94700 -1 osd.8 344 heartbeat_check: no reply from 0x563e4214f090 osd.1 since back 2017-02-08 09:31:03.778901 front 2017-02-08 09:31:03.778901 (cutoff 2017-02-08 09:31:04.012854)
Feb 8 09:31:24 empire-ceph02 ceph-osd: 2017-02-08 09:31:24.012900 7f3dcfe94700 -1 osd.8 344 heartbeat_check: no reply from 0x563e4214da10 osd.3 since back 2017-02-08 09:31:03.778901 front 2017-02-08 09:31:03.778901 (cutoff 2017-02-08 09:31:04.012854)
Feb 8 09:31:24 empire-ceph02 ceph-osd: 2017-02-08 09:31:24.012915 7f3dcfe94700 -1 osd.8 344 heartbeat_check: no reply from 0x563e4214d410 osd.5 since back 2017-02-08 09:31:03.778901 front 2017-02-08 09:31:03.778901 (cutoff 2017-02-08 09:31:04.012854)
Feb 8 09:31:24 empire-ceph02 ceph-osd: 2017-02-08 09:31:24.012927 7f3dcfe94700 -1 osd.8 344 heartbeat_check: no reply from 0x563e4214e490 osd.6 since back 2017-02-08 09:31:03.778901 front 2017-02-08 09:31:03.778901 (cutoff 2017-02-08 09:31:04.012854)
Feb 8 09:31:24 empire-ceph02 ceph-osd: 2017-02-08 09:31:24.012934 7f3dcfe94700 -1 osd.8 344 heartbeat_check: no reply from 0x563e42149a10 osd.7 since back 2017-02-08 09:31:03.778901 front 2017-02-08 09:31:03.778901 (cutoff 2017-02-08 09:31:04.012854)
Feb 8 09:31:25 empire-ceph02 ceph-osd: 2017-02-08 09:31:25.013038 7f3dcfe94700 -1 osd.8 345 heartbeat_check: no reply from 0x563e4214f090 osd.1 since back 2017-02-08 09:31:03.778901 front 2017-02-08 09:31:03.778901 (cutoff 2017-02-08 09:31:05.013020)

Is this a setting issue? Or maybe a bug?

When I look at the ceph-mon processes on the other nodes, they aren’t using any swap, and only about 500MB of memory each. When I restart ceph-mds on the server that shows the issue, the swap frees up, and the memory for the new ceph-mon is 500MB again.

Any ideas would be appreciated.
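For scale, here is a minimal Python sketch that converts the figures from the `free` and `ps` output above into more readable units. The numbers are copied from that output (both tools report KiB by default); the helper names `kib_to_gib` and `percent` are purely illustrative and not part of any Ceph tooling.

```python
# Figures copied from the `free` and `ps` output in this post (all in KiB).
KIB_PER_GIB = 1024 * 1024


def kib_to_gib(kib: int) -> float:
    """Convert a KiB count to GiB."""
    return kib / KIB_PER_GIB


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


mem_total_kib = 131783876   # free: Mem total
swap_total_kib = 2097148    # free: Swap total
swap_used_kib = 2097092     # free: Swap used
mon_rss_kib = 60405112      # ps: RSS of ceph-mon (PID 174239)

mon_rss_gib = kib_to_gib(mon_rss_kib)
mon_share = percent(mon_rss_kib, mem_total_kib)
swap_share = percent(swap_used_kib, swap_total_kib)

print(f"ceph-mon RSS: {mon_rss_gib:.1f} GiB ({mon_share:.1f}% of RAM)")
print(f"swap in use:  {swap_share:.3f}%")
```

The computed share of RAM agrees with the 45.8 in the %MEM column from `ps`, and swap is effectively full, compared with roughly 500MB per ceph-mon on the healthy nodes.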