Joao,

Here is the information requested. Thanks for taking a look. Note that the output below was captured after I restarted the ceph-mon processes yesterday. If that is not acceptable, I will have to wait until the issue reappears.

This is a small cluster: 4 ceph nodes and 6 ceph kernel clients running over InfiniBand.

[root@empire-ceph02 log]# ceph -s
    cluster 62ed97d6-adf4-12e4-8fd5-3d9701b22b87
     health HEALTH_OK
     monmap e3: 3 mons at {empire-ceph01=192.168.20.241:6789/0,empire-ceph02=192.168.20.242:6789/0,empire-ceph03=192.168.20.243:6789/0}
            election epoch 56, quorum 0,1,2 empire-ceph01,empire-ceph02,empire-ceph03
      fsmap e526: 1/1/1 up {0=empire-ceph03=up:active}, 1 up:standby
     osdmap e361: 32 osds: 32 up, 32 in
            flags sortbitwise,require_jewel_osds
      pgmap v2427955: 768 pgs, 2 pools, 2370 GB data, 1759 kobjects
            7133 GB used, 109 TB / 116 TB avail
                 768 active+clean
  client io 256 B/s wr, 0 op/s rd, 0 op/s wr

[root@empire-ceph02 log]# ceph daemon mon.empire-ceph02 ops
{
    "ops": [],
    "num_ops": 0
}

[root@empire-ceph02 mon]# du -sh ceph-empire-ceph02
30M     ceph-empire-ceph02

[root@empire-ceph02 mon]# ls -lR
.:
total 0
drwxr-xr-x. 3 ceph ceph 46 Dec  6 14:26 ceph-empire-ceph02

./ceph-empire-ceph02:
total 8
-rw-r--r--. 1 ceph ceph    0 Dec  6 14:26 done
-rw-------. 1 ceph ceph   77 Dec  6 14:26 keyring
drwxr-xr-x. 2 ceph ceph 4096 Feb  9 06:58 store.db

./ceph-empire-ceph02/store.db:
total 30056
-rw-r--r--. 1 ceph ceph  396167 Feb  9 06:06 510929.sst
-rw-r--r--. 1 ceph ceph  778898 Feb  9 06:56 511298.sst
-rw-r--r--. 1 ceph ceph 5177344 Feb  9 07:01 511301.log
-rw-r--r--. 1 ceph ceph 1491740 Feb  9 06:58 511305.sst
-rw-r--r--. 1 ceph ceph 2162405 Feb  9 06:58 511306.sst
-rw-r--r--. 1 ceph ceph 2162047 Feb  9 06:58 511307.sst
-rw-r--r--. 1 ceph ceph 2104201 Feb  9 06:58 511308.sst
-rw-r--r--. 1 ceph ceph 2146113 Feb  9 06:58 511309.sst
-rw-r--r--. 1 ceph ceph 2123659 Feb  9 06:58 511310.sst
-rw-r--r--. 1 ceph ceph 2162927 Feb  9 06:58 511311.sst
-rw-r--r--. 1 ceph ceph 2129640 Feb  9 06:58 511312.sst
-rw-r--r--. 1 ceph ceph 2133590 Feb  9 06:58 511313.sst
-rw-r--r--. 1 ceph ceph 2143906 Feb  9 06:58 511314.sst
-rw-r--r--. 1 ceph ceph 2158434 Feb  9 06:58 511315.sst
-rw-r--r--. 1 ceph ceph 1649589 Feb  9 06:58 511316.sst
-rw-r--r--. 1 ceph ceph      16 Feb  8 13:42 CURRENT
-rw-r--r--. 1 ceph ceph       0 Dec  6 14:26 LOCK
-rw-r--r--. 1 ceph ceph  983040 Feb  9 06:58 MANIFEST-503363
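For the recursive per-child sizes Joao asks about in the quoted message below, something like the following should do it (just a sketch, assuming the mon data is in the default /var/lib/ceph/mon/ceph-empire-ceph02 path; adjust if the store lives elsewhere):

    # list every file and subdirectory with human-readable sizes, smallest first
    du -ah /var/lib/ceph/mon/ceph-empire-ceph02 | sort -h

If the tree is large, the sorted output is easy to drop on a pastebin.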
From: Joao Eduardo Luis<mailto:joao@xxxxxxx>
Sent: Thursday, February 9, 2017 3:06 AM
To: ceph-users@xxxxxxxxxxxxxx<mailto:ceph-users@xxxxxxxxxxxxxx>
Subject: Re: ceph-mon memory issue jewel 10.2.5 kernel 4.4

Hi Jim,

On 02/08/2017 07:45 PM, Jim Kilborn wrote:
> I have had two ceph monitor nodes generate swap space alerts this week.
> Looking at the memory, I see ceph-mon using a lot of memory and most of
> the swap space. My ceph nodes have 128GB mem, with 2GB swap (I know the
> memory/swap ratio is odd)
>
> When I get the alert, I see the following
[snip]
> root@empire-ceph02 ~]# ps -aux | egrep 'ceph-mon|MEM'
>
> USER       PID %CPU %MEM      VSZ      RSS TTY STAT START   TIME COMMAND
>
> ceph    174239  0.3 45.8 62812848 60405112 ?   Ssl  2016  269:08 /usr/bin/ceph-mon -f --cluster ceph --id empire-ceph02 --setuser ceph --setgroup ceph
>
> [snip]
>
>
> Is this a setting issue? Or Maybe a bug?
> When I look at the other ceph-mon processes on other nodes, they aren’t
> using any swap, and only about 500MB of memory.

Can you get us the result of `ceph -s`, of `ceph daemon mon.ID ops`, and
the size of your monitor's data directory?

The latter, ideally, recursive with the sizes of all the children in the
tree (which, assuming they're a lot, would likely be better on a
pastebin).

  -Joao
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com