On 27/07/16 10:59, Christian Balzer wrote:
Hello,
On Wed, 27 Jul 2016 10:21:34 +0200 Kenneth Waegeman wrote:
Hi all,
When our OSD hosts have been running for some time, we start to see increased
swap usage on a number of them. Some OSDs don't touch swap for weeks, while
others fill the full (4G) swap and start filling it again after we do a
swapoff/swapon.
Obvious first question would be: are all these hosts really the same, HW-,
SW- and configuration-wise?
They have the same hardware and are configured the same through config
management, with ceph 10.2.2 and kernel 3.10.0-327.18.2.el7.ug.x86_64.
We have 8 x 8TB OSDs and 2 cache SSDs on each host, and 80GB of memory.
How full are these OSDs?
I'm interested in # of files, not space, so a "df -i" should give us some idea.
Filesystem  Inodes     IUsed     IFree      IUse%  Mounted on
/dev/sdm7   19832320   50068     19782252   1%     /var/lib/ceph/osd/cache/sdm
/dev/md124  194557760  19620569  174937191  11%    /var/lib/ceph/osd/sdk0sdl
/dev/md117  194557760  20377826  174179934  11%    /var/lib/ceph/osd/sdc0sdd
/dev/md127  194557760  21453957  173103803  12%    /var/lib/ceph/osd/sda0sdb
/dev/md121  194557760  20270844  174286916  11%    /var/lib/ceph/osd/sdq0sdr
/dev/md118  194557760  20476860  174080900  11%    /var/lib/ceph/osd/sde0sdf
/dev/md120  194557760  19939165  174618595  11%    /var/lib/ceph/osd/sdo0sdp
/dev/md113  194557760  22098382  172459378  12%    /var/lib/ceph/osd/sdg0sdh
/dev/md112  194557760  18209988  176347772  10%    /var/lib/ceph/osd/sdi0sdj
/dev/sdn7   19930624   47087     19883537   1%     /var/lib/ceph/osd/cache/sdn
80GB is an odd number, how are the DIMMs distributed among the CPU(s)?
Only 1 socket:
Machine (79GB)
  Socket L#0 + L3 L#0 (20MB)
    L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
      PU L#0 (P#0)
      PU L#1 (P#8)
    L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
      PU L#2 (P#1)
      PU L#3 (P#9)
    L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2
      PU L#4 (P#2)
      PU L#5 (P#10)
    L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3
      PU L#6 (P#3)
      PU L#7 (P#11)
    L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4
      PU L#8 (P#4)
      PU L#9 (P#12)
    L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5
      PU L#10 (P#5)
      PU L#11 (P#13)
    L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6
      PU L#12 (P#6)
      PU L#13 (P#14)
    L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7
      PU L#14 (P#7)
      PU L#15 (P#15)
3 DIMMs of 16GB + 1 DIMM of 8GB in the first set of DIMM slots, and 3 DIMMs
of 8GB in the second set (as per our vendor's manual).
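A quick way to confirm that all of that memory is actually visible to a single
NUMA node (and nothing is stranded) is numactl; this is just a sketch and
assumes the numactl/numastat tools are installed, which the thread doesn't say:

  numactl --hardware   # node count, per-node size and free memory
  numastat -m          # per-node MemFree/MemUsed breakdown

On a single-socket box you would expect exactly one node holding the full ~79GB.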
There is still about 15-20GB of memory available when this happens. We are
running CentOS 7.
How do you define free memory?
Not used at all?
I'd expect any Ceph storage server to use all "free" RAM for SLAB and
pagecache very quickly, at the latest after the first deep scrub.
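To see where the "free" RAM has actually gone, the standard /proc and slabtop
views are usually enough; a rough sketch, nothing Ceph-specific assumed:

  grep -E '^(MemFree|MemAvailable|Buffers|Cached|Slab|SReclaimable|SUnreclaim):' /proc/meminfo
  slabtop -o | head -20   # one-shot list of the largest kernel SLAB caches

On an XFS-backed OSD box the dentry and inode caches are typically among the
biggest SLAB consumers after a deep scrub.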
%Cpu(s):  5.3 us,  0.1 sy,  0.0 ni, 94.1 id,  0.5 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 82375104 total,  7037032 free, 41117768 used, 34220308 buff/cache
KiB Swap:  4194300 total,  3666416 free,   527884 used. 15115612 avail Mem

    PID USER  PR  NI    VIRT    RES   SHR S  %CPU %MEM    TIME+ COMMAND
3979408 ceph  20   0 4115960 1.079g  5912 S  85.1  1.4  7174:16 ceph-osd
3979417 ceph  20   0 3843488 967424  6076 S   1.7  1.2  7114:34 ceph-osd
3979410 ceph  20   0 4089372 1.085g  5964 S   1.3  1.4  9072:56 ceph-osd
3979419 ceph  20   0 4345000 1.116g  6168 S   1.3  1.4  9151:36 ceph-osd
If it is really unused AND your system is swapping, something odd is going
on indeed, maybe something NUMA related that prevents part of your memory
from being used.
Of course this could also be an issue with your CentOS kernel, I'm
definitely not seeing anything like this on any of my machines.
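If it helps to narrow it down, the VmSwap field in /proc/<pid>/status shows
which processes actually own the swapped-out pages; a rough one-liner (sketch,
output format will vary a bit):

  grep VmSwap /proc/[0-9]*/status | sort -t: -k3 -rn | head

If the top entries are all ceph-osd, the daemons themselves are being pushed
out; if not, something else on the box is competing for memory.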
We had swappiness set to 0.
I wouldn't set it lower than 1.
Also any other tuning settings, like vm/vfs_cache_pressure and
vm/min_free_kbytes?
vfs_cache_pressure is on the default 100,
vm.min_free_kbytes=3145728
other tuned settings:
fs.file-max=262144
kernel.msgmax=65536
kernel.msgmnb=65536
kernel.msgmni=1024
kernel.pid_max=4194303
kernel.sem=250 32000 100 1024
kernel.shmall=20971520
kernel.shmmax=34359738368
kernel.shmmni=16384
net.core.netdev_max_backlog=250000
net.core.rmem_default=262144
net.core.rmem_max=4194304
net.core.somaxconn=1024
net.core.wmem_default=262144
net.core.wmem_max=4194304
net.ipv4.conf.all.arp_filter=1
net.ipv4.ip_local_port_range=32768 61000
net.ipv4.neigh.default.base_reachable_time=14400
net.ipv4.neigh.default.gc_interval=14400
net.ipv4.neigh.default.gc_stale_time=14400
net.ipv4.neigh.default.gc_thresh1=2048
net.ipv4.neigh.default.gc_thresh2=3072
net.ipv4.neigh.default.gc_thresh3=4096
net.ipv4.tcp_fin_timeout=30
net.ipv4.tcp_keepalive_time=1800
net.ipv4.tcp_max_syn_backlog=30000
net.ipv4.tcp_max_tw_buckets=2000000
net.ipv4.tcp_slow_start_after_idle=0
net.ipv4.tcp_syncookies=0
net.ipv4.tcp_tw_recycle=1
net.ipv4.tcp_tw_reuse=1
net.netfilter.nf_conntrack_generic_timeout=120
net.netfilter.nf_conntrack_tcp_timeout_established=86400
vm.zone_reclaim_mode=0
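For reference, the VM knobs discussed above can be checked and adjusted on the
fly with sysctl; a minimal sketch using only the values mentioned in this
thread (the config file name below is arbitrary, not an existing file on these
hosts):

  sysctl vm.swappiness vm.vfs_cache_pressure vm.min_free_kbytes   # current values
  sysctl -w vm.swappiness=1    # per the suggestion above, rather than 0
  # to persist: add the line to e.g. /etc/sysctl.d/90-ceph.conf and run "sysctl --system"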
Thanks again!
K
There is no client I/O right now, only scrubbing. Some OSDs are using 20-80%
of CPU.
Sounds high for pure CPU usage, unless that includes IOWAIT.
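A quick way to separate real CPU time from IOWAIT, assuming the sysstat
package is installed (not stated in the thread):

  pidstat -u -p $(pgrep -d, ceph-osd) 2 5   # per-process %usr/%system (newer sysstat also shows %wait)
  iostat -x 2                               # per-device utilisation and await during scrubbing

If iostat shows the md/SSD devices near 100% util while the OSD %usr stays
low, it's IOWAIT from scrubbing rather than actual CPU burn.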
Christian
Has somebody seen this behaviour? It doesn't have to be bad, but what could
explain why some hosts keep swapping while others don't? Could this point to
some underlying issue?
Thanks !!
Kenneth
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com