On 27/07/16 10:59, Christian Balzer wrote:
Hello,
On Wed, 27 Jul 2016 10:21:34 +0200 Kenneth Waegeman wrote:
Hi all,
When our OSD hosts have been running for some time, we start to see increased
swap usage on a number of them. Some OSDs don't touch swap for weeks, while
others fill the full (4G) swap and start filling it again after we do a
swapoff/swapon.
Obvious first question would be: are all these hosts really the same, HW-,
SW- and configuration-wise?
They have the same hardware and are configured the same through config
management, with ceph 10.2.2 and kernel 3.10.0-327.18.2.el7.ug.x86_64.
We have 8 x 8TB OSDs and 2 cache SSDs on each host, and 80GB of memory.
How full are these OSDs?
I'm interested in # of files, not space, so a "df -i" should give us some idea.
Filesystem  Inodes     IUsed     IFree      IUse%  Mounted on
/dev/sdm7   19832320   50068     19782252   1%     /var/lib/ceph/osd/cache/sdm
/dev/md124  194557760  19620569  174937191  11%    /var/lib/ceph/osd/sdk0sdl
/dev/md117  194557760  20377826  174179934  11%    /var/lib/ceph/osd/sdc0sdd
/dev/md127  194557760  21453957  173103803  12%    /var/lib/ceph/osd/sda0sdb
/dev/md121  194557760  20270844  174286916  11%    /var/lib/ceph/osd/sdq0sdr
/dev/md118  194557760  20476860  174080900  11%    /var/lib/ceph/osd/sde0sdf
/dev/md120  194557760  19939165  174618595  11%    /var/lib/ceph/osd/sdo0sdp
/dev/md113  194557760  22098382  172459378  12%    /var/lib/ceph/osd/sdg0sdh
/dev/md112  194557760  18209988  176347772  10%    /var/lib/ceph/osd/sdi0sdj
/dev/sdn7   19930624   47087     19883537   1%     /var/lib/ceph/osd/cache/sdn
80GB is an odd number, how are the DIMMs distributed among the CPU(s)?
Only 1 socket:
Machine (79GB)
  Socket L#0 + L3 L#0 (20MB)
    L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
      PU L#0 (P#0)
      PU L#1 (P#8)
    L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
      PU L#2 (P#1)
      PU L#3 (P#9)
    L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2
      PU L#4 (P#2)
      PU L#5 (P#10)
    L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3
      PU L#6 (P#3)
      PU L#7 (P#11)
    L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4
      PU L#8 (P#4)
      PU L#9 (P#12)
    L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5
      PU L#10 (P#5)
      PU L#11 (P#13)
    L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6
      PU L#12 (P#6)
      PU L#13 (P#14)
    L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7
      PU L#14 (P#7)
      PU L#15 (P#15)
3 DIMMs of 16GB + 1 DIMM of 8GB in the first set of DIMM slots, and 3 DIMMs
of 8GB in the second set (as per our vendor's manual).
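A quick way to confirm that all of that memory is actually visible to a single
NUMA node (and nothing is stranded) is numactl; this is just a sketch and
assumes the numactl/numastat tools are installed, which the thread doesn't say:

  numactl --hardware   # node count, per-node size and free memory
  numastat -m          # per-node MemFree/MemUsed breakdown

On a single-socket box you would expect exactly one node holding the full ~79GB.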
There is still about 15-20GB of memory available when this happens. We are
running CentOS 7.
How do you define free memory?
Not used at all?
I'd expect any Ceph storage server to use all "free" RAM for SLAB and
pagecache very quickly, at the latest after the first deep scrub.
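To see where the "free" RAM has actually gone, the standard /proc and slabtop
views are usually enough; a rough sketch, nothing Ceph-specific assumed:

  grep -E '^(MemFree|MemAvailable|Buffers|Cached|Slab|SReclaimable|SUnreclaim):' /proc/meminfo
  slabtop -o | head -20   # one-shot list of the largest kernel SLAB caches

On an XFS-backed OSD box the dentry and inode caches are typically among the
biggest SLAB consumers after a deep scrub.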
%Cpu(s):  5.3 us,  0.1 sy,  0.0 ni, 94.1 id,  0.5 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 82375104 total,  7037032 free, 41117768 used, 34220308 buff/cache
KiB Swap:  4194300 total,  3666416 free,   527884 used. 15115612 avail Mem

    PID USER  PR  NI    VIRT    RES   SHR S  %CPU %MEM    TIME+ COMMAND
3979408 ceph  20   0 4115960 1.079g  5912 S  85.1  1.4  7174:16 ceph-osd
3979417 ceph  20   0 3843488 967424  6076 S   1.7  1.2  7114:34 ceph-osd
3979410 ceph  20   0 4089372 1.085g  5964 S   1.3  1.4  9072:56 ceph-osd
3979419 ceph  20   0 4345000 1.116g  6168 S   1.3  1.4  9151:36 ceph-osd
If it is really unused AND your system is swapping, something odd is going
on indeed, maybe something NUMA related that prevents part of your memory
from being used.
Of course this could also be an issue with your CentOS kernel, I'm
definitely not seeing anything like this on any of my machines.
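If it helps to narrow it down, the VmSwap field in /proc/<pid>/status shows
which processes actually own the swapped-out pages; a rough one-liner (sketch,
output format will vary a bit):

  grep VmSwap /proc/[0-9]*/status | sort -t: -k3 -rn | head

If the top entries are all ceph-osd, the daemons themselves are being pushed
out; if not, something else on the box is competing for memory.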
We had swappiness set to 0.
I wouldn't set it lower than 1.
Also any other tuning settings, like vm/vfs_cache_pressure and
vm/min_free_kbytes?
vfs_cache_pressure is on the default 100,
vm.min_free_kbytes=3145728
other tuned settings:
fs.file-max=262144
kernel.msgmax=65536
kernel.msgmnb=65536
kernel.msgmni=1024
kernel.pid_max=4194303
kernel.sem=250 32000 100 1024
kernel.shmall=20971520
kernel.shmmax=34359738368
kernel.shmmni=16384
net.core.netdev_max_backlog=250000
net.core.rmem_default=262144
net.core.rmem_max=4194304
net.core.somaxconn=1024
net.core.wmem_default=262144
net.core.wmem_max=4194304
net.ipv4.conf.all.arp_filter=1
net.ipv4.ip_local_port_range=32768 61000
net.ipv4.neigh.default.base_reachable_time=14400
net.ipv4.neigh.default.gc_interval=14400
net.ipv4.neigh.default.gc_stale_time=14400
net.ipv4.neigh.default.gc_thresh1=2048
net.ipv4.neigh.default.gc_thresh2=3072
net.ipv4.neigh.default.gc_thresh3=4096
net.ipv4.tcp_fin_timeout=30
net.ipv4.tcp_keepalive_time=1800
net.ipv4.tcp_max_syn_backlog=30000
net.ipv4.tcp_max_tw_buckets=2000000
net.ipv4.tcp_slow_start_after_idle=0
net.ipv4.tcp_syncookies=0
net.ipv4.tcp_tw_recycle=1
net.ipv4.tcp_tw_reuse=1
net.netfilter.nf_conntrack_generic_timeout=120
net.netfilter.nf_conntrack_tcp_timeout_established=86400
vm.zone_reclaim_mode=0
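For reference, the VM knobs discussed above can be checked and adjusted on the
fly with sysctl; a minimal sketch using only the values mentioned in this
thread (the config file name below is arbitrary, not an existing file on these
hosts):

  sysctl vm.swappiness vm.vfs_cache_pressure vm.min_free_kbytes   # current values
  sysctl -w vm.swappiness=1    # per the suggestion above, rather than 0
  # to persist: add the line to e.g. /etc/sysctl.d/90-ceph.conf and run "sysctl --system"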
Thanks again!
K
There is no client I/O right now, only scrubbing. Some OSDs are using 20-80%
of CPU.
Sounds high for pure CPU usage, unless that includes IOWAIT.
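A quick way to separate real CPU time from IOWAIT, assuming the sysstat
package is installed (not stated in the thread):

  pidstat -u -p $(pgrep -d, ceph-osd) 2 5   # per-process %usr/%system (newer sysstat also shows %wait)
  iostat -x 2                               # per-device utilisation and await during scrubbing

If iostat shows the md/SSD devices near 100% util while the OSD %usr stays
low, it's IOWAIT from scrubbing rather than actual CPU burn.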
Christian
Has somebody seen this behaviour? It doesn't have to be bad, but what could
explain why some hosts keep swapping while others don't? Could this point to
some underlying issue?
Thanks !!
Kenneth
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com