Re: ceph osd memory free problem

On Mon, Jun 12, 2017 at 10:45:52AM +0800, 于相洋 wrote:
> Hi cephers,
> 
> I have met a memory problem in ceph rados server nodes.
> 
> Total memory is 64GB; 56GB is used and only 8GB is free. Buffers and
> cache take little memory, and my swap space is used up, as shown below.
> If free memory stays this low there may be an OOM problem, and since
> swap is already exhausted there may also be performance problems.
The memory is going to XFS.

You didn't post the OOM, but this sounds very much like the XFS memory
fragmentation issue as seen here:
https://serverfault.com/questions/642883/cause-of-page-fragmentation-on-large-server-with-xfs-20-disks-and-ceph

I regularly see it on our systems w/ 36x 6T OSDs and 256GB of RAM, as shown in
the dmesg capture below from a few days ago. All OSDs are 40-60% full.

The best mitigation so far is running 'echo 2 > /proc/sys/vm/drop_caches'
nightly during off-peak hours. The other suggestions in the link above reduced
the frequency of the problem for us, but didn't make it go away.
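For reference, a minimal sketch of that nightly mitigation as a cron.d entry
(the file name and the 03:00 run time are assumptions; pick your own off-peak
window):

```shell
# /etc/cron.d/drop-slab-caches  -- hypothetical file name
# At 03:00 daily, drop reclaimable slab objects (dentries and inodes).
# 'echo 2' drops slab caches only; 'echo 3' would also drop the page cache.
0 3 * * *  root  /bin/sh -c 'echo 2 > /proc/sys/vm/drop_caches'
```

This is a workaround, not a fix: it frees fragmented slab memory before an
order-3 allocation can fail, at the cost of cold caches the next morning.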

Timestamp for all of it: [Thu Jun  8 01:41:59 2017]
=====
tp_osd_tp invoked oom-killer: gfp_mask=0x240c2c0, order=3, oom_score_adj=0
tp_osd_tp cpuset=/ mems_allowed=0-1
CPU: 15 PID: 1085880 Comm: tp_osd_tp Tainted: G        W       4.4.0-59-generic #80~14.04.1-Ubuntu
Hardware name: Supermicro SSG-6048R-E1CR36L/X10DRH-iT, BIOS 2.0a 06/30/2016
 0000000000000000 ffff882a471f3a30 ffffffff813dbd6c ffff882a471f3be8
 0000000000000000 ffff882a471f3ac0 ffffffff811fafc6 ffff882a471f3be8
 ffff882a471f3af8 ffff8832ad0ac600 0000000000000000 0000000000000000
Call Trace:
 [<ffffffff813dbd6c>] dump_stack+0x63/0x87
 [<ffffffff811fafc6>] dump_header+0x5b/0x1d5
 [<ffffffff81188b35>] oom_kill_process+0x205/0x3d0
 [<ffffffff8118916b>] out_of_memory+0x40b/0x460
 [<ffffffff811fba7f>] __alloc_pages_slowpath.constprop.87+0x742/0x7ad
 [<ffffffff8118e167>] __alloc_pages_nodemask+0x237/0x240
 [<ffffffffc03df681>] ? xfs_da_state_free+0x21/0x30 [xfs]
 [<ffffffff811d3e18>] alloc_pages_current+0x88/0x120
 [<ffffffff8118ccc9>] alloc_kmem_pages+0x19/0x90
 [<ffffffff811a7868>] kmalloc_order+0x18/0x50
 [<ffffffff811a78c6>] kmalloc_order_trace+0x26/0xb0
 [<ffffffff811df331>] __kmalloc+0x251/0x270
 [<ffffffff812253de>] getxattr+0x8e/0x1b0
 [<ffffffffc04380f5>] ? posix_acl_access_exists+0x15/0x20 [xfs]
 [<ffffffffc041e602>] ? xfs_vn_listxattr+0xf2/0x160 [xfs]
 [<ffffffff811b5580>] ? handle_mm_fault+0x250/0x540
 [<ffffffff81225dee>] SyS_fgetxattr+0x5e/0xb0
 [<ffffffff81802c76>] entry_SYSCALL_64_fastpath+0x16/0x75
Mem-Info:
active_anon:8807118 inactive_anon:870763 isolated_anon:0
 active_file:5614956 inactive_file:4123432 isolated_file:0
 unevictable:8 dirty:4323 writeback:0 unstable:0
 slab_reclaimable:1921141 slab_unreclaimable:4002171
 mapped:6716850 shmem:6631 pagetables:82513 bounce:0
 free:758377 free_pcp:2615 free_cma:0
Node 0 DMA free:15320kB min:28kB low:32kB high:40kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15960kB managed:15832kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 1842 128815 128815 128815
Node 0 DMA32 free:511832kB min:3744kB low:4680kB high:5616kB active_anon:8kB inactive_anon:8kB active_file:8kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1967272kB managed:1886840kB mlocked:0kB dirty:0kB writeback:0kB mapped:8kB shmem:0kB slab_reclaimable:282060kB slab_unreclaimable:461284kB kernel_stack:13264kB pagetables:1848kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 126972 126972 126972
Node 0 Normal free:915268kB min:258172kB low:322712kB high:387256kB active_anon:19050184kB inactive_anon:1735572kB active_file:12163768kB inactive_file:8400128kB unevictable:32kB isolated(anon):0kB isolated(file):0kB present:132120576kB managed:130020328kB mlocked:32kB dirty:16060kB writeback:0kB mapped:13404324kB shmem:12012kB slab_reclaimable:4971164kB slab_unreclaimable:8497080kB kernel_stack:467504kB pagetables:170296kB unstable:0kB bounce:0kB free_pcp:5476kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:16 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0 0
Node 1 Normal free:1591088kB min:262336kB low:327920kB high:393504kB active_anon:16178280kB inactive_anon:1747472kB active_file:10296048kB inactive_file:8093600kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:134217728kB managed:132116736kB mlocked:0kB dirty:1232kB writeback:0kB mapped:13463068kB shmem:14512kB slab_reclaimable:2431340kB slab_unreclaimable:7050320kB kernel_stack:563280kB pagetables:157908kB unstable:0kB bounce:0kB free_pcp:4984kB local_pcp:8kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0 0
Node 0 DMA: 0*4kB 1*8kB (U) 1*16kB (U) 0*32kB 1*64kB (U) 1*128kB (U) 1*256kB (U) 1*512kB (U) 0*1024kB 1*2048kB (M) 3*4096kB (M) = 15320kB
Node 0 DMA32: 391*4kB (UME) 278*8kB (UM) 1027*16kB (UME) 683*32kB (UMEH) 504*64kB (UMEH) 396*128kB (UMH) 387*256kB (MEH) 178*512kB (MEH) 44*1024kB (MH) 74*2048kB (MH) 0*4096kB = 511836kB
Node 0 Normal: 52559*4kB (UME) 88630*8kB (UME) 1*16kB (H) 0*32kB 1*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 919356kB
Node 1 Normal: 127175*4kB (UME) 87936*8kB (UME) 23906*16kB (UMEH) 11*32kB (H) 6*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1595420kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
9783120 total pagecache pages
38289 pages in swap cache
Swap cache stats: add 19895227, delete 19856938, find 11143389/14461125
Free swap  = 7758284kB
Total swap = 8388604kB
67080384 pages RAM
0 pages HighMem/MovableOnly
1070450 pages reserved
0 pages cma reserved
0 pages hwpoisoned
[ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[ 1007]     0  1007     5068      483      13       3      194             0 upstart-udev-br
[ 1013]     0  1013    12887      578      27       3      102         -1000 systemd-udevd
[ 1049]     0  1049     3820      304      13       3       24             0 upstart-file-br
[ 1052]   102  1052    80730    14806      62       4     3742             0 rsyslogd
[ 1798]     0  1798    12164      728      27       3      101             0 lldpd
[ 1869]   105  1869    12164      419      25       3       98             0 lldpd
[ 1879]     0  1879     3816      242      12       3       35             0 upstart-socket-
[ 2750]   103  2750     7866      797      20       3      102             0 ntpd
[ 2999]     0  2999     3635      431      12       3       38             0 getty
[ 3000]     0  3000     3635      448      12       3       37             0 getty
[ 3003]     0  3003     3635      436      12       3       39             0 getty
[ 3004]     0  3004     3635      435      12       3       40             0 getty
[ 3006]     0  3006     3635      450      12       3       39             0 getty
[ 3022]     0  3022    15346      927      34       3      140         -1000 sshd
[ 3024]     0  3024     5914      543      17       3       40             0 cron
[ 3224]     0  3224     1083      209       8       3       22             0 collectdmon
[ 3225]     0  3225   195966     1083      47       4       42             0 collectd
[ 3253]     0  3253    46081     1709      24       3     1019             0 fail2ban-server
[ 3408]     0  3408     6336      667      16       3       49             0 master
[ 3419]   104  3419     6893      698      18       3       44             0 qmgr
[ 3476]     0  3476     3318      282      10       3       24             0 mdadm
[ 3509]     0  3509     3635      439      12       3       37             0 getty
[ 3510]     0  3510     3197      439      12       3       35             0 getty
[ 3511]     0  3511     3197      445      12       3       34             0 getty
[2021121]   106 2021121     5835      474      16       3      123             0 nrpe
[1061740]     0 1061740  1193840   428081    2126       8     1069             0 ceph-osd
[1062045]     0 1062045  1580279   528454    3160      10     1199             0 ceph-osd
[1062547]     0 1062547  1051761   370552    1826       7     1870             0 ceph-osd
[1062915]     0 1062915  1174510   411056    2062       8     1590             0 ceph-osd
[1063396]     0 1063396  1400646   581974    2551       8      905             0 ceph-osd
[1064669]     0 1064669  1231068   386831    2184       7      767             0 ceph-osd
[1064973]     0 1064973  1358184   428018    2480       8      831             0 ceph-osd
[1065390]     0 1065390  1205864   439471    2121       9     1399             0 ceph-osd
[1065609]     0 1065609  1302914   479849    2331       8      698             0 ceph-osd
[1065968]     0 1065968  1376198   481664    2471       8      543             0 ceph-osd
[1066275]     0 1066275  1225083   439472    2185       8      810             0 ceph-osd
[1066575]     0 1066575  1285168   446490    2272       8      721             0 ceph-osd
[1066876]     0 1066876  1275062   448917    2278       8     5928             0 ceph-osd
[1067225]     0 1067225  1142918   402708    1991       7      966             0 ceph-osd
[1067581]     0 1067581  1084617   390226    1900       8     1192             0 ceph-osd
[1067867]     0 1067867  1306584   465829    2324       8     1140             0 ceph-osd
[1068359]     0 1068359  1143859   419061    2038       8      486             0 ceph-osd
[1068712]     0 1068712  1356145   482163    2482       8      703             0 ceph-osd
[1068945]     0 1068945  1464922   511993    2684      10     1054             0 ceph-osd
[1069202]     0 1069202  1314611   466149    2343       8      373             0 ceph-osd
[1077729]     0 1077729  1236855   474960    2196       8     2141             0 ceph-osd
[1077994]     0 1077994  1343678   511317    2422       8     3687             0 ceph-osd
[1078712]     0 1078712  1305742   547914    2328       8    14576             0 ceph-osd
[1079898]     0 1079898  1095581   443459    1913       7     1961             0 ceph-osd
[1081804]     0 1081804  1032092   369817    1789       7     6281             0 ceph-osd
[1082066]     0 1082066  1561346   536779    2734      10     7147             0 ceph-osd
[1083961]     0 1083961  1134121   445427    1976       7    20826             0 ceph-osd
[1086089]     0 1086089  1273552   473015    2271       8     4362             0 ceph-osd
[1088670]     0 1088670  1114051   402725    1973       7     8050             0 ceph-osd
[1092038]     0 1092038  1125645   435613    1976       7     9110             0 ceph-osd
[1096756]     0 1096756  1298374   431037    2313       8     3579             0 ceph-osd
[1097216]     0 1097216  1287326   460129    2289       8     8807             0 ceph-osd
[1101156]     0 1101156  1175688   429388    2065       8     7705             0 ceph-osd
[1107340]     0 1107340  1428037   468276    2626      10     3232             0 ceph-osd
[1107953]     0 1107953  1256050   459764    2239       8     2232             0 ceph-osd
[2432806]     0 2432806  1533549   440887    2734      10     2175             0 ceph-osd
[507551]     0 507551    28175     9661      60       3      108             0 ruby
[3159966]   999 3159966    91561    54449     141       3      978          1000 netdata
[3159992]   999 3159992    25706     4617      40       3        0          1000 python
[3615506]   999 3615506    18141     3880      29       3        0          1000 apps.plugin
[3644773]   104 3644773     6852      701      18       3        0             0 pickup
[3703623]   999 3703623     4572      820      14       3        0          1000 bash
[3709023]   104 3709023     6852      708      17       3        0             0 showq
Out of memory: Kill process 3159966 (netdata) score 1000 or sacrifice child
Killed process 3159992 (python) total-vm:102824kB, anon-rss:11528kB, file-rss:6940kB
=====
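The gfp order=3 in the trace above means the kernel needed a physically
contiguous 32kB (2^3 pages) block, and the per-zone free lists show almost
nothing left above 8kB in the Normal zones. One way to watch for that state
before the OOM killer fires is to summarize /proc/buddyinfo (a sketch; the
output format and thresholds you alert on are up to you):

```shell
# /proc/buddyinfo lists, per zone, the count of free blocks at each order
# 0..10 (4kB, 8kB, ..., 4MB); the counts start in field 5.
# Sum the counts for order >= 3 (fields 8 and up): a near-zero sum in the
# Normal zones mirrors the fragmented state in the dmesg dump above.
awk '{s=0; for (i=8; i<=NF; i++) s+=$i;
      printf "Node %s zone %-8s order>=3 blocks: %d\n", $2, $4, s}' \
    /proc/buddyinfo
```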



> 
> [root@localhost ~]# free -m
>              total       used       free     shared    buffers     cached
> Mem:         64417      56768       7648          0        114        443
> -/+ buffers/cache:      56211       8206
> Swap:         8191       8191          0
> 
> From /proc/meminfo, reclaimable slab (SReclaimable) takes only about 2GB of memory:
> 
> [root@wzdx48 ~]# cat /proc/meminfo
> MemTotal:       65963088 kB
> MemFree:         7750100 kB
> Buffers:          116776 kB
> Cached:           453988 kB
> SwapCached:       813692 kB
> Active:         12835884 kB
> Inactive:        2184952 kB
> Active(anon):   12480640 kB
> Inactive(anon):  1971280 kB
> Active(file):     355244 kB
> Inactive(file):   213672 kB
> Unevictable:           0 kB
> Mlocked:               0 kB
> SwapTotal:       8388604 kB
> SwapFree:            128 kB
> Dirty:               928 kB
> Writeback:             0 kB
> AnonPages:      13636556 kB
> Mapped:            38184 kB
> Shmem:              1840 kB
> Slab:            6074272 kB
> SReclaimable:    2310640 kB
> SUnreclaim:      3763632 kB
> KernelStack:       42936 kB
> PageTables:        71748 kB
> NFS_Unstable:          0 kB
> Bounce:                0 kB
> WritebackTmp:          0 kB
> CommitLimit:    41370148 kB
> Committed_AS:   39673248 kB
> VmallocTotal:   34359738367 kB
> VmallocUsed:      390436 kB
> VmallocChunk:   34324779316 kB
> HardwareCorrupted:     0 kB
> AnonHugePages:   4503552 kB
> HugePages_Total:       0
> HugePages_Free:        0
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> Hugepagesize:       2048 kB
> DirectMap4k:        5504 kB
> DirectMap2M:     2082816 kB
> DirectMap1G:    65011712 kB
> 
> But when I run 'echo 3 > /proc/sys/vm/drop_caches', I get 40GB of free
> memory back.
> 
> [root@wzdx48 ~]# echo 3 > /proc/sys/vm/drop_caches
> [root@wzdx48 ~]# free -m
>              total       used       free     shared    buffers     cached
> Mem:         64417      15566      48850          0         10         59
> -/+ buffers/cache:      15496      48920
> Swap:         8191       8191          0
> 
> I just can't understand where that 40GB of memory is being used.
> 
> 
> OSD node  background:
> 
> [root@localhost ~]# ceph -s
>      health HEALTH_WARN
>             too many PGs per OSD (438 > max 300)
>             noout,nodeep-scrub flag(s) set
>      monmap e3: 3 mons at
> {60=192.168.2.60:6789/0,61=192.168.2.61:6789/0,62=192.168.2.62:6789/0}
>             election epoch 2720, quorum 0,1,2 60,61,62
>      osdmap e37148: 695 osds: 671 up, 671 in
>             nodeep-scrub
>       pgmap v12910815: 98064 pgs, 21 pools, 612 TB data, 757 Mobjects
>             1862 TB used, 2357 TB / 4220 TB avail
>                98015 active+clean
>                   49 active+clean+scrubbing
>   client io 9114 kB/s rd, 94051 kB/s wr, 6553 op/s
> 
> [root@wzdx48 ~]# df -i
> Filesystem        Inodes   IUsed     IFree IUse% Mounted on
> /dev/sda3       60489728  112915  60376813    1% /
> tmpfs            8245386      36   8245350    1% /dev/shm
> /dev/sda1         128016      43    127973    1% /boot
> 
> [root@wzdx48 ~]# df -h
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sda3       909G   30G  871G   4% /
> tmpfs            32G  1.1M   32G   1% /dev/shm
> /dev/sda1       477M   57M  396M  13% /boot
> /dev/sdb2       425G  118G  307G  28% /data/osd/osd.660
> /dev/sdc2       425G  130G  296G  31% /data/osd/osd.661
> /dev/sdd2       425G  128G  298G  30% /data/osd/osd.662
> /dev/sde2       425G  125G  301G  30% /data/osd/osd.663
> /dev/sdf2       425G  134G  292G  32% /data/osd/osd.664
> /dev/sdg2       425G  131G  294G  31% /data/osd/osd.665
> /dev/sdh2       425G  131G  295G  31% /data/osd/osd.666
> /dev/sdi2       425G  124G  302G  30% /data/osd/osd.667
> /dev/sdj2       425G  126G  299G  30% /data/osd/osd.668
> /dev/sdk2       425G  123G  302G  29% /data/osd/osd.669
> /dev/sdl2       131G  351M  130G   1% /data/osd/osd.690
> 
> There is no active client writing or reading files.
> 
> top - 10:28:13 up 272 days, 17:43,  1 user,  load average: 0.28, 0.39, 0.44
> Tasks: 664 total,   1 running, 648 sleeping,   7 stopped,   8 zombie
> Cpu(s):  0.4%us,  0.7%sy,  0.0%ni, 98.3%id,  0.6%wa,  0.0%hi,  0.0%si,  0.0%st
> Mem:  65963088k total, 58901700k used,  7061388k free,   117148k buffers
> Swap:  8388604k total,  8387936k used,      668k free,   457360k cached
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 10166 root      20   0 3726m 1.2g 6096 S  1.7  2.0  13340:55 ceph-osd
> 10251 root      20   0 3589m 1.2g 6064 S  1.7  2.0  12949:25 ceph-osd
> 65247 root      20   0 1955m  16m 3424 S  1.7  0.0 164:03.68 ama
> 10115 root      20   0 3671m 1.2g 6088 S  1.3  2.0  13342:35 ceph-osd
> 10234 root      20   0 3637m 1.2g 6088 S  1.3  1.9  12848:57 ceph-osd
> 10200 root      20   0 3707m 1.2g 6092 S  1.0  2.0  13687:07 ceph-osd
> 10217 root      20   0 3624m 1.2g 6088 S  1.0  1.9  12568:55 ceph-osd
> 10107 root      20   0 3556m 1.2g 6088 S  0.7  1.9  12198:33 ceph-osd
> 10132 root      20   0 3643m 1.3g 6088 S  0.7  2.0  12992:18 ceph-osd
> 10149 root      20   0 3599m 1.2g 6076 S  0.7  2.0  12101:59 ceph-osd
> 12317 root      20   0 15436 1704  932 R  0.7  0.0   0:00.05 top
> 
> I'd appreciate any reply.
> 
> Best Regards,
> Brandy
> 
> -- 
> Software Engineer, ChinaNetCenter Co., ShenZhen, Guangdong Province, China
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Trustee & Treasurer
E-Mail   : robbat2@xxxxxxxxxx
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136