Re: ceph osd memory free problem


 



Thank you for your reply, Robin.
I had also tried adjusting the vm configuration before, but it had no effect.

I will now take your suggestion and run "echo 2 > /proc/sys/vm/drop_caches" nightly.
Thanks.
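For anyone following along, a sketch of how that nightly run could be scheduled; the 03:30 time slot and the file path are my own assumptions, not from Robin's setup:

```shell
# Hypothetical /etc/cron.d/drop-caches entry; the off-peak time (03:30)
# is an assumption -- pick whatever is quiet on your cluster.
# sync first so dirty pages are flushed to disk, then echo 2 frees
# reclaimable slab objects (dentries and inodes) only, not the page cache.
30 3 * * * root /bin/sync && echo 2 > /proc/sys/vm/drop_caches
```

Note that cron runs the command through /bin/sh, so the redirection works as written and happens as root.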

2017-06-12 14:12 GMT+08:00 Robin H. Johnson <robbat2@xxxxxxxxxx>:
> On Mon, Jun 12, 2017 at 10:45:52AM +0800, 于相洋 wrote:
>> Hi cephers,
>>
>> I have met a memory problem in ceph rados server nodes.
>>
>> Total memory is 64GB, of which 56GB is used and only 8GB is free; buffers
>> and cache take little memory, and my swap space is used up, as shown below.
>> With free memory this low there is a risk of the OOM killer firing, and
>> since swap is exhausted there may also be performance problems.
> It's going to XFS.
>
> You didn't post the OOM, but this sounds very much like the XFS memory
> fragmentation issue as seen here:
> https://serverfault.com/questions/642883/cause-of-page-fragmentation-on-large-server-with-xfs-20-disks-and-ceph
>
> I regularly see it on our systems with 36x 6TB OSDs and 256GB of RAM, as in
> the dmesg capture below from a few days ago. All OSDs are 40-60% full.
>
> The best mitigation so far is 'echo 2 > /proc/sys/vm/drop_caches' run nightly
> during off-peak. The other suggestions in the above link reduced the frequency
> of the problem for us, but didn't make it go away.
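For readers who do not follow the link, the tunables usually discussed for this class of problem look roughly like the fragment below; the values are illustrative assumptions for a large-RAM OSD node, not settings taken from this thread:

```shell
# Illustrative sysctl fragment (e.g. /etc/sysctl.d/90-ceph-vm.conf).
# Values are assumptions, not Robin's configuration.
vm.min_free_kbytes = 4194304      # keep a larger free reserve so order-3 allocations can succeed
vm.vfs_cache_pressure = 200       # reclaim dentry/inode slab more aggressively than the default 100
```

Apply with `sysctl --system` (or reboot); as Robin notes, these reduce the frequency of the problem rather than eliminate it.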
>
> Timestamp for all of it: [Thu Jun  8 01:41:59 2017]
> =====
> tp_osd_tp invoked oom-killer: gfp_mask=0x240c2c0, order=3, oom_score_adj=0
> tp_osd_tp cpuset=/ mems_allowed=0-1
> CPU: 15 PID: 1085880 Comm: tp_osd_tp Tainted: G        W       4.4.0-59-generic #80~14.04.1-Ubuntu
> Hardware name: Supermicro SSG-6048R-E1CR36L/X10DRH-iT, BIOS 2.0a 06/30/2016
>  0000000000000000 ffff882a471f3a30 ffffffff813dbd6c ffff882a471f3be8
>  0000000000000000 ffff882a471f3ac0 ffffffff811fafc6 ffff882a471f3be8
>  ffff882a471f3af8 ffff8832ad0ac600 0000000000000000 0000000000000000
> Call Trace:
>  [<ffffffff813dbd6c>] dump_stack+0x63/0x87
>  [<ffffffff811fafc6>] dump_header+0x5b/0x1d5
>  [<ffffffff81188b35>] oom_kill_process+0x205/0x3d0
>  [<ffffffff8118916b>] out_of_memory+0x40b/0x460
>  [<ffffffff811fba7f>] __alloc_pages_slowpath.constprop.87+0x742/0x7ad
>  [<ffffffff8118e167>] __alloc_pages_nodemask+0x237/0x240
>  [<ffffffffc03df681>] ? xfs_da_state_free+0x21/0x30 [xfs]
>  [<ffffffff811d3e18>] alloc_pages_current+0x88/0x120
>  [<ffffffff8118ccc9>] alloc_kmem_pages+0x19/0x90
>  [<ffffffff811a7868>] kmalloc_order+0x18/0x50
>  [<ffffffff811a78c6>] kmalloc_order_trace+0x26/0xb0
>  [<ffffffff811df331>] __kmalloc+0x251/0x270
>  [<ffffffff812253de>] getxattr+0x8e/0x1b0
>  [<ffffffffc04380f5>] ? posix_acl_access_exists+0x15/0x20 [xfs]
>  [<ffffffffc041e602>] ? xfs_vn_listxattr+0xf2/0x160 [xfs]
>  [<ffffffff811b5580>] ? handle_mm_fault+0x250/0x540
>  [<ffffffff81225dee>] SyS_fgetxattr+0x5e/0xb0
>  [<ffffffff81802c76>] entry_SYSCALL_64_fastpath+0x16/0x75
> Mem-Info:
> active_anon:8807118 inactive_anon:870763 isolated_anon:0
>  active_file:5614956 inactive_file:4123432 isolated_file:0
>  unevictable:8 dirty:4323 writeback:0 unstable:0
>  slab_reclaimable:1921141 slab_unreclaimable:4002171
>  mapped:6716850 shmem:6631 pagetables:82513 bounce:0
>  free:758377 free_pcp:2615 free_cma:0
> Node 0 DMA free:15320kB min:28kB low:32kB high:40kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15960kB managed:15832kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
> lowmem_reserve[]: 0 1842 128815 128815 128815
> Node 0 DMA32 free:511832kB min:3744kB low:4680kB high:5616kB active_anon:8kB inactive_anon:8kB active_file:8kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1967272kB managed:1886840kB mlocked:0kB dirty:0kB writeback:0kB mapped:8kB shmem:0kB slab_reclaimable:282060kB slab_unreclaimable:461284kB kernel_stack:13264kB pagetables:1848kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> lowmem_reserve[]: 0 0 126972 126972 126972
> Node 0 Normal free:915268kB min:258172kB low:322712kB high:387256kB active_anon:19050184kB inactive_anon:1735572kB active_file:12163768kB inactive_file:8400128kB unevictable:32kB isolated(anon):0kB isolated(file):0kB present:132120576kB managed:130020328kB mlocked:32kB dirty:16060kB writeback:0kB mapped:13404324kB shmem:12012kB slab_reclaimable:4971164kB slab_unreclaimable:8497080kB kernel_stack:467504kB pagetables:170296kB unstable:0kB bounce:0kB free_pcp:5476kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:16 all_unreclaimable? no
> lowmem_reserve[]: 0 0 0 0 0
> Node 1 Normal free:1591088kB min:262336kB low:327920kB high:393504kB active_anon:16178280kB inactive_anon:1747472kB active_file:10296048kB inactive_file:8093600kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:134217728kB managed:132116736kB mlocked:0kB dirty:1232kB writeback:0kB mapped:13463068kB shmem:14512kB slab_reclaimable:2431340kB slab_unreclaimable:7050320kB kernel_stack:563280kB pagetables:157908kB unstable:0kB bounce:0kB free_pcp:4984kB local_pcp:8kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> lowmem_reserve[]: 0 0 0 0 0
> Node 0 DMA: 0*4kB 1*8kB (U) 1*16kB (U) 0*32kB 1*64kB (U) 1*128kB (U) 1*256kB (U) 1*512kB (U) 0*1024kB 1*2048kB (M) 3*4096kB (M) = 15320kB
> Node 0 DMA32: 391*4kB (UME) 278*8kB (UM) 1027*16kB (UME) 683*32kB (UMEH) 504*64kB (UMEH) 396*128kB (UMH) 387*256kB (MEH) 178*512kB (MEH) 44*1024kB (MH) 74*2048kB (MH) 0*4096kB = 511836kB
> Node 0 Normal: 52559*4kB (UME) 88630*8kB (UME) 1*16kB (H) 0*32kB 1*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 919356kB
> Node 1 Normal: 127175*4kB (UME) 87936*8kB (UME) 23906*16kB (UMEH) 11*32kB (H) 6*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1595420kB
> Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
> Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
> Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> 9783120 total pagecache pages
> 38289 pages in swap cache
> Swap cache stats: add 19895227, delete 19856938, find 11143389/14461125
> Free swap  = 7758284kB
> Total swap = 8388604kB
> 67080384 pages RAM
> 0 pages HighMem/MovableOnly
> 1070450 pages reserved
> 0 pages cma reserved
> 0 pages hwpoisoned
> [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
> [ 1007]     0  1007     5068      483      13       3      194             0 upstart-udev-br
> [ 1013]     0  1013    12887      578      27       3      102         -1000 systemd-udevd
> [ 1049]     0  1049     3820      304      13       3       24             0 upstart-file-br
> [ 1052]   102  1052    80730    14806      62       4     3742             0 rsyslogd
> [ 1798]     0  1798    12164      728      27       3      101             0 lldpd
> [ 1869]   105  1869    12164      419      25       3       98             0 lldpd
> [ 1879]     0  1879     3816      242      12       3       35             0 upstart-socket-
> [ 2750]   103  2750     7866      797      20       3      102             0 ntpd
> [ 2999]     0  2999     3635      431      12       3       38             0 getty
> [ 3000]     0  3000     3635      448      12       3       37             0 getty
> [ 3003]     0  3003     3635      436      12       3       39             0 getty
> [ 3004]     0  3004     3635      435      12       3       40             0 getty
> [ 3006]     0  3006     3635      450      12       3       39             0 getty
> [ 3022]     0  3022    15346      927      34       3      140         -1000 sshd
> [ 3024]     0  3024     5914      543      17       3       40             0 cron
> [ 3224]     0  3224     1083      209       8       3       22             0 collectdmon
> [ 3225]     0  3225   195966     1083      47       4       42             0 collectd
> [ 3253]     0  3253    46081     1709      24       3     1019             0 fail2ban-server
> [ 3408]     0  3408     6336      667      16       3       49             0 master
> [ 3419]   104  3419     6893      698      18       3       44             0 qmgr
> [ 3476]     0  3476     3318      282      10       3       24             0 mdadm
> [ 3509]     0  3509     3635      439      12       3       37             0 getty
> [ 3510]     0  3510     3197      439      12       3       35             0 getty
> [ 3511]     0  3511     3197      445      12       3       34             0 getty
> [2021121]   106 2021121     5835      474      16       3      123             0 nrpe
> [1061740]     0 1061740  1193840   428081    2126       8     1069             0 ceph-osd
> [1062045]     0 1062045  1580279   528454    3160      10     1199             0 ceph-osd
> [1062547]     0 1062547  1051761   370552    1826       7     1870             0 ceph-osd
> [1062915]     0 1062915  1174510   411056    2062       8     1590             0 ceph-osd
> [1063396]     0 1063396  1400646   581974    2551       8      905             0 ceph-osd
> [1064669]     0 1064669  1231068   386831    2184       7      767             0 ceph-osd
> [1064973]     0 1064973  1358184   428018    2480       8      831             0 ceph-osd
> [1065390]     0 1065390  1205864   439471    2121       9     1399             0 ceph-osd
> [1065609]     0 1065609  1302914   479849    2331       8      698             0 ceph-osd
> [1065968]     0 1065968  1376198   481664    2471       8      543             0 ceph-osd
> [1066275]     0 1066275  1225083   439472    2185       8      810             0 ceph-osd
> [1066575]     0 1066575  1285168   446490    2272       8      721             0 ceph-osd
> [1066876]     0 1066876  1275062   448917    2278       8     5928             0 ceph-osd
> [1067225]     0 1067225  1142918   402708    1991       7      966             0 ceph-osd
> [1067581]     0 1067581  1084617   390226    1900       8     1192             0 ceph-osd
> [1067867]     0 1067867  1306584   465829    2324       8     1140             0 ceph-osd
> [1068359]     0 1068359  1143859   419061    2038       8      486             0 ceph-osd
> [1068712]     0 1068712  1356145   482163    2482       8      703             0 ceph-osd
> [1068945]     0 1068945  1464922   511993    2684      10     1054             0 ceph-osd
> [1069202]     0 1069202  1314611   466149    2343       8      373             0 ceph-osd
> [1077729]     0 1077729  1236855   474960    2196       8     2141             0 ceph-osd
> [1077994]     0 1077994  1343678   511317    2422       8     3687             0 ceph-osd
> [1078712]     0 1078712  1305742   547914    2328       8    14576             0 ceph-osd
> [1079898]     0 1079898  1095581   443459    1913       7     1961             0 ceph-osd
> [1081804]     0 1081804  1032092   369817    1789       7     6281             0 ceph-osd
> [1082066]     0 1082066  1561346   536779    2734      10     7147             0 ceph-osd
> [1083961]     0 1083961  1134121   445427    1976       7    20826             0 ceph-osd
> [1086089]     0 1086089  1273552   473015    2271       8     4362             0 ceph-osd
> [1088670]     0 1088670  1114051   402725    1973       7     8050             0 ceph-osd
> [1092038]     0 1092038  1125645   435613    1976       7     9110             0 ceph-osd
> [1096756]     0 1096756  1298374   431037    2313       8     3579             0 ceph-osd
> [1097216]     0 1097216  1287326   460129    2289       8     8807             0 ceph-osd
> [1101156]     0 1101156  1175688   429388    2065       8     7705             0 ceph-osd
> [1107340]     0 1107340  1428037   468276    2626      10     3232             0 ceph-osd
> [1107953]     0 1107953  1256050   459764    2239       8     2232             0 ceph-osd
> [2432806]     0 2432806  1533549   440887    2734      10     2175             0 ceph-osd
> [507551]     0 507551    28175     9661      60       3      108             0 ruby
> [3159966]   999 3159966    91561    54449     141       3      978          1000 netdata
> [3159992]   999 3159992    25706     4617      40       3        0          1000 python
> [3615506]   999 3615506    18141     3880      29       3        0          1000 apps.plugin
> [3644773]   104 3644773     6852      701      18       3        0             0 pickup
> [3703623]   999 3703623     4572      820      14       3        0          1000 bash
> [3709023]   104 3709023     6852      708      17       3        0             0 showq
> Out of memory: Kill process 3159966 (netdata) score 1000 or sacrifice child
> Killed process 3159992 (python) total-vm:102824kB, anon-rss:11528kB, file-rss:6940kB
> =====
>
>
>
>>
>> [root@localhost ~]# free -m
>>              total       used       free     shared    buffers     cached
>> Mem:         64417      56768       7648          0        114        443
>> -/+ buffers/cache:      56211       8206
>> Swap:         8191       8191          0
>>
>> From /proc/meminfo, reclaimable slab takes only about 2GB:
>>
>> [root@wzdx48 ~]# cat /proc/meminfo
>> MemTotal:       65963088 kB
>> MemFree:         7750100 kB
>> Buffers:          116776 kB
>> Cached:           453988 kB
>> SwapCached:       813692 kB
>> Active:         12835884 kB
>> Inactive:        2184952 kB
>> Active(anon):   12480640 kB
>> Inactive(anon):  1971280 kB
>> Active(file):     355244 kB
>> Inactive(file):   213672 kB
>> Unevictable:           0 kB
>> Mlocked:               0 kB
>> SwapTotal:       8388604 kB
>> SwapFree:            128 kB
>> Dirty:               928 kB
>> Writeback:             0 kB
>> AnonPages:      13636556 kB
>> Mapped:            38184 kB
>> Shmem:              1840 kB
>> Slab:            6074272 kB
>> SReclaimable:    2310640 kB
>> SUnreclaim:      3763632 kB
>> KernelStack:       42936 kB
>> PageTables:        71748 kB
>> NFS_Unstable:          0 kB
>> Bounce:                0 kB
>> WritebackTmp:          0 kB
>> CommitLimit:    41370148 kB
>> Committed_AS:   39673248 kB
>> VmallocTotal:   34359738367 kB
>> VmallocUsed:      390436 kB
>> VmallocChunk:   34324779316 kB
>> HardwareCorrupted:     0 kB
>> AnonHugePages:   4503552 kB
>> HugePages_Total:       0
>> HugePages_Free:        0
>> HugePages_Rsvd:        0
>> HugePages_Surp:        0
>> Hugepagesize:       2048 kB
>> DirectMap4k:        5504 kB
>> DirectMap2M:     2082816 kB
>> DirectMap1G:    65011712 kB
>>
>> But when I run echo 3 > /proc/sys/vm/drop_caches, I can get 40GB free
>> memory back.
>>
>> [root@wzdx48 ~]# echo 3 > /proc/sys/vm/drop_caches
>> [root@wzdx48 ~]# free -m
>>              total       used       free     shared    buffers     cached
>> Mem:         64417      15566      48850          0         10         59
>> -/+ buffers/cache:      15496      48920
>> Swap:         8191       8191          0
>>
>> I just can't understand where that 40GB of memory is being used.
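One way to narrow that down is to read the slab counters out of /proc/meminfo directly; a minimal sketch (the field names are standard Linux, nothing Ceph-specific):

```shell
#!/bin/sh
# Print where memory currently sits, in kB, from /proc/meminfo.
# SReclaimable is slab memory that 'echo 2 > drop_caches' can free;
# SUnreclaim is slab memory the kernel cannot drop on demand.
awk '/^(MemFree|Buffers|Cached|AnonPages|SReclaimable|SUnreclaim|KernelStack|PageTables):/ {printf "%-14s %10d kB\n", $1, $2}' /proc/meminfo
```

If the slab totals are large, `slabtop -s c` (from procps) shows which individual caches, e.g. the XFS inode caches, are holding the memory.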
>>
>>
>> OSD node  background:
>>
>> [root@localhost ~]# ceph -s
>>      health HEALTH_WARN
>>             too many PGs per OSD (438 > max 300)
>>             noout,nodeep-scrub flag(s) set
>>      monmap e3: 3 mons at
>> {60=192.168.2.60:6789/0,61=192.168.2.61:6789/0,62=192.168.2.62:6789/0}
>>             election epoch 2720, quorum 0,1,2 60,61,62
>>      osdmap e37148: 695 osds: 671 up, 671 in
>>             nodeep-scrub
>>       pgmap v12910815: 98064 pgs, 21 pools, 612 TB data, 757 Mobjects
>>             1862 TB used, 2357 TB / 4220 TB avail
>>                98015 active+clean
>>                   49 active+clean+scrubbing
>>   client io 9114 kB/s rd, 94051 kB/s wr, 6553 op/s
>>
>> [root@wzdx48 ~]# df -i
>> Filesystem        Inodes   IUsed     IFree IUse% Mounted on
>> /dev/sda3       60489728  112915  60376813    1% /
>> tmpfs            8245386      36   8245350    1% /dev/shm
>> /dev/sda1         128016      43    127973    1% /boot
>>
>> [root@wzdx48 ~]# df -h
>> Filesystem      Size  Used Avail Use% Mounted on
>> /dev/sda3       909G   30G  871G   4% /
>> tmpfs            32G  1.1M   32G   1% /dev/shm
>> /dev/sda1       477M   57M  396M  13% /boot
>> /dev/sdb2       425G  118G  307G  28% /data/osd/osd.660
>> /dev/sdc2       425G  130G  296G  31% /data/osd/osd.661
>> /dev/sdd2       425G  128G  298G  30% /data/osd/osd.662
>> /dev/sde2       425G  125G  301G  30% /data/osd/osd.663
>> /dev/sdf2       425G  134G  292G  32% /data/osd/osd.664
>> /dev/sdg2       425G  131G  294G  31% /data/osd/osd.665
>> /dev/sdh2       425G  131G  295G  31% /data/osd/osd.666
>> /dev/sdi2       425G  124G  302G  30% /data/osd/osd.667
>> /dev/sdj2       425G  126G  299G  30% /data/osd/osd.668
>> /dev/sdk2       425G  123G  302G  29% /data/osd/osd.669
>> /dev/sdl2       131G  351M  130G   1% /data/osd/osd.690
>>
>> There is no active client writing or reading files.
>>
>> top - 10:28:13 up 272 days, 17:43,  1 user,  load average: 0.28, 0.39, 0.44
>> Tasks: 664 total,   1 running, 648 sleeping,   7 stopped,   8 zombie
>> Cpu(s):  0.4%us,  0.7%sy,  0.0%ni, 98.3%id,  0.6%wa,  0.0%hi,  0.0%si,  0.0%st
>> Mem:  65963088k total, 58901700k used,  7061388k free,   117148k buffers
>> Swap:  8388604k total,  8387936k used,      668k free,   457360k cached
>>
>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>> 10166 root      20   0 3726m 1.2g 6096 S  1.7  2.0  13340:55 ceph-osd
>> 10251 root      20   0 3589m 1.2g 6064 S  1.7  2.0  12949:25 ceph-osd
>> 65247 root      20   0 1955m  16m 3424 S  1.7  0.0 164:03.68 ama
>> 10115 root      20   0 3671m 1.2g 6088 S  1.3  2.0  13342:35 ceph-osd
>> 10234 root      20   0 3637m 1.2g 6088 S  1.3  1.9  12848:57 ceph-osd
>> 10200 root      20   0 3707m 1.2g 6092 S  1.0  2.0  13687:07 ceph-osd
>> 10217 root      20   0 3624m 1.2g 6088 S  1.0  1.9  12568:55 ceph-osd
>> 10107 root      20   0 3556m 1.2g 6088 S  0.7  1.9  12198:33 ceph-osd
>> 10132 root      20   0 3643m 1.3g 6088 S  0.7  2.0  12992:18 ceph-osd
>> 10149 root      20   0 3599m 1.2g 6076 S  0.7  2.0  12101:59 ceph-osd
>> 12317 root      20   0 15436 1704  932 R  0.7  0.0   0:00.05 top
>>
>> I would appreciate any reply.
>>
>> Best Regards,
>> Brandy
>>
>> --
>> Software Engineer, ChinaNetCenter Co., ShenZhen, Guangdong Province, China
>> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
>> --
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
> --
> Robin Hugh Johnson
> Gentoo Linux: Dev, Infra Lead, Foundation Trustee & Treasurer
> E-Mail   : robbat2@xxxxxxxxxx
> GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
> GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136



