Thanks for your reply, Robin Hugh. I had also tried adjusting the vm
configuration before, but it had no effect. I will now follow your method
and run "echo 2 > /proc/sys/vm/drop_caches" nightly.
Thanks.
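For reference, a minimal sketch of how this could be scheduled nightly,
assuming a root cron.d entry and that around 03:00 is an off-peak window on
these nodes (the file name, the time, and the preceding sync are my own
assumptions, not something stated in this thread):

  # /etc/cron.d/ceph-drop-caches  (hypothetical file name)
  # Flush dirty data first, then drop only dentries and inodes
  # (drop_caches=2), nightly during an assumed off-peak window.
  0 3 * * * root /bin/sync && /bin/echo 2 > /proc/sys/vm/drop_caches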
2017-06-12 14:12 GMT+08:00 Robin H. Johnson <robbat2@xxxxxxxxxx>:
> On Mon, Jun 12, 2017 at 10:45:52AM +0800, 于相洋 wrote:
>> Hi cephers,
>>
>> I have met a memory problem on our ceph rados server nodes.
>>
>> Total memory size is 64GB, of which 56GB is used and only 8GB is left;
>> cached and buffers take little memory, and my swap space is used up, as
>> shown below. If free memory gets too low an OOM may occur, and since
>> swap is already used up there may also be performance problems.
> It's going to XFS.
>
> You didn't post the OOM, but this sounds very much like the XFS memory
> fragmentation issue as seen here:
> https://serverfault.com/questions/642883/cause-of-page-fragmentation-on-large-server-with-xfs-20-disks-and-ceph
>
> I regularly see it on our systems w/ 36x 6T OSD and 256GB of RAM, as seen
> below in a dmesg capture from a few days ago. All OSDs are 40-60% full.
>
> The best mitigation so far is 'echo 2 > /proc/sys/vm/drop_caches' run nightly
> during off-peak. The other suggestions in the above link reduced the frequency
> of the problem for us, but didn't make it go away.
>
> Timestamp for all of it: [Thu Jun 8 01:41:59 2017]
> =====
> tp_osd_tp invoked oom-killer: gfp_mask=0x240c2c0, order=3, oom_score_adj=0
> tp_osd_tp cpuset=/ mems_allowed=0-1
> CPU: 15 PID: 1085880 Comm: tp_osd_tp Tainted: G W 4.4.0-59-generic #80~14.04.1-Ubuntu
> Hardware name: Supermicro SSG-6048R-E1CR36L/X10DRH-iT, BIOS 2.0a 06/30/2016
> 0000000000000000 ffff882a471f3a30 ffffffff813dbd6c ffff882a471f3be8
> 0000000000000000 ffff882a471f3ac0 ffffffff811fafc6 ffff882a471f3be8
> ffff882a471f3af8 ffff8832ad0ac600 0000000000000000 0000000000000000
> Call Trace:
> [<ffffffff813dbd6c>] dump_stack+0x63/0x87
> [<ffffffff811fafc6>] dump_header+0x5b/0x1d5
> [<ffffffff81188b35>] oom_kill_process+0x205/0x3d0
> [<ffffffff8118916b>] out_of_memory+0x40b/0x460
> [<ffffffff811fba7f>] __alloc_pages_slowpath.constprop.87+0x742/0x7ad
> [<ffffffff8118e167>] __alloc_pages_nodemask+0x237/0x240
> [<ffffffffc03df681>] ? xfs_da_state_free+0x21/0x30 [xfs]
> [<ffffffff811d3e18>] alloc_pages_current+0x88/0x120
> [<ffffffff8118ccc9>] alloc_kmem_pages+0x19/0x90
> [<ffffffff811a7868>] kmalloc_order+0x18/0x50
> [<ffffffff811a78c6>] kmalloc_order_trace+0x26/0xb0
> [<ffffffff811df331>] __kmalloc+0x251/0x270
> [<ffffffff812253de>] getxattr+0x8e/0x1b0
> [<ffffffffc04380f5>] ? posix_acl_access_exists+0x15/0x20 [xfs]
> [<ffffffffc041e602>] ? xfs_vn_listxattr+0xf2/0x160 [xfs]
> [<ffffffff811b5580>] ? handle_mm_fault+0x250/0x540
> [<ffffffff81225dee>] SyS_fgetxattr+0x5e/0xb0
> [<ffffffff81802c76>] entry_SYSCALL_64_fastpath+0x16/0x75
> Mem-Info:
> active_anon:8807118 inactive_anon:870763 isolated_anon:0
> active_file:5614956 inactive_file:4123432 isolated_file:0
> unevictable:8 dirty:4323 writeback:0 unstable:0
> slab_reclaimable:1921141 slab_unreclaimable:4002171
> mapped:6716850 shmem:6631 pagetables:82513 bounce:0
> free:758377 free_pcp:2615 free_cma:0
> Node 0 DMA free:15320kB min:28kB low:32kB high:40kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15960kB managed:15832kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
> lowmem_reserve[]: 0 1842 128815 128815 128815
> Node 0 DMA32 free:511832kB min:3744kB low:4680kB high:5616kB active_anon:8kB inactive_anon:8kB active_file:8kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1967272kB managed:1886840kB mlocked:0kB dirty:0kB writeback:0kB mapped:8kB shmem:0kB slab_reclaimable:282060kB slab_unreclaimable:461284kB kernel_stack:13264kB pagetables:1848kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> lowmem_reserve[]: 0 0 126972 126972 126972
> Node 0 Normal free:915268kB min:258172kB low:322712kB high:387256kB active_anon:19050184kB inactive_anon:1735572kB active_file:12163768kB inactive_file:8400128kB unevictable:32kB isolated(anon):0kB isolated(file):0kB present:132120576kB managed:130020328kB mlocked:32kB dirty:16060kB writeback:0kB mapped:13404324kB shmem:12012kB slab_reclaimable:4971164kB slab_unreclaimable:8497080kB kernel_stack:467504kB pagetables:170296kB unstable:0kB bounce:0kB free_pcp:5476kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:16 all_unreclaimable? no
> lowmem_reserve[]: 0 0 0 0 0
> Node 1 Normal free:1591088kB min:262336kB low:327920kB high:393504kB active_anon:16178280kB inactive_anon:1747472kB active_file:10296048kB inactive_file:8093600kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:134217728kB managed:132116736kB mlocked:0kB dirty:1232kB writeback:0kB mapped:13463068kB shmem:14512kB slab_reclaimable:2431340kB slab_unreclaimable:7050320kB kernel_stack:563280kB pagetables:157908kB unstable:0kB bounce:0kB free_pcp:4984kB local_pcp:8kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> lowmem_reserve[]: 0 0 0 0 0
> Node 0 DMA: 0*4kB 1*8kB (U) 1*16kB (U) 0*32kB 1*64kB (U) 1*128kB (U) 1*256kB (U) 1*512kB (U) 0*1024kB 1*2048kB (M) 3*4096kB (M) = 15320kB
> Node 0 DMA32: 391*4kB (UME) 278*8kB (UM) 1027*16kB (UME) 683*32kB (UMEH) 504*64kB (UMEH) 396*128kB (UMH) 387*256kB (MEH) 178*512kB (MEH) 44*1024kB (MH) 74*2048kB (MH) 0*4096kB = 511836kB
> Node 0 Normal: 52559*4kB (UME) 88630*8kB (UME) 1*16kB (H) 0*32kB 1*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 919356kB
> Node 1 Normal: 127175*4kB (UME) 87936*8kB (UME) 23906*16kB (UMEH) 11*32kB (H) 6*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1595420kB
> Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
> Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
> Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> 9783120 total pagecache pages
> 38289 pages in swap cache
> Swap cache stats: add 19895227, delete 19856938, find 11143389/14461125
> Free swap = 7758284kB
> Total swap = 8388604kB
> 67080384 pages RAM
> 0 pages HighMem/MovableOnly
> 1070450 pages reserved
> 0 pages cma reserved
> 0 pages hwpoisoned
> [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
> [ 1007] 0 1007 5068 483 13 3 194 0 upstart-udev-br
> [ 1013] 0 1013 12887 578 27 3 102 -1000 systemd-udevd
> [ 1049] 0 1049 3820 304 13 3 24 0 upstart-file-br
> [ 1052] 102 1052 80730 14806 62 4 3742 0 rsyslogd
> [ 1798] 0 1798 12164 728 27 3 101 0 lldpd
> [ 1869] 105 1869 12164 419 25 3 98 0 lldpd
> [ 1879] 0 1879 3816 242 12 3 35 0 upstart-socket-
> [ 2750] 103 2750 7866 797 20 3 102 0 ntpd
> [ 2999] 0 2999 3635 431 12 3 38 0 getty
> [ 3000] 0 3000 3635 448 12 3 37 0 getty
> [ 3003] 0 3003 3635 436 12 3 39 0 getty
> [ 3004] 0 3004 3635 435 12 3 40 0 getty
> [ 3006] 0 3006 3635 450 12 3 39 0 getty
> [ 3022] 0 3022 15346 927 34 3 140 -1000 sshd
> [ 3024] 0 3024 5914 543 17 3 40 0 cron
> [ 3224] 0 3224 1083 209 8 3 22 0 collectdmon
> [ 3225] 0 3225 195966 1083 47 4 42 0 collectd
> [ 3253] 0 3253 46081 1709 24 3 1019 0 fail2ban-server
> [ 3408] 0 3408 6336 667 16 3 49 0 master
> [ 3419] 104 3419 6893 698 18 3 44 0 qmgr
> [ 3476] 0 3476 3318 282 10 3 24 0 mdadm
> [ 3509] 0 3509 3635 439 12 3 37 0 getty
> [ 3510] 0 3510 3197 439 12 3 35 0 getty
> [ 3511] 0 3511 3197 445 12 3 34 0 getty
> [2021121] 106 2021121 5835 474 16 3 123 0 nrpe
> [1061740] 0 1061740 1193840 428081 2126 8 1069 0 ceph-osd
> [1062045] 0 1062045 1580279 528454 3160 10 1199 0 ceph-osd
> [1062547] 0 1062547 1051761 370552 1826 7 1870 0 ceph-osd
> [1062915] 0 1062915 1174510 411056 2062 8 1590 0 ceph-osd
> [1063396] 0 1063396 1400646 581974 2551 8 905 0 ceph-osd
> [1064669] 0 1064669 1231068 386831 2184 7 767 0 ceph-osd
> [1064973] 0 1064973 1358184 428018 2480 8 831 0 ceph-osd
> [1065390] 0 1065390 1205864 439471 2121 9 1399 0 ceph-osd
> [1065609] 0 1065609 1302914 479849 2331 8 698 0 ceph-osd
> [1065968] 0 1065968 1376198 481664 2471 8 543 0 ceph-osd
> [1066275] 0 1066275 1225083 439472 2185 8 810 0 ceph-osd
> [1066575] 0 1066575 1285168 446490 2272 8 721 0 ceph-osd
> [1066876] 0 1066876 1275062 448917 2278 8 5928 0 ceph-osd
> [1067225] 0 1067225 1142918 402708 1991 7 966 0 ceph-osd
> [1067581] 0 1067581 1084617 390226 1900 8 1192 0 ceph-osd
> [1067867] 0 1067867 1306584 465829 2324 8 1140 0 ceph-osd
> [1068359] 0 1068359 1143859 419061 2038 8 486 0 ceph-osd
> [1068712] 0 1068712 1356145 482163 2482 8 703 0 ceph-osd
> [1068945] 0 1068945 1464922 511993 2684 10 1054 0 ceph-osd
> [1069202] 0 1069202 1314611 466149 2343 8 373 0 ceph-osd
> [1077729] 0 1077729 1236855 474960 2196 8 2141 0 ceph-osd
> [1077994] 0 1077994 1343678 511317 2422 8 3687 0 ceph-osd
> [1078712] 0 1078712 1305742 547914 2328 8 14576 0 ceph-osd
> [1079898] 0 1079898 1095581 443459 1913 7 1961 0 ceph-osd
> [1081804] 0 1081804 1032092 369817 1789 7 6281 0 ceph-osd
> [1082066] 0 1082066 1561346 536779 2734 10 7147 0 ceph-osd
> [1083961] 0 1083961 1134121 445427 1976 7 20826 0 ceph-osd
> [1086089] 0 1086089 1273552 473015 2271 8 4362 0 ceph-osd
> [1088670] 0 1088670 1114051 402725 1973 7 8050 0 ceph-osd
> [1092038] 0 1092038 1125645 435613 1976 7 9110 0 ceph-osd
> [1096756] 0 1096756 1298374 431037 2313 8 3579 0 ceph-osd
> [1097216] 0 1097216 1287326 460129 2289 8 8807 0 ceph-osd
> [1101156] 0 1101156 1175688 429388 2065 8 7705 0 ceph-osd
> [1107340] 0 1107340 1428037 468276 2626 10 3232 0 ceph-osd
> [1107953] 0 1107953 1256050 459764 2239 8 2232 0 ceph-osd
> [2432806] 0 2432806 1533549 440887 2734 10 2175 0 ceph-osd
> [507551] 0 507551 28175 9661 60 3 108 0 ruby
> [3159966] 999 3159966 91561 54449 141 3 978 1000 netdata
> [3159992] 999 3159992 25706 4617 40 3 0 1000 python
> [3615506] 999 3615506 18141 3880 29 3 0 1000 apps.plugin
> [3644773] 104 3644773 6852 701 18 3 0 0 pickup
> [3703623] 999 3703623 4572 820 14 3 0 1000 bash
> [3709023] 104 3709023 6852 708 17 3 0 0 showq
> Out of memory: Kill process 3159966 (netdata) score 1000 or sacrifice child
> Killed process 3159992 (python) total-vm:102824kB, anon-rss:11528kB, file-rss:6940kB
> =====
>
>
>
>>
>> [root@localhost ~]# free -m
>>              total       used       free     shared    buffers     cached
>> Mem:         64417      56768       7648          0        114        443
>> -/+ buffers/cache:      56211       8206
>> Swap:         8191       8191          0
>>
>> From /proc/meminfo, reclaimable slab takes only about 2GB of memory:
>>
>> [root@wzdx48 ~]# cat /proc/meminfo
>> MemTotal: 65963088 kB
>> MemFree: 7750100 kB
>> Buffers: 116776 kB
>> Cached: 453988 kB
>> SwapCached: 813692 kB
>> Active: 12835884 kB
>> Inactive: 2184952 kB
>> Active(anon): 12480640 kB
>> Inactive(anon): 1971280 kB
>> Active(file): 355244 kB
>> Inactive(file): 213672 kB
>> Unevictable: 0 kB
>> Mlocked: 0 kB
>> SwapTotal: 8388604 kB
>> SwapFree: 128 kB
>> Dirty: 928 kB
>> Writeback: 0 kB
>> AnonPages: 13636556 kB
>> Mapped: 38184 kB
>> Shmem: 1840 kB
>> Slab: 6074272 kB
>> SReclaimable: 2310640 kB
>> SUnreclaim: 3763632 kB
>> KernelStack: 42936 kB
>> PageTables: 71748 kB
>> NFS_Unstable: 0 kB
>> Bounce: 0 kB
>> WritebackTmp: 0 kB
>> CommitLimit: 41370148 kB
>> Committed_AS: 39673248 kB
>> VmallocTotal: 34359738367 kB
>> VmallocUsed: 390436 kB
>> VmallocChunk: 34324779316 kB
>> HardwareCorrupted: 0 kB
>> AnonHugePages: 4503552 kB
>> HugePages_Total: 0
>> HugePages_Free: 0
>> HugePages_Rsvd: 0
>> HugePages_Surp: 0
>> Hugepagesize: 2048 kB
>> DirectMap4k: 5504 kB
>> DirectMap2M: 2082816 kB
>> DirectMap1G: 65011712 kB
>>
>> But when I run echo 3 > /proc/sys/vm/drop_caches, I get about 40GB of
>> free memory back.
>>
>> [root@wzdx48 ~]# echo 3 > /proc/sys/vm/drop_caches
>> [root@wzdx48 ~]# free -m
>>              total       used       free     shared    buffers     cached
>> Mem:         64417      15566      48850          0         10         59
>> -/+ buffers/cache:      15496      48920
>> Swap:         8191       8191          0
>>
>> I just can't understand where the 40GB of memory is being used???
>>
>> OSD node background:
>>
>> [root@localhost ~]# ceph -s
>>     health HEALTH_WARN
>>            too many PGs per OSD (438 > max 300)
>>            noout,nodeep-scrub flag(s) set
>>     monmap e3: 3 mons at
>>            {60=192.168.2.60:6789/0,61=192.168.2.61:6789/0,62=192.168.2.62:6789/0}
>>            election epoch 2720, quorum 0,1,2 60,61,62
>>     osdmap e37148: 695 osds: 671 up, 671 in
>>            nodeep-scrub
>>      pgmap v12910815: 98064 pgs, 21 pools, 612 TB data, 757 Mobjects
>>            1862 TB used, 2357 TB / 4220 TB avail
>>            98015 active+clean
>>            49 active+clean+scrubbing
>>   client io 9114 kB/s rd, 94051 kB/s wr, 6553 op/s
>>
>> [root@wzdx48 ~]# df -i
>> Filesystem Inodes IUsed IFree IUse% Mounted on
>> /dev/sda3 60489728 112915 60376813 1% /
>> tmpfs 8245386 36 8245350 1% /dev/shm
>> /dev/sda1 128016 43 127973 1% /boot
>>
>> [root@wzdx48 ~]# df -h
>> Filesystem Size Used Avail Use% Mounted on
>> /dev/sda3 909G 30G 871G 4% /
>> tmpfs 32G 1.1M 32G 1% /dev/shm
>> /dev/sda1 477M 57M 396M 13% /boot
>> /dev/sdb2 425G 118G 307G 28% /data/osd/osd.660
>> /dev/sdc2 425G 130G 296G 31% /data/osd/osd.661
>> /dev/sdd2 425G 128G 298G 30% /data/osd/osd.662
>> /dev/sde2 425G 125G 301G 30% /data/osd/osd.663
>> /dev/sdf2 425G 134G 292G 32% /data/osd/osd.664
>> /dev/sdg2 425G 131G 294G 31% /data/osd/osd.665
>> /dev/sdh2 425G 131G 295G 31% /data/osd/osd.666
>> /dev/sdi2 425G 124G 302G 30% /data/osd/osd.667
>> /dev/sdj2 425G 126G 299G 30% /data/osd/osd.668
>> /dev/sdk2 425G 123G 302G 29% /data/osd/osd.669
>> /dev/sdl2 131G 351M 130G 1% /data/osd/osd.690
>>
>> There is no active client writing or reading files.
>>
>> top - 10:28:13 up 272 days, 17:43, 1 user, load average: 0.28, 0.39, 0.44
>> Tasks: 664 total, 1 running, 648 sleeping, 7 stopped, 8 zombie
>> Cpu(s): 0.4%us, 0.7%sy, 0.0%ni, 98.3%id, 0.6%wa, 0.0%hi, 0.0%si, 0.0%st
>> Mem: 65963088k total, 58901700k used, 7061388k free, 117148k buffers
>> Swap: 8388604k total, 8387936k used, 668k free, 457360k cached
>>
>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>> 10166 root 20 0 3726m 1.2g 6096 S 1.7 2.0 13340:55 ceph-osd
>> 10251 root 20 0 3589m 1.2g 6064 S 1.7 2.0 12949:25 ceph-osd
>> 65247 root 20 0 1955m 16m 3424 S 1.7 0.0 164:03.68 ama
>> 10115 root 20 0 3671m 1.2g 6088 S 1.3 2.0 13342:35 ceph-osd
>> 10234 root 20 0 3637m 1.2g 6088 S 1.3 1.9 12848:57 ceph-osd
>> 10200 root 20 0 3707m 1.2g 6092 S 1.0 2.0 13687:07 ceph-osd
>> 10217 root 20 0 3624m 1.2g 6088 S 1.0 1.9 12568:55 ceph-osd
>> 10107 root 20 0 3556m 1.2g 6088 S 0.7 1.9 12198:33 ceph-osd
>> 10132 root 20 0 3643m 1.3g 6088 S 0.7 2.0 12992:18 ceph-osd
>> 10149 root 20 0 3599m 1.2g 6076 S 0.7 2.0 12101:59 ceph-osd
>> 12317 root 20 0 15436 1704 932 R 0.7 0.0 0:00.05 top
>>
>> I would appreciate any reply.
>>
>> Best Regards,
>> Brandy
>>
>> --
>> Software Engineer, ChinaNetCenter Co., ShenZhen, Guangdong Province, China
>> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
>
> --
> Robin Hugh Johnson
> Gentoo Linux: Dev, Infra Lead, Foundation Trustee & Treasurer
> E-Mail : robbat2@xxxxxxxxxx
> GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
> GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
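On the question quoted above about where the roughly 40GB goes: a quick,
generic way to confirm on such a node that the memory is sitting in slab
(typically XFS inode/dentry caches) rather than page cache, and that
higher-order pages are fragmented (the OOM above is an order-3 allocation
failure), is something like the following. These are standard /proc
interfaces and the procps slabtop tool, not commands taken from this
thread:

  # Slab vs. page cache summary
  grep -E '^(MemFree|Buffers|Cached|Slab|SReclaimable|SUnreclaim):' /proc/meminfo
  # Largest slab caches; xfs_inode and dentry are the usual suspects here
  slabtop -o -s c | head -n 15
  # Availability of higher-order free pages per zone
  cat /proc/buddyinfo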