On Thu, Apr 23, 2020 at 9:50 AM Bin <anole1949@xxxxxxxxx> wrote: > Dear Ondrej: > > I've added "slab_nomerge" in the kernel parameters, and after observation for couple of days, I got this: > > > Active / Total Objects (% used) : 83818306 / 84191607 (99.6%) > Active / Total Slabs (% used) : 1336293 / 1336293 (100.0%) > Active / Total Caches (% used) : 152 / 217 (70.0%) > Active / Total Size (% used) : 5828768.08K / 5996848.72K (97.2%) > Minimum / Average / Maximum Object : 0.01K / 0.07K / 23.25K > > OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME > 80253888 80253888 100% 0.06K 1253967 64 5015868K iommu_iova Well, that means the leak is caused by the "iommu_iova" kmem cache and has nothing to with SELinux. I'd try luck on the iommu mailing list: iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx > 489472 489123 99% 0.03K 3824 128 15296K kmalloc-32 > 297444 271112 91% 0.19K 7082 42 56656K dentry > 254400 252784 99% 0.06K 3975 64 15900K anon_vma_chain > 222528 39255 17% 0.50K 6954 32 111264K kmalloc-512 > 202482 201814 99% 0.19K 4821 42 38568K vm_area_struct > 200192 200192 100% 0.01K 391 512 1564K kmalloc-8 > 170528 169359 99% 0.25K 5329 32 42632K filp > 158144 153508 97% 0.06K 2471 64 9884K kmalloc-64 > 149914 149365 99% 0.09K 3259 46 13036K anon_vma > 146640 143123 97% 0.10K 3760 39 15040K buffer_head > 130368 32791 25% 0.09K 3104 42 12416K kmalloc-96 > 129752 129752 100% 0.07K 2317 56 9268K Acpi-Operand > 105468 105106 99% 0.04K 1034 102 4136K selinux_inode_security > 73080 73080 100% 0.13K 2436 30 9744K kernfs_node_cache > 72360 70261 97% 0.59K 1340 54 42880K inode_cache > 71040 71040 100% 0.12K 2220 32 8880K eventpoll_epi > 68096 59262 87% 0.02K 266 256 1064K kmalloc-16 > 53652 53652 100% 0.04K 526 102 2104K pde_opener > 50496 31654 62% 2.00K 3156 16 100992K kmalloc-2048 > 46242 46242 100% 0.19K 1101 42 8808K cred_jar > 44496 43013 96% 0.66K 927 48 29664K proc_inode_cache > 44352 44352 100% 0.06K 693 64 2772K task_delay_info > 43516 43471 99% 0.69K 946 46 30272K sock_inode_cache > 37856 27626 72% 1.00K 1183 32 37856K kmalloc-1024 > 36736 36736 100% 0.07K 656 56 2624K eventpoll_pwq > 34076 31282 91% 0.57K 1217 28 19472K radix_tree_node > 33660 30528 90% 1.05K 1122 30 35904K ext4_inode_cache > 32760 30959 94% 0.19K 780 42 6240K kmalloc-192 > 32028 32028 100% 0.04K 314 102 1256K ext4_extent_status > 30048 30048 100% 0.25K 939 32 7512K skbuff_head_cache > 28736 28736 100% 0.06K 449 64 1796K fs_cache > 24702 24702 100% 0.69K 537 46 17184K files_cache > 23808 23808 100% 0.66K 496 48 15872K ovl_inode > 23104 22945 99% 0.12K 722 32 2888K kmalloc-128 > 22724 21307 93% 0.69K 494 46 15808K shmem_inode_cache > 21472 21472 100% 0.12K 671 32 2684K seq_file > 19904 19904 100% 1.00K 622 32 19904K UNIX > 17340 17340 100% 1.06K 578 30 18496K mm_struct > 15980 15980 100% 0.02K 94 170 376K avtab_node > 14070 14070 100% 1.06K 469 30 15008K signal_cache > 13248 13248 100% 0.12K 414 32 1656K pid > 12128 11777 97% 0.25K 379 32 3032K kmalloc-256 > 11008 11008 100% 0.02K 43 256 172K selinux_file_security > 10812 10812 100% 0.04K 106 102 424K Acpi-Namespace > > Is these info ring any bell for you? > > Ondrej Mosnacek <omosnace@xxxxxxxxxx> 于2020年4月15日周三 下午9:44写道: >> >> On Wed, Apr 15, 2020 at 2:31 PM 郭彬 <anole1949@xxxxxxxxx> wrote: >> > I'm running a batch of CoreOS boxes, the lsb_release is: >> > >> > ``` >> > # cat /etc/lsb-release >> > DISTRIB_ID="Container Linux by CoreOS" >> > DISTRIB_RELEASE=2303.3.0 >> > DISTRIB_CODENAME="Rhyolite" >> > DISTRIB_DESCRIPTION="Container Linux by CoreOS 2303.3.0 (Rhyolite)" >> > ``` >> > >> > ``` >> > # uname -a >> > Linux cloud-worker-25 4.19.86-coreos #1 SMP Mon Dec 2 20:13:38 -00 2019 >> > x86_64 Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz GenuineIntel GNU/Linux >> > ``` >> > Recently, I found my vms constently being killed due to OOM, and after >> > digging into the problem, I finally realized that the kernel is leaking >> > memory. >> > >> > Here's my slabinfo: >> > >> > ``` >> > # slabtop --sort c -o >> > Active / Total Objects (% used) : 739390584 / 740008326 (99.9%) >> > Active / Total Slabs (% used) : 11594275 / 11594275 (100.0%) >> > Active / Total Caches (% used) : 105 / 129 (81.4%) >> > Active / Total Size (% used) : 47121380.33K / 47376581.93K (99.5%) >> > Minimum / Average / Maximum Object : 0.01K / 0.06K / 8.00K >> > >> > OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME >> > 734506368 734506368 100% 0.06K 11476662 64 45906648K ebitmap_node >> [...] >> > You can see that the `ebitmap_node` is over 40GB and still growing. The >> > only thing I can do is rebooting the OS, but there are tens of them and >> > lots of workloads running on them, I can't just reboot whenever I want. >> > So, I run out of options, any help? >> >> Pasting in relevant comments/questions from [1]: >> >> 2. Your kernel seems to be quite behind the current upstream and is >> probably maintained by your distribution (seems to be derived from the >> 4.19 stable branch). Can you reproduce the issue on a more recent >> kernel (at least 5.5+)? If you can't or the recent kernel doesn't >> exhibit the issue, then you should report this to your distribution. >> 3. Was this working fine with some earlier kernel? If you can >> determine the last working version, then it could help us identify the >> cause and/or the fix. >> >> On top of that, I realized one more thing - the kernel merges the >> caches for objects of the same size - so any cache with object size 64 >> bytes will be accounted under 'ebitmap_node' here. For example, on my >> system there are several caches that all alias to the common 64-byte >> cache: >> # ls -l /sys/kernel/slab/ | grep -- '-> :0000064' >> lrwxrwxrwx. 1 root root 0 apr 15 15:26 dmaengine-unmap-2 -> :0000064 >> lrwxrwxrwx. 1 root root 0 apr 15 15:26 ebitmap_node -> :0000064 >> lrwxrwxrwx. 1 root root 0 apr 15 15:26 fanotify_event -> :0000064 >> lrwxrwxrwx. 1 root root 0 apr 15 15:26 io -> :0000064 >> lrwxrwxrwx. 1 root root 0 apr 15 15:26 iommu_iova -> :0000064 >> lrwxrwxrwx. 1 root root 0 apr 15 15:26 jbd2_inode -> :0000064 >> lrwxrwxrwx. 1 root root 0 apr 15 15:26 ksm_rmap_item -> :0000064 >> lrwxrwxrwx. 1 root root 0 apr 15 15:26 ksm_stable_node -> :0000064 >> lrwxrwxrwx. 1 root root 0 apr 15 15:26 vmap_area -> :0000064 >> >> On your kernel you might get a different list, but any of the caches >> you get could be the culprit, ebitmap_node is just one of the >> possibilities. You can disable this merging by adding "slab_nomerge" >> to your kernel boot command-line. That will allow you to identify >> which cache is really the source of the leak. >> >> [1] https://github.com/SELinuxProject/selinux/issues/220#issuecomment-613944748 >> >> -- >> Ondrej Mosnacek <omosnace at redhat dot com> >> Software Engineer, Security Technologies >> Red Hat, Inc. >> -- Ondrej Mosnacek <omosnace at redhat dot com> Software Engineer, Security Technologies Red Hat, Inc.