On Wed, Apr 15, 2020 at 2:31 PM 郭彬 <anole1949@xxxxxxxxx> wrote:
> I'm running a batch of CoreOS boxes; the lsb_release is:
>
> ```
> # cat /etc/lsb-release
> DISTRIB_ID="Container Linux by CoreOS"
> DISTRIB_RELEASE=2303.3.0
> DISTRIB_CODENAME="Rhyolite"
> DISTRIB_DESCRIPTION="Container Linux by CoreOS 2303.3.0 (Rhyolite)"
> ```
>
> ```
> # uname -a
> Linux cloud-worker-25 4.19.86-coreos #1 SMP Mon Dec 2 20:13:38 -00 2019
> x86_64 Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz GenuineIntel GNU/Linux
> ```
>
> Recently, I found my VMs constantly being killed due to OOM, and after
> digging into the problem, I finally realized that the kernel is leaking
> memory.
>
> Here's my slabinfo:
>
> ```
> # slabtop --sort c -o
>  Active / Total Objects (% used)    : 739390584 / 740008326 (99.9%)
>  Active / Total Slabs (% used)      : 11594275 / 11594275 (100.0%)
>  Active / Total Caches (% used)     : 105 / 129 (81.4%)
>  Active / Total Size (% used)       : 47121380.33K / 47376581.93K (99.5%)
>  Minimum / Average / Maximum Object : 0.01K / 0.06K / 8.00K
>
>       OBJS    ACTIVE  USE OBJ SIZE     SLABS OBJ/SLAB CACHE SIZE NAME
>  734506368 734506368 100%    0.06K  11476662       64  45906648K ebitmap_node
> [...]
> ```
>
> You can see that `ebitmap_node` is over 40GB and still growing. The
> only thing I can do is reboot the OS, but there are tens of these boxes
> and lots of workloads running on them, so I can't just reboot whenever I
> want. I've run out of options - any help?

Pasting in relevant comments/questions from [1]:

2. Your kernel seems to be quite behind the current upstream and is
probably maintained by your distribution (it seems to be derived from the
4.19 stable branch). Can you reproduce the issue on a more recent kernel
(at least 5.5+)? If you can't, or if the recent kernel doesn't exhibit the
issue, then you should report this to your distribution.

3. Was this working fine with some earlier kernel? If you can determine
the last working version, it could help us identify the cause and/or the
fix.

On top of that, I realized one more thing - the kernel merges the caches
for objects of the same size, so any cache with an object size of 64 bytes
will be accounted under 'ebitmap_node' here. For example, on my system
there are several caches that all alias to the common 64-byte cache:

# ls -l /sys/kernel/slab/ | grep -- '-> :0000064'
lrwxrwxrwx. 1 root root 0 apr 15 15:26 dmaengine-unmap-2 -> :0000064
lrwxrwxrwx. 1 root root 0 apr 15 15:26 ebitmap_node -> :0000064
lrwxrwxrwx. 1 root root 0 apr 15 15:26 fanotify_event -> :0000064
lrwxrwxrwx. 1 root root 0 apr 15 15:26 io -> :0000064
lrwxrwxrwx. 1 root root 0 apr 15 15:26 iommu_iova -> :0000064
lrwxrwxrwx. 1 root root 0 apr 15 15:26 jbd2_inode -> :0000064
lrwxrwxrwx. 1 root root 0 apr 15 15:26 ksm_rmap_item -> :0000064
lrwxrwxrwx. 1 root root 0 apr 15 15:26 ksm_stable_node -> :0000064
lrwxrwxrwx. 1 root root 0 apr 15 15:26 vmap_area -> :0000064

On your kernel you might get a different list, but any of the caches you
get could be the culprit; ebitmap_node is just one of the possibilities.
You can disable this merging by adding "slab_nomerge" to your kernel boot
command line (one way of doing this is sketched after this message). That
will allow you to identify which cache is really the source of the leak.

[1] https://github.com/SELinuxProject/selinux/issues/220#issuecomment-613944748

--
Ondrej Mosnacek <omosnace at redhat dot com>
Software Engineer, Security Technologies
Red Hat, Inc.
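
For reference, a minimal sketch of applying the "slab_nomerge" suggestion
above. It assumes a conventional GRUB 2 setup with /etc/default/grub;
Container Linux by CoreOS manages kernel arguments differently (through the
GRUB configuration on its OEM partition), so the exact paths and commands
here are assumptions, not a definitive recipe:

```
# Check which parameters the running kernel was booted with.
cat /proc/cmdline

# Assumed GRUB 2 layout: append slab_nomerge to the default kernel command
# line and regenerate the boot configuration.
sudo sed -i 's/^GRUB_CMDLINE_LINUX="\(.*\)"/GRUB_CMDLINE_LINUX="\1 slab_nomerge"/' /etc/default/grub
sudo grub2-mkconfig -o /boot/grub2/grub.cfg   # or `sudo update-grub` on Debian/Ubuntu
sudo reboot

# After the reboot, caches are no longer merged, so the 64-byte caches
# should no longer alias a common cache and the growing one stands out.
grep slab_nomerge /proc/cmdline
ls -l /sys/kernel/slab/ | grep -- '-> :0000064'   # should now print nothing
sudo slabtop --sort c -o | head -20
```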