Re: [PATCH rfc 0/5] mm: introduce shrinker sysfs interface

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Apr 15, 2022 at 5:28 PM Roman Gushchin <roman.gushchin@xxxxxxxxx> wrote:
>
> There are 50+ different shrinkers in the kernel, many with their own bells and
> whistles. Under the memory pressure the kernel applies some pressure on each of
> them in the order of which they were created/registered in the system. Some
> of them can contain only few objects, some can be quite large. Some can be
> effective at reclaiming memory, some not.
>
> The only existing debugging mechanism is a couple of tracepoints in
> do_shrink_slab(): mm_shrink_slab_start and mm_shrink_slab_end. They aren't
> covering everything though: shrinkers which report 0 objects will never show up,
> there is no support for memcg-aware shrinkers. Shrinkers are identified by their
> scan function, which is not always enough (e.g. hard to guess which super
> block's shrinker it is having only "super_cache_scan"). They are a passive
> mechanism: there is no way to call into counting and scanning of an individual
> shrinker and profile it.
>
> To provide a better visibility and debug options for memory shrinkers
> this patchset introduces a /sys/kernel/shrinker interface, to some extent
> similar to /sys/kernel/slab.
>
> For each shrinker registered in the system a folder is created. The folder
> contains "count" and "scan" files, which allow to trigger count_objects()
> and scan_objects() callbacks. For memcg-aware and numa-aware shrinkers
> count_memcg, scan_memcg, count_node, scan_node, count_memcg_node
> and scan_memcg_node are additionally provided. They allow to get per-memcg
> and/or per-node object count and shrink only a specific memcg/node.
>
> To make debugging more pleasant, the patchset also names all shrinkers,
> so that sysfs entries can have more meaningful names.
>
> Usage examples:

Thanks, Roman. A follow-up question, why do we have to implement this
in kernel if we just count the objects? It seems userspace tools could
achieve it too, for example, drgn :-). Actually I did write a drgn
script for debugging a problem a few months ago, which iterates
specific memcg's lru_list to count the objects by their state.

>
> 1) List registered shrinkers:
>   $ cd /sys/kernel/shrinker/
>   $ ls
>     dqcache-16          sb-cgroup2-30    sb-hugetlbfs-33  sb-proc-41       sb-selinuxfs-22  sb-tmpfs-40    sb-zsmalloc-19
>     kfree_rcu-0         sb-configfs-23   sb-iomem-12      sb-proc-44       sb-sockfs-8      sb-tmpfs-42    shadow-18
>     sb-aio-20           sb-dax-11        sb-mqueue-21     sb-proc-45       sb-sysfs-26      sb-tmpfs-43    thp_deferred_split-10
>     sb-anon_inodefs-15  sb-debugfs-7     sb-nsfs-4        sb-proc-47       sb-tmpfs-1       sb-tmpfs-46    thp_zero-9
>     sb-bdev-3           sb-devpts-28     sb-pipefs-14     sb-pstore-31     sb-tmpfs-27      sb-tmpfs-49    xfs_buf-37
>     sb-bpf-32           sb-devtmpfs-5    sb-proc-25       sb-rootfs-2      sb-tmpfs-29      sb-tracefs-13  xfs_inodegc-38
>     sb-btrfs-24         sb-hugetlbfs-17  sb-proc-39       sb-securityfs-6  sb-tmpfs-35      sb-xfs-36      zspool-34
>
> 2) Get information about a specific shrinker:
>   $ cd sb-btrfs-24/
>   $ ls
>     count  count_memcg  count_memcg_node  count_node  scan  scan_memcg  scan_memcg_node  scan_node
>
> 3) Count objects on the system/root cgroup level
>   $ cat count
>     212
>
> 4) Count objects on the system/root cgroup level per numa node (on a 2-node machine)
>   $ cat count_node
>     209 3
>
> 5) Count objects for each memcg (output format: cgroup inode, count)
>   $ cat count_memcg
>     1 212
>     20 96
>     53 817
>     2297 2
>     218 13
>     581 30
>     911 124
>     <CUT>
>
> 6) Same but with a per-node output
>   $ cat count_memcg_node
>     1 209 3
>     20 96 0
>     53 810 7
>     2297 2 0
>     218 13 0
>     581 30 0
>     911 124 0
>     <CUT>
>
> 7) Don't display cgroups with less than 500 attached objects
>   $ echo 500 > count_memcg
>   $ cat count_memcg
>     53 817
>     1868 886
>     2396 799
>     2462 861
>
> 8) Don't display cgroups with less than 500 attached objects (sum over all nodes)
>   $ echo "500" > count_memcg_node
>   $ cat count_memcg_node
>     53 810 7
>     1868 886 0
>     2396 799 0
>     2462 861 0
>
> 9) Scan system/root shrinker
>   $ cat count
>     212
>   $ echo 100 > scan
>   $ cat scan
>     97
>   $ cat count
>     115
>
> 10) Scan individual memcg
>   $ echo "1868 500" > scan_memcg
>   $ cat scan_memcg
>     193
>
> 11) Scan individual node
>   $ echo "1 200" > scan_node
>   $ cat scan_node
>     2
>
> 12) Scan individual memcg and node
>   $ echo "1868 0 500" > scan_memcg_node
>   $ cat scan_memcg_node
>     435
>
> If the output doesn't fit into a single page, "...\n" is printed at the end of
> output.
>
>
> Roman Gushchin (5):
>   mm: introduce sysfs interface for debugging kernel shrinker
>   mm: memcontrol: introduce mem_cgroup_ino() and
>     mem_cgroup_get_from_ino()
>   mm: introduce memcg interfaces for shrinker sysfs
>   mm: introduce numa interfaces for shrinker sysfs
>   mm: provide shrinkers with names
>
>  arch/x86/kvm/mmu/mmu.c                        |   2 +-
>  drivers/android/binder_alloc.c                |   2 +-
>  drivers/gpu/drm/i915/gem/i915_gem_shrinker.c  |   3 +-
>  drivers/gpu/drm/msm/msm_gem_shrinker.c        |   2 +-
>  .../gpu/drm/panfrost/panfrost_gem_shrinker.c  |   2 +-
>  drivers/gpu/drm/ttm/ttm_pool.c                |   2 +-
>  drivers/md/bcache/btree.c                     |   2 +-
>  drivers/md/dm-bufio.c                         |   2 +-
>  drivers/md/dm-zoned-metadata.c                |   2 +-
>  drivers/md/raid5.c                            |   2 +-
>  drivers/misc/vmw_balloon.c                    |   2 +-
>  drivers/virtio/virtio_balloon.c               |   2 +-
>  drivers/xen/xenbus/xenbus_probe_backend.c     |   2 +-
>  fs/erofs/utils.c                              |   2 +-
>  fs/ext4/extents_status.c                      |   3 +-
>  fs/f2fs/super.c                               |   2 +-
>  fs/gfs2/glock.c                               |   2 +-
>  fs/gfs2/main.c                                |   2 +-
>  fs/jbd2/journal.c                             |   2 +-
>  fs/mbcache.c                                  |   2 +-
>  fs/nfs/nfs42xattr.c                           |   7 +-
>  fs/nfs/super.c                                |   2 +-
>  fs/nfsd/filecache.c                           |   2 +-
>  fs/nfsd/nfscache.c                            |   2 +-
>  fs/quota/dquot.c                              |   2 +-
>  fs/super.c                                    |   2 +-
>  fs/ubifs/super.c                              |   2 +-
>  fs/xfs/xfs_buf.c                              |   2 +-
>  fs/xfs/xfs_icache.c                           |   2 +-
>  fs/xfs/xfs_qm.c                               |   2 +-
>  include/linux/memcontrol.h                    |   9 +
>  include/linux/shrinker.h                      |  25 +-
>  kernel/rcu/tree.c                             |   2 +-
>  lib/Kconfig.debug                             |   9 +
>  mm/Makefile                                   |   1 +
>  mm/huge_memory.c                              |   4 +-
>  mm/memcontrol.c                               |  23 +
>  mm/shrinker_debug.c                           | 792 ++++++++++++++++++
>  mm/vmscan.c                                   |  66 +-
>  mm/workingset.c                               |   2 +-
>  mm/zsmalloc.c                                 |   2 +-
>  net/sunrpc/auth.c                             |   2 +-
>  42 files changed, 957 insertions(+), 47 deletions(-)
>  create mode 100644 mm/shrinker_debug.c
>
> --
> 2.35.1
>




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux