[PATCH v3 00/35] Memory allocation profiling

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Memory allocation, v3 and final:

Overview:
Low overhead [1] per-callsite memory allocation profiling. Not just for debug
kernels, overhead low enough to be deployed in production.

We're aiming to get this in the next merge window, for 6.9. The feedback
we've gotten has been that even out of tree this patchset has already
been useful, and there's a significant amount of other work gated on the
code tagging functionality included in this patchset [2].

Example output:
  root@moria-kvm:~# sort -h /proc/allocinfo|tail
   3.11MiB     2850 fs/ext4/super.c:1408 module:ext4 func:ext4_alloc_inode
   3.52MiB      225 kernel/fork.c:356 module:fork func:alloc_thread_stack_node
   3.75MiB      960 mm/page_ext.c:270 module:page_ext func:alloc_page_ext
   4.00MiB        2 mm/khugepaged.c:893 module:khugepaged func:hpage_collapse_alloc_folio
   10.5MiB      168 block/blk-mq.c:3421 module:blk_mq func:blk_mq_alloc_rqs
   14.0MiB     3594 include/linux/gfp.h:295 module:filemap func:folio_alloc_noprof
   26.8MiB     6856 include/linux/gfp.h:295 module:memory func:folio_alloc_noprof
   64.5MiB    98315 fs/xfs/xfs_rmap_item.c:147 module:xfs func:xfs_rui_init
   98.7MiB    25264 include/linux/gfp.h:295 module:readahead func:folio_alloc_noprof
    125MiB     7357 mm/slub.c:2201 module:slub func:alloc_slab_page

Since v2:
 - tglx noticed a circular header dependency between sched.h and percpu.h;
   a bunch of header cleanups were merged into 6.8 to ameliorate this [3].

 - a number of improvements, moving alloc_hooks() annotations to the
   correct place for better tracking (mempool), and bugfixes.

 - looked at alternate hooking methods.
   There were suggestions on alternate methods (compiler attribute,
   trampolines), but they wouldn't have made the patchset any cleaner
   (we still need to have different function versions for accounting vs. no
   accounting to control at which point in a call chain the accounting
   happens), and they would have added a dependency on toolchain
   support.

Usage:
kconfig options:
 - CONFIG_MEM_ALLOC_PROFILING
 - CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
 - CONFIG_MEM_ALLOC_PROFILING_DEBUG
   adds warnings for allocations that weren't accounted because of a
   missing annotation

sysctl:
  /proc/sys/vm/mem_profiling

Runtime info:
  /proc/allocinfo

Notes:

[1]: Overhead
To measure the overhead we are comparing the following configurations:
(1) Baseline with CONFIG_MEMCG_KMEM=n
(2) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
    CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n)
(3) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
    CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=y)
(4) Enabled at runtime (CONFIG_MEM_ALLOC_PROFILING=y &&
    CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n && /proc/sys/vm/mem_profiling=1)
(5) Baseline with CONFIG_MEMCG_KMEM=y && allocating with __GFP_ACCOUNT

Performance overhead:
To evaluate performance we implemented an in-kernel test executing
multiple get_free_page/free_page and kmalloc/kfree calls with allocation
sizes growing from 8 to 240 bytes with CPU frequency set to max and CPU
affinity set to a specific CPU to minimize the noise. Below are results
from running the test on Ubuntu 22.04.2 LTS with 6.8.0-rc1 kernel on
56 core Intel Xeon:

                        kmalloc                 pgalloc
(1 baseline)            6.764s                  16.902s
(2 default disabled)    6.793s (+0.43%)         17.007s (+0.62%)
(3 default enabled)     7.197s (+6.40%)         23.666s (+40.02%)
(4 runtime enabled)     7.405s (+9.48%)         23.901s (+41.41%)
(5 memcg)               13.388s (+97.94%)       48.460s (+186.71%)

Memory overhead:
Kernel size:

   text           data        bss         dec         diff
(1) 26515311	      18890222    17018880    62424413
(2) 26524728	      19423818    16740352    62688898    264485
(3) 26524724	      19423818    16740352    62688894    264481
(4) 26524728	      19423818    16740352    62688898    264485
(5) 26541782	      18964374    16957440    62463596    39183

Memory consumption on a 56 core Intel CPU with 125GB of memory:
Code tags:           192 kB
PageExts:         262144 kB (256MB)
SlabExts:           9876 kB (9.6MB)
PcpuExts:            512 kB (0.5MB)

Total overhead is 0.2% of total memory.

[2]: Improved fault injection is the big one; the alloc_hooks() macro
this patchset introduces is also used for per-callsite fault injection
points in the dynamic fault injection patchset, which means we can
easily do fault injection on a per module or per file basis; this makes
it much easier to integrate memory fault injection into existing tests.

Vlastimil recently raised concerns about exposing GFP_NOWAIT as a
PF_MEMALLOC_* flag, as this might introduce GFP_NOWAIT to allocation
paths that have never had their failure paths tested - this is something
we need to address.

[3]: The circular dependency looks to be unavoidable; the issue is that
alloc_tag_save() -> current -> get_current() requires percpu.h, and
percpu.h requires sched.h because of course it does. But this doesn't
actually cause build errors because we're only using macros, so the main
concern is just not leaving a difficult-to-disentangle minefield for
later.
So, sched.h is now pretty close to being a types only header that
imports types and declares types - this is the header cleanups that were
merged for 6.8.


Kent Overstreet (11):
  lib/string_helpers: Add flags param to string_get_size()
  scripts/kallysms: Always include __start and __stop symbols
  fs: Convert alloc_inode_sb() to a macro
  mm/slub: Mark slab_free_freelist_hook() __always_inline
  mempool: Hook up to memory allocation profiling
  xfs: Memory allocation profiling fixups
  mm: percpu: Introduce pcpuobj_ext
  mm: percpu: Add codetag reference into pcpuobj_ext
  mm: vmalloc: Enable memory allocation profiling
  rhashtable: Plumb through alloc tag
  MAINTAINERS: Add entries for code tagging and memory allocation
    profiling

Suren Baghdasaryan (24):
  mm: enumerate all gfp flags
  mm: introduce slabobj_ext to support slab object extensions
  mm: introduce __GFP_NO_OBJ_EXT flag to selectively prevent slabobj_ext
    creation
  mm/slab: introduce SLAB_NO_OBJ_EXT to avoid obj_ext creation
  mm: prevent slabobj_ext allocations for slabobj_ext and kmem_cache
    objects
  slab: objext: introduce objext_flags as extension to
    page_memcg_data_flags
  lib: code tagging framework
  lib: code tagging module support
  lib: prevent module unloading if memory is not freed
  lib: add allocation tagging support for memory allocation profiling
  lib: introduce support for page allocation tagging
  mm: percpu: increase PERCPU_MODULE_RESERVE to accommodate allocation
    tags
  change alloc_pages name in dma_map_ops to avoid name conflicts
  mm: enable page allocation tagging
  mm: create new codetag references during page splitting
  mm/page_ext: enable early_page_ext when
    CONFIG_MEM_ALLOC_PROFILING_DEBUG=y
  lib: add codetag reference into slabobj_ext
  mm/slab: add allocation accounting into slab allocation and free paths
  mm/slab: enable slab allocation tagging for kmalloc and friends
  mm: percpu: enable per-cpu allocation tagging
  lib: add memory allocations report in show_mem()
  codetag: debug: skip objext checking when it's for objext itself
  codetag: debug: mark codetags for reserved pages as empty
  codetag: debug: introduce OBJEXTS_ALLOC_FAIL to mark failed slab_ext
    allocations

 Documentation/admin-guide/sysctl/vm.rst       |  16 ++
 Documentation/filesystems/proc.rst            |  28 ++
 MAINTAINERS                                   |  16 ++
 arch/alpha/kernel/pci_iommu.c                 |   2 +-
 arch/mips/jazz/jazzdma.c                      |   2 +-
 arch/powerpc/kernel/dma-iommu.c               |   2 +-
 arch/powerpc/mm/book3s64/radix_pgtable.c      |   2 +-
 arch/powerpc/platforms/ps3/system-bus.c       |   4 +-
 arch/powerpc/platforms/pseries/vio.c          |   2 +-
 arch/x86/kernel/amd_gart_64.c                 |   2 +-
 drivers/block/virtio_blk.c                    |   4 +-
 drivers/gpu/drm/gud/gud_drv.c                 |   2 +-
 drivers/iommu/dma-iommu.c                     |   2 +-
 drivers/mmc/core/block.c                      |   4 +-
 drivers/mtd/spi-nor/debugfs.c                 |   6 +-
 .../ethernet/chelsio/cxgb4/cxgb4_debugfs.c    |   4 +-
 drivers/parisc/ccio-dma.c                     |   2 +-
 drivers/parisc/sba_iommu.c                    |   2 +-
 drivers/scsi/sd.c                             |   8 +-
 drivers/staging/media/atomisp/pci/hmm/hmm.c   |   2 +-
 drivers/xen/grant-dma-ops.c                   |   2 +-
 drivers/xen/swiotlb-xen.c                     |   2 +-
 fs/xfs/kmem.c                                 |   4 +-
 fs/xfs/kmem.h                                 |  10 +-
 include/asm-generic/codetag.lds.h             |  14 +
 include/asm-generic/vmlinux.lds.h             |   3 +
 include/linux/alloc_tag.h                     | 188 +++++++++++++
 include/linux/codetag.h                       |  83 ++++++
 include/linux/dma-map-ops.h                   |   2 +-
 include/linux/fortify-string.h                |   5 +-
 include/linux/fs.h                            |   6 +-
 include/linux/gfp.h                           | 126 +++++----
 include/linux/gfp_types.h                     | 101 +++++--
 include/linux/memcontrol.h                    |  56 +++-
 include/linux/mempool.h                       |  73 +++--
 include/linux/mm.h                            |   8 +
 include/linux/mm_types.h                      |   4 +-
 include/linux/page_ext.h                      |   1 -
 include/linux/pagemap.h                       |   9 +-
 include/linux/percpu.h                        |  27 +-
 include/linux/pgalloc_tag.h                   | 105 +++++++
 include/linux/rhashtable-types.h              |  11 +-
 include/linux/sched.h                         |  24 ++
 include/linux/slab.h                          | 184 +++++++------
 include/linux/string.h                        |   4 +-
 include/linux/string_helpers.h                |  11 +-
 include/linux/vmalloc.h                       |  60 +++-
 init/Kconfig                                  |   4 +
 kernel/dma/mapping.c                          |   4 +-
 kernel/kallsyms_selftest.c                    |   2 +-
 kernel/module/main.c                          |  25 +-
 lib/Kconfig.debug                             |  31 +++
 lib/Makefile                                  |   3 +
 lib/alloc_tag.c                               | 213 +++++++++++++++
 lib/codetag.c                                 | 258 ++++++++++++++++++
 lib/rhashtable.c                              |  52 +++-
 lib/string_helpers.c                          |  22 +-
 lib/test-string_helpers.c                     |   4 +-
 mm/compaction.c                               |   7 +-
 mm/filemap.c                                  |   6 +-
 mm/huge_memory.c                              |   2 +
 mm/hugetlb.c                                  |   8 +-
 mm/kfence/core.c                              |  14 +-
 mm/kfence/kfence.h                            |   4 +-
 mm/memcontrol.c                               |  56 +---
 mm/mempolicy.c                                |  52 ++--
 mm/mempool.c                                  |  36 +--
 mm/mm_init.c                                  |  10 +
 mm/page_alloc.c                               |  66 +++--
 mm/page_ext.c                                 |  13 +
 mm/page_owner.c                               |   2 +-
 mm/percpu-internal.h                          |  26 +-
 mm/percpu.c                                   | 120 ++++----
 mm/show_mem.c                                 |  15 +
 mm/slab.h                                     | 176 ++++++++++--
 mm/slab_common.c                              |  65 ++++-
 mm/slub.c                                     | 138 ++++++----
 mm/util.c                                     |  44 +--
 mm/vmalloc.c                                  |  88 +++---
 scripts/kallsyms.c                            |  13 +
 scripts/module.lds.S                          |   7 +
 81 files changed, 2126 insertions(+), 695 deletions(-)
 create mode 100644 include/asm-generic/codetag.lds.h
 create mode 100644 include/linux/alloc_tag.h
 create mode 100644 include/linux/codetag.h
 create mode 100644 include/linux/pgalloc_tag.h
 create mode 100644 lib/alloc_tag.c
 create mode 100644 lib/codetag.c

-- 
2.43.0.687.g38aa6559b0-goog





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux