+ lib-stackdepot-add-a-refcount-field-in-stack_record.patch added to mm-unstable branch

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



The patch titled
     Subject: lib/stackdepot: add a refcount field in stack_record
has been added to the -mm mm-unstable branch.  Its filename is
     lib-stackdepot-add-a-refcount-field-in-stack_record.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/lib-stackdepot-add-a-refcount-field-in-stack_record.patch

This patch will later appear in the mm-unstable branch at
    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days

------------------------------------------------------
From: Oscar Salvador <osalvador@xxxxxxx>
Subject: lib/stackdepot: add a refcount field in stack_record
Date: Tue, 16 May 2023 20:25:35 +0200

Patch series "page_owner: print stacks and their counter", v5.

page_owner is a great debug functionality tool that gets us to know about
all pages that have been allocated/freed and their stacktrace.  This comes
very handy when e.g: debugging leaks, as with some scripting we might be
able to see those stacktraces that are allocating pages but not freeing
theme.

In my experience, that is one of the most useful cases, but it can get
really tedious to screen through all pages aand try to reconstruct the
stack <-> allocated/freed relationship.  There is a lot of noise to cancel
off.

This patch aims to fix that by adding a new functionality into page_owner.
What this does is to create a new read-only file "page_owner_stacks",
which prints only the allocating stacktraces and their counting, being
that the times the stacktrace has allocated - the times it has freed.

So we have a clear overview of stacks <-> allocated/freed relationship
without the need to fiddle with pages and trying to match free stacktraces
with allocated stacktraces.

This is achieved by adding a new refcount_t field in the stack_record
struct, incrementing that refcount_t everytime the same stacktrace
allocates, and decrementing it when it frees a page.  Details can be seen
in the respective patches.

We also create another file called "page_owner_threshold", which let us
specify a threshold, so when when reading from "page_owner_stacks", we
will only see those stacktraces which counting goes beyond the threshold
we specified.

A PoC can be found below:

# cat /sys/kernel/debug/page_owner_threshold
 0
# cat /sys/kernel/debug/page_owner_stacks > stacks_full.txt
# head -32 stacks_full.txt
 prep_new_page+0x10d/0x180
 get_page_from_freelist+0x1bd6/0x1e10
 __alloc_pages+0x194/0x360
 alloc_page_interleave+0x13/0x90
 new_slab+0x31d/0x530
 ___slab_alloc+0x5d7/0x720
 __slab_alloc.isra.85+0x4a/0x90
 kmem_cache_alloc+0x455/0x4a0
 acpi_ps_alloc_op+0x57/0x8f
 acpi_ps_create_scope_op+0x12/0x23
 acpi_ps_execute_method+0x102/0x2c1
 acpi_ns_evaluate+0x343/0x4da
 acpi_evaluate_object+0x1cb/0x392
 acpi_run_osc+0x135/0x260
 acpi_init+0x165/0x4ed
 do_one_initcall+0x3e/0x200
stack count: 2

 free_pcp_prepare+0x287/0x5c0
 free_unref_page+0x1c/0xd0
 __mmdrop+0x50/0x160
 finish_task_switch+0x249/0x2b0
 __schedule+0x2c3/0x960
 schedule+0x44/0xb0
 futex_wait_queue+0x70/0xd0
 futex_wait+0x160/0x250
 do_futex+0x11c/0x1b0
 __x64_sys_futex+0x5e/0x1d0
 do_syscall_64+0x37/0x90
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
stack count: 1

 

# echo 10000 > /sys/kernel/debug/page_owner_threshold
# cat /sys/kernel/debug/page_owner_stacks > stacks_10000.txt
# cat stacks_10000.txt 
 prep_new_page+0x10d/0x180
 get_page_from_freelist+0x1bd6/0x1e10
 __alloc_pages+0x194/0x360
 folio_alloc+0x17/0x40
 page_cache_ra_unbounded+0x96/0x170
 filemap_get_pages+0x23d/0x5e0
 filemap_read+0xbf/0x3a0
 __kernel_read+0x136/0x2f0
 kernel_read_file+0x197/0x2d0
 kernel_read_file_from_fd+0x54/0x90
 __do_sys_finit_module+0x89/0x120
 do_syscall_64+0x37/0x90
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
stack count: 36195

 prep_new_page+0x10d/0x180
 get_page_from_freelist+0x1bd6/0x1e10
 __alloc_pages+0x194/0x360
 folio_alloc+0x17/0x40
 page_cache_ra_unbounded+0x96/0x170
 filemap_get_pages+0x23d/0x5e0
 filemap_read+0xbf/0x3a0
 new_sync_read+0x106/0x180
 vfs_read+0x16f/0x190
 ksys_read+0xa5/0xe0
 do_syscall_64+0x37/0x90
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
stack count: 44484

 prep_new_page+0x10d/0x180
 get_page_from_freelist+0x1bd6/0x1e10
 __alloc_pages+0x194/0x360
 folio_alloc+0x17/0x40
 page_cache_ra_unbounded+0x96/0x170
 filemap_get_pages+0xdd/0x5e0
 filemap_read+0xbf/0x3a0
 new_sync_read+0x106/0x180
 vfs_read+0x16f/0x190
 ksys_read+0xa5/0xe0
 do_syscall_64+0x37/0x90
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
stack count: 17874


This patch (of 3):

We want to filter out page_owner output and print only those stacks that
have been repeated beyond a certain threshold.  This gives us the chance
to get rid of a lot of noise.  In order to do that, we need to keep track
of how many repeated stacks (for allocation) do we have, so we add a new
refcount_t field in the stack_record struct.

Note that this might increase the size of the struct for some
architectures.  E.g: x86_64 is not affected due to alignment, but x86
32bits might.

The alternative would be to have some kind of struct like this:

struct track_stacks {
	struct stack_record *stack;
	struct track_stacks *next;
	refcount_t stack_count;

But that would imply to perform more allocations and glue everything
together, which would make the code more complex, so I think that going
with a new field in the struct stack_record is good enough.

Note that on __set_page_owner_handle(), page_owner->handle is set, and on
__reset_page_owner(), page_owner->free_handle is set.

We are interested in page_owner->handle, so when __set_page_owner() gets
called, we derive the stack_record struct from page_owner->handle, and we
increment its refcount_t field; and when __reset_page_owner() gets called,
we derive its stack_record from page_owner->handle() and we decrement its
refcount_t field.

Link: https://lkml.kernel.org/r/20230516182537.3139-1-osalvador@xxxxxxx
Link: https://lkml.kernel.org/r/20230516182537.3139-2-osalvador@xxxxxxx
Signed-off-by: Oscar Salvador <osalvador@xxxxxxx>
Cc: Alexander Potapenko <glider@xxxxxxxxxx>
Cc: Andrey Konovalov <andreyknvl@xxxxxxxxx>
Cc: Eric Dumazet <edumazet@xxxxxxxxxx>
Cc: Marco Elver <elver@xxxxxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxx>
Cc: Suren Baghdasaryan <surenb@xxxxxxxxxx>
Cc: Vlastimil Babka <vbabka@xxxxxxx>
Cc: Waiman Long <longman@xxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 include/linux/stackdepot.h |    2 +
 lib/stackdepot.c           |   53 ++++++++++++++++++++++++++++-------
 mm/page_owner.c            |    6 +++
 3 files changed, 51 insertions(+), 10 deletions(-)

--- a/include/linux/stackdepot.h~lib-stackdepot-add-a-refcount-field-in-stack_record
+++ a/include/linux/stackdepot.h
@@ -94,6 +94,8 @@ static inline int stack_depot_early_init
 depot_stack_handle_t __stack_depot_save(unsigned long *entries,
 					unsigned int nr_entries,
 					gfp_t gfp_flags, bool can_alloc);
+void stack_depot_inc_count(depot_stack_handle_t handle);
+void stack_depot_dec_count(depot_stack_handle_t handle);
 
 /**
  * stack_depot_save - Save a stack trace to stack depot
--- a/lib/stackdepot.c~lib-stackdepot-add-a-refcount-field-in-stack_record
+++ a/lib/stackdepot.c
@@ -60,6 +60,7 @@ struct stack_record {
 	u32 hash;			/* Hash in the hash table */
 	u32 size;			/* Number of stored frames */
 	union handle_parts handle;
+	refcount_t count;		/* Number of the same repeated stacks */
 	unsigned long entries[];	/* Variable-sized array of frames */
 };
 
@@ -305,6 +306,7 @@ depot_alloc_stack(unsigned long *entries
 	stack->handle.offset = pool_offset >> DEPOT_STACK_ALIGN;
 	stack->handle.valid = 1;
 	stack->handle.extra = 0;
+	refcount_set(&stack->count, 1);
 	memcpy(stack->entries, entries, flex_array_size(stack, entries, size));
 	pool_offset += required_size;
 	/*
@@ -457,8 +459,7 @@ depot_stack_handle_t stack_depot_save(un
 }
 EXPORT_SYMBOL_GPL(stack_depot_save);
 
-unsigned int stack_depot_fetch(depot_stack_handle_t handle,
-			       unsigned long **entries)
+static struct stack_record *stack_depot_getstack(depot_stack_handle_t handle)
 {
 	union handle_parts parts = { .handle = handle };
 	/*
@@ -470,6 +471,26 @@ unsigned int stack_depot_fetch(depot_sta
 	size_t offset = parts.offset << DEPOT_STACK_ALIGN;
 	struct stack_record *stack;
 
+	if (!handle)
+		return NULL;
+
+	if (parts.pool_index > pool_index_cached) {
+		WARN(1, "pool index %d out of bounds (%d) for stack id %08x\n",
+			parts.pool_index, pool_index_cached, handle);
+		return NULL;
+	}
+	pool = stack_pools[parts.pool_index];
+	if (!pool)
+		return NULL;
+	stack = pool + offset;
+	return stack;
+}
+
+unsigned int stack_depot_fetch(depot_stack_handle_t handle,
+			       unsigned long **entries)
+{
+	struct stack_record *stack;
+
 	*entries = NULL;
 	/*
 	 * Let KMSAN know *entries is initialized. This shall prevent false
@@ -480,21 +501,33 @@ unsigned int stack_depot_fetch(depot_sta
 	if (!handle)
 		return 0;
 
-	if (parts.pool_index > pool_index_cached) {
-		WARN(1, "pool index %d out of bounds (%d) for stack id %08x\n",
-			parts.pool_index, pool_index_cached, handle);
-		return 0;
-	}
-	pool = stack_pools[parts.pool_index];
-	if (!pool)
+	stack = stack_depot_getstack(handle);
+	if (!stack)
 		return 0;
-	stack = pool + offset;
 
 	*entries = stack->entries;
 	return stack->size;
 }
 EXPORT_SYMBOL_GPL(stack_depot_fetch);
 
+void stack_depot_inc_count(depot_stack_handle_t handle)
+{
+	struct stack_record *stack = NULL;
+
+	stack = stack_depot_getstack(handle);
+	if (stack)
+		refcount_inc(&stack->count);
+}
+
+void stack_depot_dec_count(depot_stack_handle_t handle)
+{
+	struct stack_record *stack = NULL;
+
+	stack = stack_depot_getstack(handle);
+	if (stack)
+		refcount_dec(&stack->count);
+}
+
 void stack_depot_print(depot_stack_handle_t stack)
 {
 	unsigned long *entries;
--- a/mm/page_owner.c~lib-stackdepot-add-a-refcount-field-in-stack_record
+++ a/mm/page_owner.c
@@ -139,6 +139,7 @@ void __reset_page_owner(struct page *pag
 	int i;
 	struct page_ext *page_ext;
 	depot_stack_handle_t handle;
+	depot_stack_handle_t alloc_handle;
 	struct page_owner *page_owner;
 	u64 free_ts_nsec = local_clock();
 
@@ -146,6 +147,9 @@ void __reset_page_owner(struct page *pag
 	if (unlikely(!page_ext))
 		return;
 
+	page_owner = get_page_owner(page_ext);
+	alloc_handle = page_owner->handle;
+
 	handle = save_stack(GFP_NOWAIT | __GFP_NOWARN);
 	for (i = 0; i < (1 << order); i++) {
 		__clear_bit(PAGE_EXT_OWNER_ALLOCATED, &page_ext->flags);
@@ -155,6 +159,7 @@ void __reset_page_owner(struct page *pag
 		page_ext = page_ext_next(page_ext);
 	}
 	page_ext_put(page_ext);
+	stack_depot_dec_count(alloc_handle);
 }
 
 static inline void __set_page_owner_handle(struct page_ext *page_ext,
@@ -196,6 +201,7 @@ noinline void __set_page_owner(struct pa
 		return;
 	__set_page_owner_handle(page_ext, handle, order, gfp_mask);
 	page_ext_put(page_ext);
+	stack_depot_inc_count(handle);
 }
 
 void __set_page_owner_migrate_reason(struct page *page, int reason)
_

Patches currently in -mm which might be from osalvador@xxxxxxx are

lib-stackdepot-add-a-refcount-field-in-stack_record.patch
mm-page_owner-add-page_owner_stacks-file-to-print-out-only-stacks-and-their-counte.patch
mmpage_owner-filter-out-stacks-by-a-threshold-counter.patch




[Index of Archives]     [Kernel Archive]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]

  Powered by Linux