Re: [PATCH v4] mm: page_alloc: dump migrate-failed pages

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 11.03.21 19:30, Minchan Kim wrote:
Currently, debugging CMA allocation failures is quite limited.
The most commong source of these failures seems to be page

s/commong/common/

migration which doesn't provide any useful information on the
reason of the failure by itself. alloc_contig_range can report
those failures as it holds a list of migrate-failed pages.

page refcount, mapcount with page flags on dump_page are
helpful information to deduce the culprit. Furthermore,
dump_page_owner was super helpful to find long term pinner
who initiated the page allocation.

Maybe simply "The information logged by dump_page() has already proven helpful for debugging allocation issues, like identifying long-term pinnings on ZONE_MOVABLE or MIGRATE_CMA."


The reason it approach with dynamic debug is the debug message
could emit lots of noises as alloc_contig_range calls more
frequently since it's a best effort allocator.

"Let's use the dynamic debugging infrastructure, such that we avoid
flooding the logs and creating a lot of noise on frequent alloc_contig_range() calls. This information is helpful for debugging only."

>>>


There are two ifdefery conditions to support common dyndbg options:

- CONFIG_DYNAMIC_DEBUG_CORE && DYNAMIC_DEBUG_MODULE
It aims for supporting the feature with only specific file
with adding ccflags.

- CONFIG_DYNAMIC_DEBUG
It aims for supporting the feature with system wide globally.

A simple example to enable the feature:

Admin could enable the dump like this(by default, disabled)

	echo "func alloc_contig_dump_pages +p" > control

Admin could disable it.

	echo "func alloc_contig_dump_pages =_" > control

Detail goes Documentation/admin-guide/dynamic-debug-howto.rst

<<< I'd drop that completely and only mention:

"For details on dynamic debugging, see Documentation/admin-guide/dynamic-debug-howto.rst."

As you have usage in the code itself, I think you don't have to be repetitive. The ifdeffery seems to be common (e.g., include/linux/netdevice.) for dynamic debugging users, so I don't see the need to describe that in detail.

>>>


A concern is utility functions in dump_page uses inconsistent
loglevels.

__dump_page: KERN_WARNING
__dump_page_owner: KERN_ALERT
         stack_trace_print: KERN_DEFAULT

There are bunch of places to use the inconsistent loglevel
utility functions(e.g., just grep dump_page/strace_trace_print).
It's unfortunate but here we are. It could be addressed
different patchset.

<<< I'd drop that completely and mention

"In the future, we might want to make the loglevels used inside dump_page() consistent and eventually rework the way we log the information here. See [1]"

Where [1] is a link to the discussion.


Signed-off-by: Minchan Kim <minchan@xxxxxxxxxx>
---
* from v3 - https://lore.kernel.org/linux-mm/20210310180104.517886-1-minchan@xxxxxxxxxx
   * add dyndgb usage comment - akpm
   * use dumpstack instead of warn_on - david

* from v2 - https://lore.kernel.org/linux-mm/20210308202047.1903802-1-minchan@xxxxxxxxxx/
   * remove ratelimit - mhocko

* from v1 - https://lore.kernel.org/linux-mm/20210217163603.429062-1-minchan@xxxxxxxxxx/
   * use dynamic debugging with system wide instead of per-call site - mhocko

  mm/page_alloc.c | 31 +++++++++++++++++++++++++++++++
  1 file changed, 31 insertions(+)


Minor nits:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3e4b29ee2b1e..76fc202cb105 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8453,6 +8453,36 @@ static unsigned long pfn_max_align_up(unsigned long pfn)
  				pageblock_nr_pages));
  }
+#if defined(CONFIG_DYNAMIC_DEBUG) || \
+	(defined(CONFIG_DYNAMIC_DEBUG_CORE) && defined(DYNAMIC_DEBUG_MODULE))
+/*
+ * usage)

"usage)" looks wrong here. Did you mean "Usage:"

+ * 	dyndbg_dir="/sys/kernel/debug/dynamic_debug"
+ * To enable:
+ * 	echo "func alloc_contig_dump_pages +p" > $dyndbg_dir/control
+ * To disable:
+ * 	echo "func alloc_contig_dump_pages =_" > $dyndbg_dir/control
+ * For detail, read dynamic-debug-howto.rst

Maybe simply

"See admin-guide/dynamic-debug-howto.rst"

+ */
+static void alloc_contig_dump_pages(struct list_head *page_list)
+{
+	DEFINE_DYNAMIC_DEBUG_METADATA(descriptor,
+			"migrate failure");

You can fit that into a single line.

+
+	if (DYNAMIC_DEBUG_BRANCH(descriptor)) {
+		struct page *page;
+
+		dump_stack();
+		list_for_each_entry(page, page_list, lru)
+			dump_page(page, "migration failure");
+	}
+}
+#else
+static inline void alloc_contig_dump_pages(struct list_head *page_list)
+{
+}
+#endif
+
  /* [start, end) must belong to a single zone. */
  static int __alloc_contig_migrate_range(struct compact_control *cc,
  					unsigned long start, unsigned long end)
@@ -8496,6 +8526,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
  				NULL, (unsigned long)&mtc, cc->mode, MR_CONTIG_RANGE);
  	}
  	if (ret < 0) {
+		alloc_contig_dump_pages(&cc->migratepages);
  		putback_movable_pages(&cc->migratepages);
  		return ret;
  	}


As I said, for my taste good enough for now. I would certainly preferred what Michal suggested (e.g., doing it via debug loglevels), but this gets the job done and is not too ugly.

Reviewed-by: David Hildenbrand <david@xxxxxxxxxx>

--
Thanks,

David / dhildenb






[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux