On 11.03.21 19:30, Minchan Kim wrote:
Currently, debugging CMA allocation failures is quite limited.
The most commong source of these failures seems to be page
s/commong/common/
migration which doesn't provide any useful information on the
reason of the failure by itself. alloc_contig_range can report
those failures as it holds a list of migrate-failed pages.
page refcount, mapcount with page flags on dump_page are
helpful information to deduce the culprit. Furthermore,
dump_page_owner was super helpful to find long term pinner
who initiated the page allocation.
Maybe simply "The information logged by dump_page() has already proven
helpful for debugging allocation issues, like identifying long-term
pinnings on ZONE_MOVABLE or MIGRATE_CMA."
The reason it approach with dynamic debug is the debug message
could emit lots of noises as alloc_contig_range calls more
frequently since it's a best effort allocator.
"Let's use the dynamic debugging infrastructure, such that we avoid
flooding the logs and creating a lot of noise on frequent
alloc_contig_range() calls. This information is helpful for debugging only."
>>>
There are two ifdefery conditions to support common dyndbg options:
- CONFIG_DYNAMIC_DEBUG_CORE && DYNAMIC_DEBUG_MODULE
It aims for supporting the feature with only specific file
with adding ccflags.
- CONFIG_DYNAMIC_DEBUG
It aims for supporting the feature with system wide globally.
A simple example to enable the feature:
Admin could enable the dump like this(by default, disabled)
echo "func alloc_contig_dump_pages +p" > control
Admin could disable it.
echo "func alloc_contig_dump_pages =_" > control
Detail goes Documentation/admin-guide/dynamic-debug-howto.rst
<<< I'd drop that completely and only mention:
"For details on dynamic debugging, see
Documentation/admin-guide/dynamic-debug-howto.rst."
As you have usage in the code itself, I think you don't have to be
repetitive. The ifdeffery seems to be common (e.g.,
include/linux/netdevice.) for dynamic debugging users, so I don't see
the need to describe that in detail.
>>>
A concern is utility functions in dump_page uses inconsistent
loglevels.
__dump_page: KERN_WARNING
__dump_page_owner: KERN_ALERT
stack_trace_print: KERN_DEFAULT
There are bunch of places to use the inconsistent loglevel
utility functions(e.g., just grep dump_page/strace_trace_print).
It's unfortunate but here we are. It could be addressed
different patchset.
<<< I'd drop that completely and mention
"In the future, we might want to make the loglevels used inside
dump_page() consistent and eventually rework the way we log the
information here. See [1]"
Where [1] is a link to the discussion.
Signed-off-by: Minchan Kim <minchan@xxxxxxxxxx>
---
* from v3 - https://lore.kernel.org/linux-mm/20210310180104.517886-1-minchan@xxxxxxxxxx
* add dyndgb usage comment - akpm
* use dumpstack instead of warn_on - david
* from v2 - https://lore.kernel.org/linux-mm/20210308202047.1903802-1-minchan@xxxxxxxxxx/
* remove ratelimit - mhocko
* from v1 - https://lore.kernel.org/linux-mm/20210217163603.429062-1-minchan@xxxxxxxxxx/
* use dynamic debugging with system wide instead of per-call site - mhocko
mm/page_alloc.c | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)
Minor nits:
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3e4b29ee2b1e..76fc202cb105 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8453,6 +8453,36 @@ static unsigned long pfn_max_align_up(unsigned long pfn)
pageblock_nr_pages));
}
+#if defined(CONFIG_DYNAMIC_DEBUG) || \
+ (defined(CONFIG_DYNAMIC_DEBUG_CORE) && defined(DYNAMIC_DEBUG_MODULE))
+/*
+ * usage)
"usage)" looks wrong here. Did you mean "Usage:"
+ * dyndbg_dir="/sys/kernel/debug/dynamic_debug"
+ * To enable:
+ * echo "func alloc_contig_dump_pages +p" > $dyndbg_dir/control
+ * To disable:
+ * echo "func alloc_contig_dump_pages =_" > $dyndbg_dir/control
+ * For detail, read dynamic-debug-howto.rst
Maybe simply
"See admin-guide/dynamic-debug-howto.rst"
+ */
+static void alloc_contig_dump_pages(struct list_head *page_list)
+{
+ DEFINE_DYNAMIC_DEBUG_METADATA(descriptor,
+ "migrate failure");
You can fit that into a single line.
+
+ if (DYNAMIC_DEBUG_BRANCH(descriptor)) {
+ struct page *page;
+
+ dump_stack();
+ list_for_each_entry(page, page_list, lru)
+ dump_page(page, "migration failure");
+ }
+}
+#else
+static inline void alloc_contig_dump_pages(struct list_head *page_list)
+{
+}
+#endif
+
/* [start, end) must belong to a single zone. */
static int __alloc_contig_migrate_range(struct compact_control *cc,
unsigned long start, unsigned long end)
@@ -8496,6 +8526,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
NULL, (unsigned long)&mtc, cc->mode, MR_CONTIG_RANGE);
}
if (ret < 0) {
+ alloc_contig_dump_pages(&cc->migratepages);
putback_movable_pages(&cc->migratepages);
return ret;
}
As I said, for my taste good enough for now. I would certainly preferred
what Michal suggested (e.g., doing it via debug loglevels), but this
gets the job done and is not too ugly.
Reviewed-by: David Hildenbrand <david@xxxxxxxxxx>
--
Thanks,
David / dhildenb