Re: [PATCH v2] mm: page_alloc: dump migrate-failed pages

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Mar 09, 2021 at 10:32:51AM +0100, Michal Hocko wrote:
> On Mon 08-03-21 12:20:47, Minchan Kim wrote:
> > alloc_contig_range is usually used on cma area or movable zone.
> > It's critical if the page migration fails on those areas so
> > dump more debugging message.
> 
> I disagree with this statement. alloc_contig_range is not a reliable
> allocator. Any user, be it CMA or direct users of alloc_contig_range
> have to deal with allocation failures. Debugging information can be
> still useful but considering migration failures critical is
> overstatement to say the least.

Fair enough. Let's change it.

"Currently, debugging CMA allocation failure is too hard
due to lacking of page information. alloc_contig_range is
proper place to dump them since it has migrate-failed page
list."


> 
> > page refcount, mapcount with page flags on dump_page are
> > helpful information to deduce the culprit. Furthermore,
> > dump_page_owner was super helpful to find long term pinner
> > who initiated the page allocation.
> > 
> > Admin could enable the dump like this(by default, disabled)
> > 
> > 	echo "func dump_migrate_failure_pages +p" > control
> > 
> > Admin could disable it.
> > 
> > 	echo "func dump_migrate_failure_pages =_" > control
> 
> My original idea was to add few pr_debug and -DDYNAMIC_DEBUG_MODULE for
> page_alloc.c. It makes sense to enable a whole bunch at once though.
> The naming should better reflect this is alloc_contig_rage related
> because the above sounds like a generic migration failure thing.

alloc_contig_dump_pages?

> 
> Somebody more familiar with the dynamic debugging infrastructure needs
> to have a look but from from a quick look it seems ok.
> 
> Do we really need all the ugly ifdefery, though? Don't we want to have
> this compiled in all the time and just rely on the static branch managed
> by the dynamic debugging framework?

I have no further idea to make it simple while we keep the flexibility
for arguments and print format.

#if defined(CONFIG_DYNAMIC_DEBUG) || \
        (defined(CONFIG_DYNAMIC_DEBUG_CORE) && defined(DYNAMIC_DEBUG_MODULE))
static void alloc_contig_dump_pages(struct list_head *page_list)
{
        static DEFINE_RATELIMIT_STATE(_rs,
                                        DEFAULT_RATELIMIT_INTERVAL,
                                        DEFAULT_RATELIMIT_BURST);

        DEFINE_DYNAMIC_DEBUG_METADATA(descriptor,
                        "migrate failure");
        if (DYNAMIC_DEBUG_BRANCH(descriptor) && __ratelimit(&_rs)) {
                struct page *page;

                WARN(1, "failed callstack");
                list_for_each_entry(page, page_list, lru)
                        dump_page(page, "migration failure");
        }
}
#else
static inline void alloc_contig_dump_pages(struct list_head *page_list)
{
}
#endif

>  
> [...]
> > +void dump_migrate_failure_pages(struct list_head *page_list)
> > +{
> > +	DEFINE_DYNAMIC_DEBUG_METADATA(descriptor,
> > +			"migrate failure");
> > +	if (DYNAMIC_DEBUG_BRANCH(descriptor) &&
> > +			alloc_contig_ratelimit()) {
> > +		struct page *page;
> > +
> > +		WARN(1, "failed callstack");
> > +		list_for_each_entry(page, page_list, lru)
> > +			dump_page(page, "migration failure");
> > +	}
> 
> Apart from the above, do we have to warn for something that is a
> debugging aid? A similar concern wrt dump_page which uses pr_warn and

Make sense.

> page owner is using even pr_alert.
> Would it make sense to add a loglevel parameter both into __dump_page
> and dump_page_owner?

Let me try it.




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux