On Thu, Mar 04, 2021 at 06:23:09PM +0100, David Hildenbrand wrote:
> > > You want to debug something, so you try triggering it and capturing debug
> > > data. There are not that many alloc_contig_range() users such that this
> > > would really be an issue to isolate ...
> >
> > cma_alloc uses alloc_contig_range and cma_alloc has lots of users.
> > It is even exported by dmabuf, so any userspace could trigger the
> > allocation on its own. Some of them could be tolerant of the failure,
> > the rest of them could be critical. We shouldn't expect it by limited
> > kernel usecase.
>
> Assume you are debugging allocation failures. You either collect the data
> yourself or ask someone to send you that output. You care about any
> alloc_contig_range() allocation failures that shouldn't happen, don't you?
>
> >
> > >
> > > Strictly speaking: any allocation failure on ZONE_MOVABLE or CMA is
> > > problematic (putting NORETRY logic and similar aside). So any such
> > > page you hit is worth investigating and, therefore, worth getting logged for
> > > debugging purposes.
> >
> > If you believe every alloc_contig_range failure is problematic
>
> Every one where we should have guarantees I guess: ZONE_MOVABLE or
> MIGRATE_CMA. On ZONE_NORMAL, there are no guarantees.

Indeed.

>
> > and there is no such real example as I mentioned above in the world,
> > I am happy to put this chunk to support dynamic debugging.
> > Okay?
> >
> > +#if defined(CONFIG_DYNAMIC_DEBUG) || \
> > +	(defined(CONFIG_DYNAMIC_DEBUG_CORE) && defined(DYNAMIC_DEBUG_MODULE))
> > +static DEFINE_RATELIMIT_STATE(alloc_contig_ratelimit_state,
> > +		DEFAULT_RATELIMIT_INTERVAL, DEFAULT_RATELIMIT_BURST);
> > +int alloc_contig_ratelimit(void)
> > +{
> > +	return __ratelimit(&alloc_contig_ratelimit_state);
> > +}
> > +
>
> ^ do we need ratelimiting with dynamic debugging enabled?

The main argument was debug message flooding. Even if we use dynamic
debugging, that issue never disappears.

>
> > +void dump_migrate_failure_pages(struct list_head *page_list)
> > +{
> > +	DEFINE_DYNAMIC_DEBUG_METADATA(descriptor,
> > +			"migrate failure");
> > +	if (DYNAMIC_DEBUG_BRANCH(descriptor) &&
> > +			alloc_contig_ratelimit()) {
> > +		struct page *page;
> > +
> > +		WARN(1, "failed callstack");
> > +		list_for_each_entry(page, page_list, lru)
> > +			dump_page(page, "migration failure");
>
> Are all pages on the list guaranteed to be problematic, or only the first
> entry? I assume all.

All.

>
> > +	}
> > +}
> > +#else
> > +static inline void dump_migrate_failure_pages(struct list_head *page_list)
> > +{
> > +}
> > +#endif
> > +
> >  /* [start, end) must belong to a single zone. */
> >  static int __alloc_contig_migrate_range(struct compact_control *cc,
> >  					unsigned long start, unsigned long end)
> > @@ -8496,6 +8522,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
> >  				NULL, (unsigned long)&mtc, cc->mode, MR_CONTIG_RANGE);
> >  	}
> >  	if (ret < 0) {
> > +		dump_migrate_failure_pages(&cc->migratepages);
> >  		putback_movable_pages(&cc->migratepages);
> >  		return ret;
> >  	}
> >
> >
>
> If that's the way dynamic debugging is configured/enabled (still have to
> look into it) - yes, that goes in the right direction. As I said above,
> you should dump only where we have some kind of guarantees I assume.

Sure, let me wait for your review before sending the next revision.

Thanks for the review!