On Tue, 15 Mar 2022 22:45:20 +0800 Dong Aisheng <aisheng.dong@xxxxxxx> wrote:

> --- a/mm/cma.c
> +++ b/mm/cma.c
>
> ...
>
> @@ -457,6 +458,16 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
>  						offset);
>  		if (bitmap_no >= bitmap_maxno) {
>  			spin_unlock_irq(&cma->lock);
> +			pr_debug("%s(): alloc fail, retry loop %d\n", __func__, loop++);
> +			/*
> +			 * rescan as others may finish the memory migration
> +			 * and quit if no available CMA memory found finally
> +			 */
> +			if (start) {
> +				schedule();
> +				start = 0;
> +				continue;
> +			}
>  			break;

The schedule() is problematic.  For a start, we'd normally use
cond_resched() here, so we avoid calling the more expensive schedule()
if we know it won't perform any action.

But cond_resched() is problematic if this thread has realtime
scheduling policy and the process we're waiting on does not.  One way
to address that is to use an unconditional msleep(1), but that's still
just a hack.

A much cleaner solution is to use appropriate locking so that the
various threads run this code in order, without messing each other up.

And it looks like the way to do that is to simply revert the commit
which caused this regression, a4efc174b382 ("mm/cma.c: remove redundant
cma_mutex lock")?
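
For reference, the revert would basically bring back the cma_mutex
serialization around alloc_contig_range().  Roughly, from memory of the
pre-a4efc174b382 mm/cma.c (exact context lines may differ):

	/*
	 * Serialize alloc_contig_range() so that concurrent cma_alloc()
	 * callers don't race against each other's in-flight migrations.
	 */
	static DEFINE_MUTEX(cma_mutex);

	...

		mutex_lock(&cma_mutex);
		ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA,
					 GFP_KERNEL | (no_warn ? __GFP_NOWARN : 0));
		mutex_unlock(&cma_mutex);

With that mutex in place a second cma_alloc() caller simply blocks on
the lock until the first caller's migration completes, instead of
racing it and failing with -EBUSY, so the retry-with-schedule() loop
becomes unnecessary.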