On Wed, Mar 16, 2022 at 6:58 AM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote: > > On Tue, 15 Mar 2022 22:45:20 +0800 Dong Aisheng <aisheng.dong@xxxxxxx> wrote: > > > --- a/mm/cma.c > > +++ b/mm/cma.c > > > > ... > > > > @@ -457,6 +458,16 @@ struct page *cma_alloc(struct cma *cma, unsigned long count, > > offset); > > if (bitmap_no >= bitmap_maxno) { > > spin_unlock_irq(&cma->lock); > > + pr_debug("%s(): alloc fail, retry loop %d\n", __func__, loop++); > > + /* > > + * rescan as others may finish the memory migration > > + * and quit if no available CMA memory found finally > > + */ > > + if (start) { > > + schedule(); > > + start = 0; > > + continue; > > + } > > break; > > The schedule() is problematic. For a start, we'd normally use > cond_resched() here, so we avoid calling the more expensive schedule() > if we know it won't perform any action. > > But cond_resched() is problematic if this thread has realtime > scheduling policy and the process we're waiting on does not. One way > to address that is to use an unconditional msleep(1), but that's still > just a hack. > I think we can simply drop schedule() here during the second round of retry as the estimated delay may not be really needed. Do you think that's ok? > A much cleaner solution is to use appropriate locking so that various > threads run this code in order, without messing each other up. > > And it looks like the way to do that is to simply revert the commit > which caused this regression, a4efc174b382 ("mm/cma.c: remove redundant > cma_mutex lock")? Yes, agree it could be a backup solution if not better ideas. Regards Aisheng