Re: [PATCH v3 1/2] mm: cma: fix allocation may fail sometimes

Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> · Wed, 16 Mar 2022 14:09:04 -0700

On Wed, 16 Mar 2022 11:41:37 +0800 Dong Aisheng <dongas86@xxxxxxxxx> wrote:

> On Wed, Mar 16, 2022 at 6:58 AM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Tue, 15 Mar 2022 22:45:20 +0800 Dong Aisheng <aisheng.dong@xxxxxxx> wrote:
> >
> > > --- a/mm/cma.c
> > > +++ b/mm/cma.c
> > >
> > > ...
> > >
> > > @@ -457,6 +458,16 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
> > >                               offset);
> > >               if (bitmap_no >= bitmap_maxno) {
> > >                       spin_unlock_irq(&cma->lock);
> > > +                     pr_debug("%s(): alloc fail, retry loop %d\n", __func__, loop++);
> > > +                     /*
> > > +                      * rescan as others may finish the memory migration
> > > +                      * and quit if no available CMA memory found finally
> > > +                      */
> > > +                     if (start) {
> > > +                             schedule();
> > > +                             start = 0;
> > > +                             continue;
> > > +                     }
> > >                       break;
> >
> > The schedule() is problematic. For a start, we'd normally use
> > cond_resched() here, so we avoid calling the more expensive schedule()
> > if we know it won't perform any action.
> >
> > But cond_resched() is problematic if this thread has realtime
> > scheduling policy and the process we're waiting on does not.  One way
> > to address that is to use an unconditional msleep(1), but that's still
> > just a hack.
> >
> 
> I think we can simply drop the schedule() call here on the second retry
> pass, as the delay it introduces may not really be needed.

That will simply cause a tight loop, so I'm obviously not understanding
the proposal.
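
To make the concern concrete, here is a rough userspace sketch of the
"sleep between rescans" idea mentioned above. It is only an analogue, not
kernel code: nanosleep() stands in for msleep(1), and alloc_with_retry(),
fake_scan() and struct fake_cma are hypothetical names invented for this
illustration. The retry count is bounded and each failed scan is followed
by a short sleep, so the loop cannot spin tightly.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for one scan of the allocation bitmap. */
typedef bool (*try_alloc_fn)(void *ctx);

/* Sleep for roughly ms milliseconds; userspace analogue of msleep(). */
static void sleep_ms(long ms)
{
	struct timespec ts = {
		.tv_sec = ms / 1000,
		.tv_nsec = (ms % 1000) * 1000000L,
	};

	nanosleep(&ts, NULL);
}

/*
 * Call try_alloc() up to max_attempts times, sleeping delay_ms between
 * failed attempts so that whoever holds the resource can make progress.
 * Returns the number of attempts used on success, or -1 if every attempt
 * failed.
 */
int alloc_with_retry(try_alloc_fn try_alloc, void *ctx,
		     int max_attempts, long delay_ms)
{
	assert(max_attempts > 0);

	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		if (try_alloc(ctx))
			return attempt;
		/* No point sleeping after the final failed attempt. */
		if (attempt < max_attempts)
			sleep_ms(delay_ms);
	}
	return -1;
}

/* Toy resource that stays busy for a fixed number of scans. */
struct fake_cma {
	int busy_scans_left;
};

/* One "scan": fails while the toy resource is still busy. */
bool fake_scan(void *ctx)
{
	struct fake_cma *c = ctx;

	if (c->busy_scans_left > 0) {
		c->busy_scans_left--;
		return false;
	}
	return true;
}
```

With a struct fake_cma that is busy for two scans,
alloc_with_retry(fake_scan, &c, 5, 1) succeeds on the third attempt.
Removing the sleep_ms() call turns this into exactly the tight loop
described above. As already noted, the fixed sleep is still a hack: it
only gives the other side a chance to run, it does not wait for it.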