On Tue, 15 Mar 2022 22:45:20 +0800 Dong Aisheng <aisheng.dong@xxxxxxx> wrote:

> --- a/mm/cma.c
> +++ b/mm/cma.c
>
> ...
>
> @@ -457,6 +458,16 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
>  						offset);
>  		if (bitmap_no >= bitmap_maxno) {
>  			spin_unlock_irq(&cma->lock);
> +			pr_debug("%s(): alloc fail, retry loop %d\n", __func__, loop++);
> +			/*
> +			 * rescan as others may finish the memory migration
> +			 * and quit if no available CMA memory found finally
> +			 */
> +			if (start) {
> +				schedule();
> +				start = 0;
> +				continue;
> +			}
>  			break;

The schedule() is problematic.  For a start, we'd normally use
cond_resched() here, so we avoid calling the more expensive schedule()
if we know it won't perform any action.

But cond_resched() is problematic if this thread has realtime
scheduling policy and the process we're waiting on does not.  One way
to address that is to use an unconditional msleep(1), but that's still
just a hack.

A much cleaner solution is to use appropriate locking so that the
various threads run this code in order, without messing each other up.

And it looks like the way to do that is to simply revert the commit
which caused this regression, a4efc174b382 ("mm/cma.c: remove redundant
cma_mutex lock")?
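
For reference, the revert would basically bring back the cma_mutex
serialization around alloc_contig_range().  Roughly, from memory of the
pre-a4efc174b382 mm/cma.c (exact context lines may differ):

	/*
	 * Serialize alloc_contig_range() so that concurrent cma_alloc()
	 * callers don't race against each other's in-flight migrations.
	 */
	static DEFINE_MUTEX(cma_mutex);

	...

		mutex_lock(&cma_mutex);
		ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA,
					 GFP_KERNEL | (no_warn ? __GFP_NOWARN : 0));
		mutex_unlock(&cma_mutex);

With that mutex in place a second cma_alloc() caller simply blocks on
the lock until the first caller's migration completes, instead of
racing it and failing with -EBUSY, so the retry-with-schedule() loop
becomes unnecessary.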