Re: [PATCH rfc 0/2] mm: cma: make cma_release() non-blocking

Roman Gushchin <guro@xxxxxx> · Wed, 21 Oct 2020 19:45:26 -0700

On Thu, Oct 22, 2020 at 09:54:53AM +0800, Xiaqing (A) wrote:
> 
> 
> On 2020/10/17 6:52, Roman Gushchin wrote:
> 
> > This small patchset makes cma_release() non-blocking and simplifies
> > the code in hugetlbfs, where previously we had to temporarily drop
> > hugetlb_lock around the cma_release() call.
> > 
> > It should help Zi Yan with his work on 1 GB THPs: splitting a gigantic
> > THP under memory pressure requires a cma_release() call. If it's
> > a blocking function, it complicates the already complicated code.
> > Because there are at least two use cases like this (hugetlbfs is
> > another example), I believe it's just better to make cma_release()
> > non-blocking.
> > 
> > It also makes it more consistent with other memory releasing functions
> > in the kernel: most of them are non-blocking.
> > 
> > 
> > Roman Gushchin (2):
> >    mm: cma: make cma_release() non-blocking
> >    mm: hugetlb: don't drop hugetlb_lock around cma_release() call
> > 
> >   mm/cma.c     | 51 +++++++++++++++++++++++++++++++++++++++++++++++++--
> >   mm/hugetlb.c |  6 ------
> >   2 files changed, 49 insertions(+), 8 deletions(-)
> > 
> I don't think this patch is a good idea. It transfers part or even all of the cost of
> cma_release() to cma_alloc(), whose performance is watched much more closely.

I'm not quite sure: if cma_alloc() is racing with cma_release(), cma_alloc() will
wait for the cma_lock mutex anyway. So we don't really transfer anything to cma_alloc().
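A minimal userspace sketch of the serialization argument. It assumes the non-blocking cma_release() only defers the bitmap clearing (page freeing is not modeled). struct cma_model and the cma_model_* functions are hypothetical. A pthread mutex stands in for cma_lock, and a small atomic spinlock guards the release queue. The deferred flush and the allocation path take the same mutex. Until the flush runs, a released range is still marked as allocated.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a CMA area; not the kernel's struct cma. */
#define CMA_BITS 64

struct pending_range {
	unsigned long start, count;
	struct pending_range *next;
};

struct cma_model {
	pthread_mutex_t lock;           /* stands in for the cma_lock mutex */
	bool bitmap[CMA_BITS];          /* true = allocated */
	atomic_bool queue_locked;       /* tiny spinlock guarding 'pending' */
	struct pending_range *pending;  /* released ranges not yet cleared */
};

static void spin_lock(atomic_bool *l)
{
	while (atomic_exchange_explicit(l, true, memory_order_acquire))
		;	/* busy-wait, never sleeps */
}

static void spin_unlock(atomic_bool *l)
{
	atomic_store_explicit(l, false, memory_order_release);
}

void cma_model_init(struct cma_model *c)
{
	pthread_mutex_init(&c->lock, NULL);
	for (int i = 0; i < CMA_BITS; i++)
		c->bitmap[i] = false;
	atomic_init(&c->queue_locked, false);
	c->pending = NULL;
}

/* Non-blocking release: queues the range under the spinlock only. */
bool cma_model_release(struct cma_model *c, unsigned long start,
		       unsigned long count)
{
	assert(start + count <= CMA_BITS);
	struct pending_range *r = malloc(sizeof(*r));
	if (!r)
		return false;
	r->start = start;
	r->count = count;
	spin_lock(&c->queue_locked);
	r->next = c->pending;
	c->pending = r;
	spin_unlock(&c->queue_locked);
	return true;	/* a real implementation would now kick a worker */
}

/* Deferred half, meant to run from a worker: takes the mutex. */
void cma_model_flush(struct cma_model *c)
{
	spin_lock(&c->queue_locked);
	struct pending_range *list = c->pending;
	c->pending = NULL;
	spin_unlock(&c->queue_locked);

	pthread_mutex_lock(&c->lock);
	while (list) {
		struct pending_range *next = list->next;
		for (unsigned long i = 0; i < list->count; i++)
			c->bitmap[list->start + i] = false;
		free(list);
		list = next;
	}
	pthread_mutex_unlock(&c->lock);
}

/* First-fit allocation under the same mutex; returns start or -1. */
long cma_model_alloc(struct cma_model *c, unsigned long count)
{
	long found = -1;

	pthread_mutex_lock(&c->lock);
	for (unsigned long s = 0; found < 0 && s + count <= CMA_BITS; s++) {
		unsigned long i = 0;
		while (i < count && !c->bitmap[s + i])
			i++;
		if (i == count) {
			for (i = 0; i < count; i++)
				c->bitmap[s + i] = true;
			found = (long)s;
		}
	}
	pthread_mutex_unlock(&c->lock);
	return found;
}
```

If the worker is delayed, an allocation that arrives before cma_model_flush() cannot reuse the released range. That is the window the CPU-contention concern below is about.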

> On Android phones, CPU contention is intense in many scenarios.
> As a result, kernel threads and workers may only get scheduled after several ticks or more.
> In this case, the performance of cma_alloc() will deteriorate significantly,
> which is bad news for many services on Android.

Ok, I agree: if the CPU is heavily loaded, it might affect the total execution time.

If we don't go in the mutex->spinlock conversion direction (as Mike suggested),
we can address the performance concerns by introducing a cma_release_nowait() function,
so that the default cma_release() keeps working the old way.
cma_release_nowait() can set an atomic flag on the cma area, which will cause subsequent
cma_alloc() calls to flush the release queue. In this case there will be no performance
penalty unless somebody is using cma_release_nowait().
Would that work for you?
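A minimal userspace sketch of the cma_release_nowait() proposal. struct cma_model, the has_pending flag and all cma_model_* names are hypothetical, not kernel API. A pthread mutex stands in for cma_lock, and a small atomic spinlock guards the queue. The default release clears bits synchronously as before. Only the opt-in path queues work, and allocation drains the queue under the mutex when the flag is set.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a CMA area; not the kernel's struct cma. */
#define CMA_BITS 64

struct pending_range {
	unsigned long start, count;
	struct pending_range *next;
};

struct cma_model {
	pthread_mutex_t lock;           /* stands in for the cma_lock mutex */
	bool bitmap[CMA_BITS];          /* true = allocated */
	atomic_bool queue_locked;       /* tiny spinlock guarding 'pending' */
	struct pending_range *pending;  /* ranges from cma_model_release_nowait() */
	atomic_bool has_pending;        /* the "atomic flag" on the area */
};

static void spin_lock(atomic_bool *l)
{
	while (atomic_exchange_explicit(l, true, memory_order_acquire))
		;	/* busy-wait, never sleeps */
}
static void spin_unlock(atomic_bool *l) { atomic_store_explicit(l, false, memory_order_release); }

void cma_model_init(struct cma_model *c)
{
	pthread_mutex_init(&c->lock, NULL);
	for (int i = 0; i < CMA_BITS; i++)
		c->bitmap[i] = false;
	atomic_init(&c->queue_locked, false);
	atomic_init(&c->has_pending, false);
	c->pending = NULL;
}

/* Default path, unchanged: clear the bits synchronously under the mutex. */
void cma_model_release(struct cma_model *c, unsigned long start, unsigned long count)
{
	assert(start + count <= CMA_BITS);
	pthread_mutex_lock(&c->lock);
	for (unsigned long i = 0; i < count; i++)
		c->bitmap[start + i] = false;
	pthread_mutex_unlock(&c->lock);
}

/* Opt-in path: queue the range first, then raise the flag. */
bool cma_model_release_nowait(struct cma_model *c, unsigned long start, unsigned long count)
{
	assert(start + count <= CMA_BITS);
	struct pending_range *r = malloc(sizeof(*r));
	if (!r)
		return false;
	r->start = start;
	r->count = count;
	spin_lock(&c->queue_locked);
	r->next = c->pending;
	c->pending = r;
	spin_unlock(&c->queue_locked);
	atomic_store(&c->has_pending, true);
	return true;
}

/* Allocation: if the flag is set, drain the queue before the first-fit scan. */
long cma_model_alloc(struct cma_model *c, unsigned long count)
{
	long found = -1;

	pthread_mutex_lock(&c->lock);
	if (atomic_exchange(&c->has_pending, false)) {
		spin_lock(&c->queue_locked);
		struct pending_range *list = c->pending;
		c->pending = NULL;
		spin_unlock(&c->queue_locked);
		while (list) {
			struct pending_range *next = list->next;
			for (unsigned long i = 0; i < list->count; i++)
				c->bitmap[list->start + i] = false;
			free(list);
			list = next;
		}
	}
	for (unsigned long s = 0; found < 0 && s + count <= CMA_BITS; s++) {
		unsigned long i = 0;
		while (i < count && !c->bitmap[s + i])
			i++;
		if (i == count) {
			for (i = 0; i < count; i++)
				c->bitmap[s + i] = true;
			found = (long)s;
		}
	}
	pthread_mutex_unlock(&c->lock);
	return found;
}
```

The flag is raised only after the range is on the queue, so no queued range is lost. At worst, a later allocation finds the flag set and drains an empty queue. Callers that never use cma_model_release_nowait() pay only one atomic exchange per allocation.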

Thank you!