On Tue, Sep 3, 2024 at 4:57 PM Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Tue, Sep 03, 2024 at 10:50:09AM +1200, Barry Song wrote:
> > From: Barry Song <v-songbaohua@xxxxxxxx>
> >
> > The mmap_write_lock() can block all access to the VMAs, for example page
> > faults. Performing memory allocation while holding this lock may trigger
> > direct reclamation, leading to others being queued in the rwsem for an
> > extended period.
> > We've observed that the allocation can sometimes take more than 300ms,
> > significantly blocking other threads. The user interface sometimes
> > becomes less responsive as a result. To prevent this, let's move the
> > allocation outside of the write lock.
> > A potential side effect could be an extra alloc_page() for the second
> > thread executing binder_install_single_page() while the first thread
> > has done it earlier. However, according to Tangquan's 48-hour profiling
> > using monkey, the likelihood of this occurring is minimal, with a ratio
> > of only 1 in 2400. Compared to the significantly costly rwsem, this is
> > negligible.
> > On the other hand, holding a write lock without making any VMA
> > modifications appears questionable and likely incorrect. While this
> > patch focuses on reducing the lock duration, future updates may aim
> > to eliminate the write lock entirely.
> >
> > Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> > Cc: "Arve Hjønnevåg" <arve@xxxxxxxxxxx>
> > Cc: Todd Kjos <tkjos@xxxxxxxxxxx>
> > Cc: Martijn Coenen <maco@xxxxxxxxxxx>
> > Cc: Joel Fernandes <joel@xxxxxxxxxxxxxxxxx>
> > Cc: Christian Brauner <brauner@xxxxxxxxxx>
> > Cc: Carlos Llamas <cmllamas@xxxxxxxxxx>
> > Cc: Suren Baghdasaryan <surenb@xxxxxxxxxx>
> > Tested-by: Tangquan Zheng <zhengtangquan@xxxxxxxx>
> > Signed-off-by: Barry Song <v-songbaohua@xxxxxxxx>
> > ---
> >  drivers/android/binder_alloc.c | 18 ++++++++++++++----
> >  1 file changed, 14 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
> > index b3acbc4174fb..f20074e23a7c 100644
> > --- a/drivers/android/binder_alloc.c
> > +++ b/drivers/android/binder_alloc.c
> > @@ -227,13 +227,23 @@ static int binder_install_single_page(struct binder_alloc *alloc,
> >  	if (!mmget_not_zero(alloc->mm))
> >  		return -ESRCH;
> >
> > +	/*
> > +	 * Don't allocate page in mmap_write_lock, this can block
> > +	 * mmap_rwsem for a long time; Meanwhile, allocation failure
> > +	 * doesn't necessarily need to return -ENOMEM, if lru_page
> > +	 * has been installed, we can still return 0(success).
> > +	 */
> > +	page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO);
>
> But now you are allocating new pages even if binder_get_installed_page()
> is an error, right?  Doesn't that slow things down?

Very unlikely: the ratio is only 1/2400, while the write lock blocks
everyone 100% of the time.

>
> How was this benchmarked?
>

I actually included Tangquan's profiling result in the commit message:

"However, according to Tangquan's 48-hour profiling using monkey, the
likelihood of this occurring is minimal, with a ratio of only 1 in 2400.
Compared to the significantly costly rwsem, this is negligible."

> > +
> >  	/*
> >  	 * Protected with mmap_sem in write mode as multiple tasks
> >  	 * might race to install the same page.
> >  	 */
> >  	mmap_write_lock(alloc->mm);
> > -	if (binder_get_installed_page(lru_page))
> > +	if (binder_get_installed_page(lru_page)) {
> > +		ret = 1;
>
> That is not a valid error value :(
>
> >  		goto out;
> > +	}
> >
> >  	if (!alloc->vma) {
> >  		pr_err("%d: %s failed, no vma\n", alloc->pid, __func__);
> > @@ -241,7 +251,6 @@ static int binder_install_single_page(struct binder_alloc *alloc,
> >  		goto out;
> >  	}
> >
> > -	page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO);
> >  	if (!page) {
> >  		pr_err("%d: failed to allocate page\n", alloc->pid);
> >  		ret = -ENOMEM;
> > @@ -252,7 +261,6 @@ static int binder_install_single_page(struct binder_alloc *alloc,
> >  	if (ret) {
> >  		pr_err("%d: %s failed to insert page at offset %lx with %d\n",
> >  		       alloc->pid, __func__, addr - alloc->buffer, ret);
> > -		__free_page(page);
> >  		ret = -ENOMEM;
> >  		goto out;
> >  	}
> > @@ -262,7 +270,9 @@ static int binder_install_single_page(struct binder_alloc *alloc,
> >  out:
> >  	mmap_write_unlock(alloc->mm);
> >  	mmput_async(alloc->mm);
> > -	return ret;
> > +	if (ret && page)
> > +		__free_page(page);
> > +	return ret < 0 ? ret : 0;
>
> Please only use ? : for when you have to, otherwise please spell it out
> with a normal if statement:
> 	if (ret < 0)
> 		return ret;
> 	return 0;
>
> But, this is abusing the fact that you set "ret = 1" above, which is
> going to trip someone up in the future as that is NOT a normal coding
> pattern we have in the kernel, sorry.
>
> If you insist on this change, please rework it to not have that type of
> "positive means one thing, 0 means another, and negative means yet
> something else" please.

I was trying to consolidate all the free_page() calls in one place;
otherwise, we would need multiple free_page() calls. I'm perfectly fine
with having more free_page() calls in both the ret = 1 and ret < 0
paths; in that case, the ret = 1 path can be removed entirely if you
prefer (a rough sketch is at the end of this mail).

>
> thanks,
>
> greg k-h

Thanks
Barry
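P.S. Here is an untested sketch of the rework I mean: an explicit
__free_page() in every path that still owns the extra page, no ret = 1,
and a plain "return ret". The lines not visible in the hunks above are
reconstructed from memory, so please treat this as an illustration of
the idea rather than a finished patch:

static int binder_install_single_page(struct binder_alloc *alloc,
				      struct binder_lru_page *lru_page,
				      unsigned long addr)
{
	struct page *page;
	int ret = 0;

	if (!mmget_not_zero(alloc->mm))
		return -ESRCH;

	/*
	 * Allocate before taking mmap_rwsem so that direct reclaim
	 * cannot block every other waiter on the lock.
	 */
	page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO);

	mmap_write_lock(alloc->mm);
	if (binder_get_installed_page(lru_page)) {
		/* Lost the race: a page is already installed, success. */
		goto out_free;
	}

	if (!alloc->vma) {
		pr_err("%d: %s failed, no vma\n", alloc->pid, __func__);
		ret = -ESRCH;
		goto out_free;
	}

	if (!page) {
		pr_err("%d: failed to allocate page\n", alloc->pid);
		ret = -ENOMEM;
		goto out;
	}

	ret = vm_insert_page(alloc->vma, addr, page);
	if (ret) {
		pr_err("%d: %s failed to insert page at offset %lx with %d\n",
		       alloc->pid, __func__, addr - alloc->buffer, ret);
		ret = -ENOMEM;
		goto out_free;
	}

	binder_set_installed_page(lru_page, page);
	goto out;

out_free:
	/* Drop the extra page; ret is 0 here if we merely lost the race. */
	if (page)
		__free_page(page);
out:
	mmap_write_unlock(alloc->mm);
	mmput_async(alloc->mm);
	return ret;
}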