On Tue, Jan 17, 2023 at 7:57 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Mon 09-01-23 12:53:34, Suren Baghdasaryan wrote:
> > call_rcu() can take a long time when callback offloading is enabled.
> > Its use in the vm_area_free can cause regressions in the exit path when
> > multiple VMAs are being freed.
>
> What kind of regressions.
>
> > To minimize that impact, place VMAs into
> > a list and free them in groups using one call_rcu() call per group.
>
> Please add some data to justify this additional complexity.

Sorry, I should have done that in the first place. A 4.3% regression was
noticed when running the execl test from the unixbench suite. The spawn
test also showed a 1.6% regression. Profiling revealed that VMA freeing
was taking longer because of call_rcu(), which is slow when RCU callback
offloading is enabled. I asked Paul McKenney and he explained that,
because the callbacks are offloaded to some other kthread, possibly
running on some other CPU, explicit locking is necessary. Taking that
lock on every call_rcu() invocation would result in excessive contention
during callback flooding. So, by batching the call_rcu() work, we pay
that overhead once per group of VMAs instead of once per VMA, which
reduces the lock contention.

> --
> Michal Hocko
> SUSE Labs
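
To illustrate the batching idea described above, here is a minimal
user-space sketch. It is not the patch itself: struct free_batch,
batch_add(), batch_flush(), free_vmas_batched() and BATCH_SIZE are all
hypothetical names and values chosen for illustration, and
deferred_free_list() is only a stand-in for call_rcu() that frees
immediately instead of waiting for a grace period. It only shows how the
number of deferred calls drops from one per VMA to one per group.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical batch size; the discussion above does not state one. */
#define BATCH_SIZE 32

struct vma {
    struct vma *next;   /* links VMAs waiting in the current batch */
};

struct free_batch {
    struct vma *head;   /* VMAs collected so far */
    int size;           /* number of VMAs on the list */
};

static unsigned long nr_deferred_calls; /* stand-in call_rcu() invocations */
static unsigned long nr_freed;          /* VMAs actually freed */

/*
 * Stand-in for call_rcu(): counts one invocation per list and frees the
 * whole list right away. A real callback would run after a grace period.
 */
static void deferred_free_list(struct vma *head)
{
    nr_deferred_calls++;
    while (head) {
        struct vma *next = head->next;

        free(head);
        nr_freed++;
        head = next;
    }
}

/* Hand the pending group over with a single deferred call. */
static void batch_flush(struct free_batch *b)
{
    assert(b->size <= BATCH_SIZE);
    if (!b->head)
        return;
    deferred_free_list(b->head);
    b->head = NULL;
    b->size = 0;
}

/* Queue one VMA; flush once the group is full. */
static void batch_add(struct free_batch *b, struct vma *v)
{
    v->next = b->head;
    b->head = v;
    if (++b->size >= BATCH_SIZE)
        batch_flush(b);
}

/* Free n VMAs through the batch; return how many deferred calls were made. */
static unsigned long free_vmas_batched(int n)
{
    struct free_batch b = { NULL, 0 };

    nr_deferred_calls = 0;
    nr_freed = 0;
    for (int i = 0; i < n; i++) {
        struct vma *v = malloc(sizeof(*v));

        if (!v)
            abort();
        batch_add(&b, v);
    }
    batch_flush(&b);    /* do not leave a partial group behind */
    return nr_deferred_calls;
}
```

With these assumed values, free_vmas_batched(100) makes 4 deferred calls
(three full groups of 32 plus one group of 4) where per-VMA freeing would
make 100. The final batch_flush() matters: without it a partial group
would never be freed.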